Stitch Documentation
has moved!

Please update your bookmarks to https://www.stitchdata.com/docs

If you're not automatically redirected after 5 seconds, click here.

Data Discrepancy Troubleshooting Guide

Do you see a discrepancy between your source integration and your data warehouse? This article will walk you through some common data discrepancy “gotchas” that can help you pinpoint the root of the issue.

If you still have questions, provide the information outlined below to our support team. This will help us validate your request and resolve the discrepancy more quickly.

Data Discrepancy Gotchas

Before you reach out to support about a data discrepancy, please check the following:

Historical Syncs & Data Volume

While Stitch is designed to quickly and efficiently process large amounts of data, it can take some time to replicate and load your data into your data warehouse. What looks like missing data may actually be incomplete processing, meaning Stitch hasn’t finished loading all the data into your data warehouse.

This is especially true for SaaS historical syncs, which may also be affected by API quotas. Most data discrepancies can be solved by simply waiting and giving Stitch time to process and load the data.

Query Timeframe Comparison

Are you querying for the same timeframe in your data warehouse and in your data source? Keep in mind that historical syncs for SaaS integrations have varying default start dates.

For the majority of SaaS integrations, a historical sync goes back one year from the Stitch connection date. We recommend checking out the Syncing Historical SaaS Integration Data article to see if the default starting date corresponds with the discrepancy.

Timezone Differences

Have you accounted for any timezone variation between the data source and your data warehouse? If your data source is configured to report in a certain timezone, those timestamps will be converted to UTC in Redshift.

Replication Keys

Check the Replication Method settings for the trouble table, accessed by clicking into the table in the Integration Details page, then the Table Settings button.

If your database table is set to replicate incrementally, ensure that the proper Replication Key is being used. Remember:

  • Replication Key settings are only relevant for database integrations.
  • Mongo Replication Keys have a different set of gotchas than Replication Keys for other integrations.
  • If values in your records are updated over time, you should use a modification timestamp for replication.
  • Stitch does not capture hard deletes if the table is using Incremental Replication.
  • Replication Key columns with NULL values are only replicated during the first replication for that integration.

Replication Frequency

Check the Replication Frequency for the integration, accessed by clicking the Integration Settings button in the Integration Details page. If the missing records were created very recently, you may need to wait for an update of your data to complete before they appear in your data warehouse.

Third-Party Downtime

Check to see if the SaaS integration where you noticed the discrepancy is experiencing downtime. Here's a list of all our integrations' status pages and you can always see Stitch’s status on our status page.

Discrepancy Consistencies

Take a look at the data and identify whether there are any consistencies around the discrepancy, such as records missing over a specific timeframe or field value discrepancies affecting only certain types of records. Could these correspond with any recent changes in your source integration?

Your Query Tool

Make sure you’re using a SQL client to directly query your data warehouse. This will ensure that the discrepancy isn’t the result of report refresh lags, third party bugs/downtime, or any other type of data delay.

Information to Provide to Support

If the discrepancy can’t be explained by any of the points above, please reach out to support and provide the following information. Note that there are different sections for row count versus field value discrepancies.

For Row Count Discrepancies:

  1. Provide 3-5 examples of records that exist in the source but not in your data warehouse. You can send us the entire record, but at the very least we need:
    • The Primary Key
    • The Replication Key
    • The field with the discrepancy
  2. In your source database, run the queries outlined below for: [source_integration_schema].[table_name] and provide us with the results:
    • MIN(DATE)
    • MAX(DATE)
    • MIN(REPLICATION_KEY)
    • MAX(REPLICATION_KEY)
    • COUNT(*)
  3. In your data warehouse, run the queries outlined below for: [data_warehouse_schema].[table_name] and provide us with the results:
    • MIN(DATE)
    • MAX(DATE)
    • MIN(REPLICATION_KEY)
    • MAX(REPLICATION_KEY)
    • COUNT(*)
  4. For SaaS integrations, whenever possible, please provide us with:
    • Raw exports showing row-level data that illustrates the discrepancy
    • Screenshots from the source integration’s UI that illustrate discrepancy
    • Exact API calls and full responses that illustrate discrepancy (make sure you exclude your API key)

For Field Value Discrepancies:

  1. 3-5 examples of discrepancies between the source integration and your data warehouse. Please include:
    1. id
    2. field with the discrepancy
    3. updated_at value (when applicable)
  2. For database integrations:
    1. Confirmation that the Replication Method is appropriately capturing changed values
    2. Confirmation that the Replication Key is being appropriately populated and does not contain NULL values
  3. For SaaS integrations, whenever possible, please provide us with:
    1. Raw exports showing row-level data that illustrates the discrepancy
    2. Screenshots from the UI that illustrate discrepancy
    3. Exact API calls and full responses that illustrate discrepancy (make sure you exclude your API key)
Was this article helpful?
0 out of 0 found this helpful

Comments

Questions or suggestions? If something in our documentation is unclear, let us know in the comments!