When you set a table to use Incremental Replication, you’ll also need to define a Replication Key for that table.
Replication Keys are columns that Stitch uses to identify new and updated data for replication.
As improperly setting Replication Keys can cause data discrepancies, it’s important to understand how they work, what makes a good key, and the gotchas associated with them. In this doc, we’ll cover:
To use Incremental Replication, a table must contain one of the following column types to be used as the Replication Key:
BIGINT, INT and MEDIUMINT
When Stitch replicates your data, it will store the last recorded maximum value in the Replication Key column and compare it against the data source - not what's in your data warehouse - to identify new/updated data. Any row with a Replication Key value greater than or equal to the stored value is where Stitch will begin the next replication attempt.
When it comes to replicating your data, there are a lot of ‘keys’ involved. It can be difficult to keep them all straight, but aside from Replication Keys, there’s one more you should keep in mind: Primary Keys.
In Stitch, Replication Keys and Primary Keys serve two different purposes:
While a column can sometimes be used as both a Replication Key and a Primary Key, these are not necessarily always the same column.
While a column only need be an integer, datetime, or timestamp to be a Replication Key, we have some other recommendations:
modified_at. This is the best way to ensure that both new records and updates to existing records are captured.
created_atdate or timestamp column may also be suitable.
Before selecting a Replication Key for a table, there are a few things you should keep in mind:
NULLvalues in the Replication Key column will only be replicated during the first sync of an integration.
NULL. Stitch uses the Replication Key column to detect new and updated data - without it, data can't be correctly detected and replicated.
Changing an existing Replication Key for a table is simple - just open up the Table Settings page for the table and select the new Replication Key column from the drop-down menu.
When you change a table's Replication Key, Stitch will queue a full re-sync of the table's data. We do this to ensure that there aren't any gaps because of the Replication Key switch.
Because Replication Keys are so important and can directly contribute to data discrepancies and row count issues if improperly set, we felt the next two points merited their own section:
idcolumn, which is an auto-incrementing integer. Existing rows in this table are updated, but the
idcolumn never changes after the record is created. Stitch will not detect the updated values because the
idcolumn hasn’t changed.
BIGINTcolumn that’s used in a boolean fashion, meaning it contains 0s and 1s. Every time the values in this column change, the row will be re-replicated to your data warehouse and count against your monthly limit.
If you encounter a data discrepancy, we recommend you start by verifying that the Replication Method and Key for the table are properly set. For further assistance, check out the Data Discrepancy Troubleshooting Guide.
Additionally, you can also reset Replication Keys for database integrations if needed.