Stitch can use one of two methods to replicate your data - Full Table or Incremental. As Replication Methods directly impact your row usage, it's important that you understand how they work.
Additionally, incorrectly setting up the Replication Method for a database integration table can cause data discrepancies.
In this article, we'll cover:
Full Table Replication means that Stitch will replicate the entire contents of a table on every replication attempt. As this replication type can cause latency and quickly use up your monthly row-limit, it's the most inefficient way to use Stitch. We recommend using Incremental Replication if the table in question contains any timestamped or datetime columns.
Note that at this time, Stitch does not support Full Table Replication for Mongo connections. We recognize this can be inconvenient and are working on a solution.
Incremental Replication means that Stitch will only replicate new or updated data on every replication attempt. As this method will greatly reduce latency and data usage, we highly recommend using it where possible.
A Replication Key is required to use Incremental Replication. Replication Keys are columns that Stitch uses to identify new and updated data for replication.
Incorrectly setting Replication Keys can cause data discrepancies and row count issues, so we strongly recommend checking out the Selecting Replication Keys guide before you define the Replication Methods and Keys for your tables.
When Stitch replicates your data, it will store the last recorded maximum value in the Replication Key column and compare it against the data source - not what's in your data warehouse - to identify new/updated data. Any row with a Replication Key value greater than or equal to the stored value is where Stitch will begin the next replication attempt.
If the Replication Key is set to a timestamped column - such as
modified_at - new and updated records will be detected by comparing the last maximum recorded
modified_at date with the maximum
modified_at date in the source database.
customers table for this example. The last maximum date replicated for this table is
2016-02-01 12:00:00. During the next update, rows that were either created or modified at or after this time will be replicated from the data source to the data warehouse. There may be a small amount of duplication as a result of this approach, but it’s to ensure Stitch doesn’t miss any data.
Google BigQuery Destinations
If you're using Google BigQuery as your data warehouse, any incrementally replicated tables will be replicated using the append-only method. For more info, refer to the BigQuery Data Replication & Table Structural Changes doc.
If the only time a table is updated is with new data - meaning existing data is never updated - append-only replication is being used.
Let's say you want to apply Incremental Replication to a
users table, which is append-only. In your business, users are assigned auto-incrementing IDs when accounts are created. You can set the Replication Key to the
user_id column as user IDs are both unique and auto-incrementing.
If the last replication attempt ended at
user_id 1000, the next attempt would start at
user_id 1000, replicating all rows with
user_id >= 1000.
Replication Methods can only be set database integration tables.
For more info about how tables in SaaS integrations are replicated, refer to the Expected SaaS Data section.
With just a few clicks, you can define the Replication Methods for your database integration tables. Click the GIF for a closer look at the whole process.
You can also change these settings by navigating into the table and clicking the Table Settings button.