Stitch Documentation
has moved!

Please update your bookmarks to https://www.stitchdata.com/docs

If you're not automatically redirected after 5 seconds, click here.

Defining Replication Methods

Stitch can use one of two methods to replicate your data - Full Table or Incremental. As Replication Methods directly impact your row usage, it's important that you understand how they work.

Additionally, incorrectly setting up the Replication Method for a database integration table can cause data discrepancies.

In this article, we'll cover:

Full Table Replication

Full Table Replication means that Stitch will replicate the entire contents of a table on every replication attempt. As this replication type can cause latency and quickly use up your monthly row-limit, it's the most inefficient way to use Stitch. We recommend using Incremental Replication if the table in question contains any timestamped or datetime columns.

Note that at this time, Stitch does not support Full Table Replication for Mongo connections. We recognize this can be inconvenient and are working on a solution.

Incremental Replication

Incremental Replication means that Stitch will only replicate new or updated data on every replication attempt. As this method will greatly reduce latency and data usage, we highly recommend using it where possible.

A Replication Key is required to use Incremental Replication. Replication Keys are columns that Stitch uses to identify new and updated data for replication.

Important!
Incorrectly setting Replication Keys can cause data discrepancies and row count issues, so we strongly recommend checking out the Selecting Replication Keys guide before you define the Replication Methods and Keys for your tables.

When Stitch replicates your data, it will store the last recorded maximum value in the Replication Key column and compare it against the data source - not what's in your data warehouse - to identify new/updated data. Any row with a Replication Key value greater than or equal to the stored value is where Stitch will begin the next replication attempt.

Incremental Replication Types

There are two types of Incremental Replication that can be used in Stitch: updated/modified at and append-only.

Updated/Modified At Replication

If the Replication Key is set to a timestamped column - such as modified_at - new and updated records will be detected by comparing the last maximum recorded modified_at date with the maximum modified_at date in the source database.

Consider a customers table for this example. The last maximum date replicated for this table is 2016-02-01 12:00:00. During the next update, rows that were either created or modified at or after this time will be replicated from the data source to the data warehouse. There may be a small amount of duplication as a result of this approach, but it’s to ensure Stitch doesn’t miss any data.

Append-Only

Google BigQuery Destinations

If you're using Google BigQuery as your data warehouse, any incrementally replicated tables will be replicated using the append-only method. For more info, refer to the BigQuery Data Replication & Table Structural Changes doc.

If the only time a table is updated is with new data - meaning existing data is never updated - append-only replication is being used.

Let's say you want to apply Incremental Replication to a users table, which is append-only. In your business, users are assigned auto-incrementing IDs when accounts are created. You can set the Replication Key to the user_id column as user IDs are both unique and auto-incrementing.

If the last replication attempt ended at user_id 1000, the next attempt would start at user_id 1000, replicating all rows with user_id >= 1000.

Defining a Table's Replication Method

Replication Methods can only be set database integration tables.

For more info about how tables in SaaS integrations are replicated, refer to the Expected SaaS Data section.

With just a few clicks, you can define the Replication Methods for your database integration tables. Click the GIF for a closer look at the whole process.

Defining a replication method for a table.

  1. From the Stitch Dashboard page, click into the database integration.
  2. In the list of databases for this integration, click into the database that contains the table you want.
  3. In the list of tables contained in the database, find the table you want. Click the icon next to the Status column to sync the table. The icon will turn green and the status will change to Syncing.
  4. In the Table Settings window, click the icon next to the Replication Method you want the table to use.
  5. If you're using Incremental Replication, you'll also need to select a Replication Key. Click the dropdown menu and select the appropriate column. Remember this needs to be an integer, datetime, or timestamp column.
  6. Click the Update Settings button.

You can also change these settings by navigating into the table and clicking the Table Settings button.

Related:

Was this article helpful?
0 out of 0 found this helpful

Comments

Questions or suggestions? If something in our documentation is unclear, let us know in the comments!