At this time, Stitch only supports Incremental Replication for Mongo integrations. This means that only new and updated data will be replicated to your data warehouse.
While Incremental Replication works for Mongo the way it does for every other integration, the requirements, gotchas, and process for setting Replication Keys are a little different.
In this article, we’ll cover:
- Requirements for Mongo Replication Keys
- Our recommendations for Replication Keys
- Replication Key Gotchas
- Changing a Mongo Replication Key
Mongo Replication Key Requirements
Stitch uses a field you define - called a Replication Key - to identify new and updated data for replication. For Mongo connections, Stitch requires that the Replication Key field:
- is indexed and
- exists in the root of the document.
Incremental Replication adds query parameters when it syncs Mongo data, which can put undue stress on your Mongo database. Indexing the fields you use as Replication Keys relieves that stress.
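As a rough illustration of these two requirements, the sketch below checks a candidate field against index metadata shaped like the dictionary PyMongo's `Collection.index_information()` returns, and builds the kind of range filter an incremental sync might issue. The helper names, the "leading key of an index" rule, and the filter shape are assumptions made for illustration, not Stitch's actual implementation.

```python
def is_root_level(field):
    """A root-level field has no dotted path ('meta.modifiedAt' is nested)."""
    return "." not in field


def is_indexed(field, index_info):
    """True if `field` is the leading key of any index.

    `index_info` is assumed to look like the result of PyMongo's
    Collection.index_information(): {name: {"key": [(field, direction), ...]}}.
    """
    return any(
        spec["key"] and spec["key"][0][0] == field
        for spec in index_info.values()
    )


def meets_requirements(field, index_info):
    """Both requirements: the field is indexed and lives in the document root."""
    return is_root_level(field) and is_indexed(field, index_info)


def incremental_filter(field, last_max):
    """Hypothetical query filter: documents at or past the last saved value."""
    return {field: {"$gte": last_max}}


# Example index metadata for a collection with an index on modifiedAt.
index_info = {
    "_id_": {"key": [("_id", 1)]},
    "modifiedAt_1": {"key": [("modifiedAt", 1)]},
}
```

With this sample metadata, `modifiedAt` passes both checks, while an unindexed `createdAt` or a nested `meta.modifiedAt` does not.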
Replication Key Recommendations
While the only requirements for a Mongo Replication Key are that the field is indexed and exists in the root of the document, we do have some recommendations.
- Replication Key fields should contain only one data type.
While Mongo allows you to have multiple data types in a single field, we strongly recommend keeping Replication Key fields to just one. This is because of the way Mongo compares and sorts data types and how this can impact replication.
- Date and timestamp fields are great Replication Key candidates.
We’re big fans of using modifiedAt. This is the best way to ensure that both new records and updates to existing records are captured.
In some cases - for example, if a table is append-only - createdAt may also be suitable.
- If date or timestamp fields can't be used, Replication Keys should update incrementally.
ObjectId data types can be used as Replication Keys if they update incrementally, which allows Stitch to identify a MAX value and detect new records for replication.
This is suitable for append-only tables only, meaning that the table is only updated with new data. If existing records are ever modified, a field like modifiedAt should be used instead.
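One way to check the single-data-type recommendation is to tally the types found in a candidate field. The sketch below does this over plain Python dictionaries standing in for Mongo documents; the function names and sample data are hypothetical, and Python type names are only a stand-in for BSON types.

```python
from collections import Counter


def field_type_counts(documents, field):
    """Count the Python type names seen in `field` across documents.

    Documents that lack the field are counted under 'missing'.
    """
    counts = Counter()
    for doc in documents:
        if field in doc:
            counts[type(doc[field]).__name__] += 1
        else:
            counts["missing"] += 1
    return counts


def has_single_type(documents, field):
    """True if every document that has `field` stores the same type in it."""
    counts = field_type_counts(documents, field)
    counts.pop("missing", None)
    return len(counts) <= 1


# Sample documents: _id mixes integers and strings, modifiedAt does not.
sample = [
    {"_id": 1, "modifiedAt": "2017-01-01"},
    {"_id": 2, "modifiedAt": "2017-01-02"},
    {"_id": "abc", "modifiedAt": "2017-01-03"},
]
```

Here `has_single_type(sample, "modifiedAt")` holds, while `_id` mixes two types and would be a risky Replication Key.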
Replication Key Gotchas
Before selecting a Replication Key for a table, there are a few things you should keep in mind.
- Changing a collection's Replication Key requires a full re-sync of the collection.
Stitch must replicate the entire collection again to ensure there aren't any gaps in your data.
- NULL is a defined BSON data type in Mongo.
NULL values can be compared to other data types and replicate without issue.
- While the _id field can typically be used as a Replication Key...
... You should verify the field's data type - and that it contains only one data type (see below for details) - before using it. Some data types may not auto-increment, which will lead to issues with detecting new data.
- Stitch doesn’t require single data types for Replication Keys.
But we do strongly recommend it. Here’s an example that demonstrates why:
- We sync a table, using a field called _id as the Replication Key. This field contains both ObjectId and String data types.
- A historical sync of the table completes.
- Because Mongo considers ObjectId data types to be greater than Strings, Stitch will record the MAX value as the last replicated record containing an ObjectId data type in the Replication Key field.
- New records are added to the table.
- During the next sync, Stitch uses the last recorded MAX value - in this case, an ObjectId - to identify new/updated data. Remember: only records with Replication Key values greater than or equal to this value will be selected for replication.
- Because ObjectId data types are greater than Strings, all records with Strings are considered to be less than the last recorded MAX value. This means Stitch won’t be able to detect these records and replicate them to your data warehouse.
Because Stitch may be unable to correctly identify new and updated data due to how data types are sorted, it’s best to keep Replication Key fields to a single data type. For guidance on how to determine if a field has multiple data types, check out this troubleshooting doc.
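The mixed-type scenario above can be simulated in a few lines. This is a minimal sketch, assuming only that Strings sort before ObjectIds in Mongo's cross-type ordering; `FakeObjectId`, `TYPE_RANK`, and `select_for_replication` are hypothetical stand-ins, not part of Stitch or any Mongo driver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeObjectId:
    """Stand-in for a BSON ObjectId, holding its 24-character hex string."""
    hex: str


# Relative rank of the two types in Mongo's cross-type sort order:
# Strings sort before ObjectIds.
TYPE_RANK = {str: 0, FakeObjectId: 1}


def sort_key(value):
    """Order by type rank first, then by value within the same type."""
    inner = value.hex if isinstance(value, FakeObjectId) else value
    return (TYPE_RANK[type(value)], inner)


def select_for_replication(values, last_max):
    """Keep values greater than or equal to the saved MAX value."""
    return [v for v in values if sort_key(v) >= sort_key(last_max)]


# Historical sync: the _id field mixes ObjectId and String values.
historical = [FakeObjectId("5a0000000000000000000001"), "order-100", "order-101"]
last_max = max(historical, key=sort_key)

# New records arrive, one of each type.
new_records = ["order-102", FakeObjectId("5a0000000000000000000002")]
replicated = select_for_replication(historical + new_records, last_max)
```

In this simulation the saved MAX value is the ObjectId, so the new String record `"order-102"` never satisfies the greater-than-or-equal check and is left out of `replicated`.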
Changing a Mongo Replication Key
Changing a Replication Key will queue a full replication of the data in the collection. This will result in a higher number of rows synced, which will count towards your monthly quota. A full re-sync is required to ensure there aren't any gaps in your data.
- From the Stitch Dashboard page, click into the Mongo integration.
- In the Integration Details page, click the database that contains the collection.
- When the list of collections displays, click the name of the collection.
- Click the Collection Settings button.
- Select the desired column from the Replication Key drop-down.
- Click Update Settings.
- A full-sync warning will display. To continue with changing the Replication Key, click the Change Replication Key button.