Stitch Documentation
has moved!

Please update your bookmarks to https://www.stitchdata.com/docs

If you're not automatically redirected after 5 seconds, click here.

Understanding How Stitch Queries Your Database Integrations

Before you connect your database to Stitch for replication, you might want the details on how we’ll replicate your data. What queries will Stitch run? Will the load from Stitch be enough to negatively impact performance? What can you do to insulate your database?

In this doc, we’ll cover what the Stitch replicator does and our recommendations for ensuring high availability and performance.

Understanding the Stitch Replicator

When you connect a database - like MySQL, Postgres, or MongoDB - as an input, Stitch only needs read-only access to the databases, tables, collections, and columns you want to sync.

There are two distinct processes Stitch will run to replicate this data: a structure sync and a data sync.

Structure Sync

The first part of the replication process is called a structure sync. This process will detect any changes to the structure of your database.

For example: you add a new column to one of the tables you’re syncing in Stitch. The structure sync will detect the new column and after the replication job completes, display it in the Stitch app.

The queries Stitch runs to detect these changes can vary depending on the type of database integration you’ve connected.

Database Queries we run
Microsoft SQL Server (MSSQL) Structure syncs query the databases and partitions tables in the sys schema.
MongoDB
  • db.getMongo().getDBNames()
  • db.getCollectionNames()

For every collection in the database - even those that aren’t set to sync - we also run the following queries:

  • db.collection.count()
  • db.collection.getIndexes()
MySQL
  • SHOW TABLES
  • SHOW KEYS FROM [table]
  • SELECT * FROM INFORMATION_SCHEMA.TABLES
Postgres We run queries on the following tables in the pg_catalog schema:

  • pg_class
  • pg_attribute
  • pg_index
  • pg_namespace


Data Sync

The second part of the replication process is called a data sync. This is where Stitch actually pulls data out of your database for replication.

The method Stitch uses is the same for all databases, but differs depending on the Replication Method that each table uses.

Full Table Replication

For tables using Full Table Replication, Stitch runs a single query (shown below) and reads out of the resulting cursor in batches. Note that the Mongo equivalent for this query looks a little different.

select column_a, column_b <,...> from table_a

Incremental Replication

For tables using Incremental Replication, Stitch runs a single query (shown below) and reads out of the associated cursor in batches. Note that the Mongo equivalent for this query looks a little different.

select column_a, column_b <,...> from table_a
where replication_key_column >=’last_bookmark_value’
order by replication_key_column

Recommendations

While we make every effort to ensure the queries that Stitch runs don’t impart significant load on your databases, we still have some recommendations for guaranteeing database performance:

  • Use a replica database instead of connecting directly. We recommend using read replicas in lieu of directly connecting production databases with high availability and performance requirements.
  • Apply indexes to Replication Key columns. We restrict and order our replication queries by this column, so applying an index to the columns you’re using as Replication Keys can improve performance.
Was this article helpful?
0 out of 0 found this helpful

Comments

Questions or suggestions? If something in our documentation is unclear, let us know in the comments!