Please note that our BigQuery integration is currently in open beta. The information in this article is subject to change.
As replicated data arrives at your BigQuery project, it will be stored in automatically-created tables. When new data is detected and ready to be loaded, Stitch does a few things depending on the replication method being used.
Each table inside the integration schemas in your data warehouse contain a few columns prepended with
Stitch uses these columns to replicate your data.
Note that if any columns in your data source are prepended with
sdc, Stitch will NOT replicate them to your data warehouse. Stitch reserves all columns with this naming convention for internal use.
For SaaS and database tables that are set to incrementally replicate, Stitch replicates data into BigQuery in an append only fashion. Unlike other data warehouses that Stitch supports - such as Amazon Redshift - data that updates existing rows in the table (based on Primary Key information) do NOT overwrite the existing data in the table. Instead, that data is appended to the end of the table.
This means that there can be many different rows in a BigQuery table with the same Primary Key, each representing what the data was at that moment in time. Stitch provides the _sdc_sequence column to help distinguish the age of the records. Let's take a look at an example.
Our example table contains the following rows to start with:
Name | Age | Type | Magic? | _sdc_sequence -----+-----+-------+---------+-------------- Finn | 14 | Human | False | 1473773877524 Jake | 6 | Dog | Tru | 1473773877524
These rows are replicated to BigQuery during the initial sync of the table. Some time later, new rows are added to the table:
Name | Age | Type | Magic? | _sdc_sequence ----------+-----+----------+--------+---------------+ Bubblegum | 16 | Princess | True | 1473776585195 Beamo | 1 | Game Bot | False | 1473776585195
During the next replication job, the new rows are appended to the table, bringing the total row count to 4:
Name | Age | Type | Magic? | _sdc_sequence ----------+-----+----------+--------+---------------+ Finn | 14 | Princess | False | 1473773877524 Jake | 6 | Dog | True | 1473773877524 Bubblegum | 16 | Princess | True | 1473776585195 // new rows are appended Beamo | 1 | Game Bot | False | 1473776585195
But what happens when an existing row is updated? If Finn's age were updated to 15, the existing row would not be updated.
Instead, a new row would be appended to the end of the table (with a higher sequence number), bringing the total row count to 5. :
Name | Age | Type | Magic? | _sdc_sequence ----------+-----+----------+--------+---------------+ Finn | 14 | Human | False | 1473773877524 Jake | 6 | Dog | True | 1473773877524 Bubblegum | 16 | Princess | True | 1473776585195 Beamo | 1 | Game Bot | False | 1473776585195 Finn | 15 | Human | False | 1473827984228 // updated data as new row
The lone exception to the append only replication that is performed by Stitch is for database integrations that have tables set to Full Table Replication. Some SaaS integrations also exhibit this behavior. This method of replication provides a single, point in time copy of the source table.
During the initial replication of a table using Full Table replication, you might see what looks like an “incomplete” table. This is because Stitch is still processing and replicating your data. Once the replication is complete, the table will be marked with a version 1 stamp.
During the next replication attempt, a new version of the table will be created. Each batch is marked with a version stamp as Stitch picks it up. At this point:
When data arrives with additional columns that aren’t reflected in the existing BigQuery table, Stitch automatically appends those column to the end of the table structure.
If a column is removed from the source, data for that column will no longer arrive in BigQuery. Stitch will not remove the column from the table and it will contain
NULL values from that point forward.
You’ll experience some missing data on your end. To troubleshoot, we recommend taking a look at some of the more common causes for dropped (or missing) data:
If you’re stumped or need some help pinpointing the issue, please reach out to support.