Stitch Documentation
has moved!

Please update your bookmarks to https://www.stitchdata.com/docs

If you're not automatically redirected after 5 seconds, click here.

Querying Append-Only Tables

Before You Get Started:
Take note of the following:

  • This doc isn't specific to BigQuery even though it seems that way. While the focus is on BigQuery, the strategy we outline here can be used on any table that uses append-only Incremental Replication.
  • Our BigQuery destination is currently in open beta. The infoin this article is subject to change..

For tables using Incremental Replication, Stitch currently loads data into Google BigQuery in an append-only fashion. This means that as time goes on, tables will wind up containing many different versions of the same row.

Data stored this way can provide insights and historical details about how those rows have changed over time - creating a timeline of the status changes of an order record, for example - but in some cases, you might just want the latest version of the table. 

In this doc, we'll cover:

Grabbing the Latest Version of Every Row

In each Stitch-generated integration table, you'll see a few columns prepended with _sdc. The column we'll focus on here is the _sdc_sequence column. This column is a Unix epoch (down to the milisecond) attached to the record during replication and can help determine the order of all the versions of a row.

Stitch uses these sequence values in a few places to correctly order rows for loading, but it can be also used to grab the latest version of a record in an append-only table.

Let's take a look at an example. Assume we have an orders table that contains:

  • A Primary Key of id,
  • The system _sdc columns added by Stitch, and
  • A whole lot of other order attribute columns.

If you wanted to create a snapshot of the latest version of this table, you could run a query like this:

SELECT * FROM [stitch-analytics-bigquery-123:ecommerce.orders] o
INNER JOIN (
    SELECT
        MAX(_sdc_sequence) AS seq,
        id
    FROM [stitch-analytics-bigquery-123:ecommerce.orders]
    GROUP BY id ) oo
ON o.id = oo.id
AND o._sdc_sequence = oo.seq

This approach uses a subquery to get a single list of every row's Primary Key and maximum sequence number. It then joins the original table to both the Primary Key and maximum sequence, which makes all other column values available for querying.

Creating Views

To make this easier, you can turn queries like the one in this doc into a view. We recommend this approach because a view will encapsulate all the logic and simplify the process of querying against the latest version of your data.

In the Related section below, you'll find links to how to create views in each of the data warehouses supported by Stitch.

Related

Was this article helpful?
0 out of 0 found this helpful

Comments

Questions or suggestions? If something in our documentation is unclear, let us know in the comments!