Now that your Contentful data is connected to Stitch, what's next? In this article, we'll cover:
In v1 of Stitch's Contentful integration, Stitch will create a single table - called
data - in the webhook integration schema (this will be the name you enter in the Integration Schema field when you set up Contentful) of your data warehouse.
The schema of this table will contain two “types” of columns: columns used by Stitch (prepended with
_sdc) and the columns sent by the provider’s webhook API.
Aside from the Stitch columns, the schema of this table will depend entirely on Contentful's webhook API. With the exception of the
_sdc fields (and a Primary Key if one isn’t provided), Stitch does not augment Incoming Webhooks data nor does it have any control over the fields sent by the webhook provider.
After you’ve successfully connected your Contentful integration, Stitch will continuously replicate your webhook data into your data warehouse. Currently, this version of Stitch’s Contentful integration uses Append-Only Replication.
Append-Only Replication is a type of Incremental Replication where newly replicated data is appended to the end of a table. Existing rows are not updated - any updates will be added to the table as new rows. Data stored this way can provide insights and historical details about how those rows have changed over time.
If you simply want the latest version of the object - or objects, if you elected to track more than one during the setup - in the integration’s table (
data), you’ll have to adjust your querying strategy to account for the append-only method of replication. This is a little different than querying records that are updated using
updated_at Incremental Replication.
To do this, you can use the
_sdc_sequence column and the table’s Primary Key. The
_sdc_sequence is a Unix epoch (down to the millisecond) attached to the record during replication and can help determine the order of all the versions of a row.
If you wanted to create a snapshot of the latest version of this table, you could run a query like this:
SELECT * FROM [stitch-redshift:stitch-contentful.data] o INNER JOIN ( SELECT MAX(_sdc_sequence) AS seq, [primary_key] FROM [stitch-redshift:stitch-contentful.data] GROUP BY [primary_key]) oo ON o.[primary_key] = oo.[primary_key] AND o._sdc_sequence = oo.seq
This approach uses a subquery to get a single list of every row’s Primary Key and maximum sequence number. It then joins the original table to both the Primary Key and maximum sequence, which makes all other column values available for querying.