Expected Contentful Data

Now that your Contentful data is connected to Stitch, what's next? This article covers the Contentful schema, how Contentful data is replicated, and how to query Contentful webhook data.

Contentful Schema

In v1 of Stitch's Contentful integration, Stitch creates a single table, called data, in the webhook integration schema of your data warehouse. The schema name is the one you entered in the Integration Schema field when you set up Contentful.

This table contains two types of columns: columns used by Stitch (prefixed with _sdc) and columns sent by Contentful's webhook API.

Aside from the Stitch columns, the schema of this table depends entirely on Contentful's webhook API. Apart from adding the _sdc fields (and a Primary Key if one isn't provided), Stitch does not augment incoming webhook data, nor does it control which fields the webhook provider sends.
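As a rough illustration of the two column types, the sketch below splits a list of column names by the _sdc prefix. The column list is hypothetical: only _sdc_sequence is named in this article, and the id and title columns stand in for whatever fields Contentful's webhook actually sends.

```python
# Hypothetical column list for the "data" table. Only _sdc_sequence is
# named in this article; "id" and "title" are placeholders for the
# fields sent by the webhook provider.
columns = ["id", "title", "_sdc_sequence"]

# Columns used by Stitch carry the _sdc prefix.
stitch_columns = [c for c in columns if c.startswith("_sdc")]

# Everything else comes from the provider's webhook payload.
provider_columns = [c for c in columns if not c.startswith("_sdc")]

print("Stitch columns:", stitch_columns)
print("Provider columns:", provider_columns)
```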

Replicating Contentful Data

After you’ve successfully connected your Contentful integration, Stitch will continuously replicate your webhook data into your data warehouse. This version of Stitch’s Contentful integration currently uses Append-Only Replication.

Append-Only Replication is a type of Incremental Replication where newly replicated data is appended to the end of a table. Existing rows are not updated - any updates will be added to the table as new rows. Data stored this way can provide insights and historical details about how those rows have changed over time.
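To make the append-only behavior concrete, here is a minimal sketch that models the table as an in-memory list. It is not how Stitch itself is implemented. The replicate helper, the id and title fields, and the sequence values are all assumptions made up for illustration.

```python
# Hypothetical in-memory stand-in for the "data" table.
table = []

def replicate(record, sequence):
    """Append the record as a new row; existing rows are never modified."""
    row = dict(record)
    row["_sdc_sequence"] = sequence
    table.append(row)

# The same (hypothetical) entry arrives twice: once as a draft, once updated.
replicate({"id": "entry-1", "title": "Draft"}, 1500000000000)
replicate({"id": "entry-1", "title": "Published"}, 1500000005000)

# Both versions are kept, so the history of the row is preserved.
for row in table:
    print(row)
```

Because the update is stored as a second row rather than overwriting the first, a query for the current state has to pick the newest row for each Primary Key.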

Querying Contentful Webhook Data

If you only want the latest version of the object (or objects, if you elected to track more than one during setup) in the integration’s table (data), you’ll have to adjust your querying strategy to account for append-only replication. This differs from querying records that are updated using updated_at Incremental Replication.

To do this, you can use the _sdc_sequence column and the table’s Primary Key. The _sdc_sequence value is a Unix epoch timestamp with millisecond precision that is attached to each record during replication, and it can be used to order all the versions of a row.
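If you want to read a sequence value as a date, a sketch like the following may help. It takes the description above at face value and treats _sdc_sequence as milliseconds since the Unix epoch. The sequence_to_datetime helper and the sample value are hypothetical.

```python
from datetime import datetime, timezone

def sequence_to_datetime(sdc_sequence):
    """Interpret a millisecond Unix epoch value as a UTC datetime."""
    # Assumption: the value is milliseconds since 1970-01-01 UTC.
    return datetime.fromtimestamp(sdc_sequence / 1000, tz=timezone.utc)

# Hypothetical sequence value, chosen only for illustration.
print(sequence_to_datetime(1500000000000).isoformat())
```

For ordering versions of a row you don't need this conversion, since comparing the raw integers is enough.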

To create a snapshot of the latest version of this table, you could run a query like the following, replacing [primary_key] with the table's Primary Key column and the table reference with your own:

SELECT * FROM [stitch-redshift:stitch-contentful.data] o
INNER JOIN (
    SELECT
        MAX(_sdc_sequence) AS seq,
        [primary_key]
    FROM [stitch-redshift:stitch-contentful.data]
    GROUP BY [primary_key]) oo
ON o.[primary_key] = oo.[primary_key]
AND o._sdc_sequence = oo.seq

This approach uses a subquery to get one row per Primary Key, containing that key and its maximum sequence number. It then joins the original table on both the Primary Key and the maximum sequence number, which makes all other column values available for querying.
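To try the same join pattern without a data warehouse, the sketch below runs it against an in-memory SQLite database using Python's standard library. It is only an approximation of the query above: the id and title columns, the sample rows, and the use of SQLite are all assumptions, and your warehouse's SQL dialect may differ.

```python
import sqlite3

# In-memory database with a hypothetical version of the "data" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id TEXT, title TEXT, _sdc_sequence INTEGER)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [
        ("entry-1", "Draft", 1500000000000),
        ("entry-1", "Published", 1500000005000),  # later version of entry-1
        ("entry-2", "Hello", 1500000001000),
    ],
)

# The subquery finds the maximum sequence per Primary Key (here: id).
# The join keeps only the rows that match that key and sequence.
latest = conn.execute(
    """
    SELECT o.id, o.title, o._sdc_sequence
    FROM data o
    INNER JOIN (
        SELECT id, MAX(_sdc_sequence) AS seq
        FROM data
        GROUP BY id
    ) oo
    ON o.id = oo.id AND o._sdc_sequence = oo.seq
    ORDER BY o.id
    """
).fetchall()

for row in latest:
    print(row)
```

Only the newest row for each id comes back, so entry-1 appears once with its latest title.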
