Google BigQuery Destination Beta Overview

Important!
Please note that our BigQuery destination is currently in open beta. The information in this article is subject to change. 

Google BigQuery is a fully managed, cloud-based big data analytics web service for processing very large read-only data sets. BigQuery was designed for analyzing data on the order of billions of rows, using a SQL-like syntax. 

For more information, check out Google's BigQuery overview.

Beta Guidelines

Currently the Stitch BigQuery destination is in open beta. We encourage you to participate and give us feedback, but please consider the following first:

  1. Because the Stitch BigQuery destination is currently in beta, you may encounter bugs or other unexpected behavior while using it.
  2. There may be situations where re-replicating your data is necessary to correct issues.

We appreciate your patience and feedback as we work to perfect this destination.

Key Concepts for Working with BigQuery

The Stitch BigQuery destination is inherently different from our Amazon Redshift destination. Before testing out BigQuery, take note of the following so you know what to expect.

Usage Pricing vs. Fixed-Rate Pricing

BigQuery's pricing model is based on usage instead of a fixed rate, meaning your bill can vary over time. Before fully committing to BigQuery as your data warehouse, we recommend familiarizing yourself with the BigQuery pricing model and how using Stitch may impact your costs.
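
To make the difference between usage and fixed-rate pricing concrete, here is a rough sketch of how a usage-based query bill could be estimated. It assumes cost is proportional to the number of bytes your queries scan; the rate below is a placeholder for illustration, not a quoted Google price, and the function and variable names are hypothetical.

```python
TIB = 1024 ** 4  # number of bytes in one tebibyte


def estimate_query_cost(bytes_scanned: int, price_per_tib: float) -> float:
    """Estimate the cost of one query under a pay-per-bytes-scanned model."""
    return bytes_scanned / TIB * price_per_tib


# Placeholder rate for illustration only; check the BigQuery pricing page
# for the real figure that applies to your project.
HYPOTHETICAL_PRICE_PER_TIB = 5.00

# Bytes scanned by three example queries: 2 TiB, 0.5 TiB, and 3 TiB.
monthly_scans = [2 * TIB, 512 * 1024 ** 3, 3 * TIB]

monthly_cost = sum(
    estimate_query_cost(scanned, HYPOTHETICAL_PRICE_PER_TIB)
    for scanned in monthly_scans
)
print(f"Estimated query cost: {monthly_cost:.2f}")
```

Under a fixed-rate model the bill would be the same no matter how many queries ran; here it grows with every byte scanned, which is why query habits and table size matter.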

Click here for more info on how a BigQuery-Stitch partnership may impact your warehousing costs.

Setting Up & Connecting BigQuery to Stitch

Unlike connecting Stitch to Redshift, setting up BigQuery isn't simply a matter of having warehouse credentials. In addition to completing the authorization process inside Stitch, we also require a user that:

  • Has access to BigQuery within a Google Cloud Platform project
  • Has Google Cloud Storage privileges

Click here for more info on setting up BigQuery and connecting it to Stitch.

Append-Only Incremental Replication

BigQuery was originally designed as an append-only data store, and the initial release of our BigQuery destination follows a similar paradigm.

This means that updates to existing rows in incrementally replicated tables are appended as new rows to the end of the table, creating a record of how the rows have changed over time. When querying your data, you'll need to account for append-only replication.
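
As a minimal sketch of what "accounting for append-only replication" can look like, the example below keeps only the most recent version of each row. The column names (`id`, `updated_at`) and the sample rows are assumptions made for illustration; your tables will have their own primary key and ordering columns, and in practice you would typically do this in your query rather than in Python.

```python
def latest_versions(rows, key="id", order_by="updated_at"):
    """Return one row per key, keeping the row with the greatest order_by value."""
    latest = {}
    for row in rows:
        current = latest.get(row[key])
        # Keep this row if it is the first one seen for the key,
        # or if it is at least as recent as the one already kept.
        if current is None or row[order_by] >= current[order_by]:
            latest[row[key]] = row
    return list(latest.values())


# Hypothetical append-only table: the update to row 1 arrived as a new row.
# ISO 8601 timestamps in the same format sort correctly as plain strings.
rows = [
    {"id": 1, "status": "pending", "updated_at": "2017-01-01T10:00:00"},
    {"id": 2, "status": "pending", "updated_at": "2017-01-01T11:00:00"},
    {"id": 1, "status": "shipped", "updated_at": "2017-01-02T09:00:00"},
]

current = latest_versions(rows)
for row in current:
    print(row["id"], row["status"])
```

The older "pending" version of row 1 is still in the table as history; the function simply ignores it when you only want the current state.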

Click here for more info on how Stitch replicates data to BigQuery.

Nested Records Replication

When nested data is replicated to Redshift, Stitch de-nests the records, breaking them apart into subtables. This is by design, as Redshift doesn't natively support nested records.

Unlike Redshift, BigQuery excels at supporting nested records. This means that Stitch will not de-nest records that are sent to BigQuery.
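
To illustrate the difference, here is a rough sketch of what de-nesting a record into subtables involves. It is not Stitch's actual implementation: the function, the `orders__line_items` subtable name, and the `orders_id` linking column are hypothetical, and the sketch assumes each nested value is a list of flat records.

```python
def denest(record, parent_table, key="id"):
    """Split a nested record into a flat parent row plus subtable rows."""
    parent = {}
    subtables = {}
    for field, value in record.items():
        if isinstance(value, list):
            # Each nested list becomes its own subtable; every child row
            # carries the parent's key so the two can be joined later.
            name = f"{parent_table}__{field}"
            subtables[name] = [
                {f"{parent_table}_{key}": record[key], **item} for item in value
            ]
        else:
            parent[field] = value
    return parent, subtables


order = {
    "id": 1,
    "customer": "Ada",
    "line_items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
}

# Redshift-style: one parent row and a separate subtable of line items.
parent, subtables = denest(order, "orders")
print(parent)
print(subtables)

# BigQuery-style: the record is loaded as-is, with line_items still nested.
print(order)
```

With BigQuery, the second step never happens: the nested `line_items` stay inside the order record, so no join is needed to read them back.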

Click here for more info on nested record support for BigQuery and Stitch.

Data Loading Scenarios

Because data can come from a variety of integrations and all those integrations may structure or handle data differently, Stitch will likely encounter numerous scenarios when replicating and loading your data. It's important to familiarize yourself with how certain scenarios will be handled so you can understand what's happening or how to diagnose an issue.

Click here for more info on what those scenarios are and how Stitch handles them.

 
