
BigQuery: Expected Data Loading Behavior

Important!
Please note that our BigQuery integration is currently in open beta. The information in this article is subject to change.

Because data can come from a variety of integrations, and those integrations may structure or handle data differently, Stitch will encounter numerous scenarios when replicating and loading your data. In this doc, you'll find a number of these scenarios broken out by type:

New Table Scenarios

Summary | Scenario | Expected Behavior
Some empty columns | First batch of data arrives with some columns that are completely empty. | Only columns that are populated in at least one record are created.
Table name is too long | Data arrives with a table name that exceeds the maximum length for BigQuery. The table name limit is 1024 characters. | BigQuery will reject all data for that table.
Column name is too long | Data arrives with a column name that exceeds the maximum length for BigQuery. The column name limit is 128 characters. | BigQuery will reject all data for that table.
Too many columns | Data arrives with more columns than the data warehouse allows. The column limit is 10,000. | BigQuery will reject all data for that table.
Duplicate fields | Data arrives with two column names that canonicalize to the same name. | BigQuery rejects the entire record containing the two columns. Note that BigQuery IS NOT case-sensitive, but Stitch lowercases column names.
Negative/positive infinity, NaN | Data arrives with these special cases of FLOAT/DOUBLE values. | These values are converted to NULL.
Mixed case column name | Data arrives with columns named LiKeThIs. | All column names are lowercased.
Column name with spaces | Data arrives with column names containing spaces. | Spaces are converted to underscores (_).
Unsupported special characters in column name | Data arrives with column names containing special characters that are unsupported by BigQuery. | Special characters are converted to underscores (_).
Column name starts with digits | Data arrives with column names that start with numeric characters. For example: 123Columnname. | An underscore is prepended to the column name, e.g. 1st order's shipping-address becomes _1st_order_s_shipping_address.
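
The column name transformations above (lowercasing, underscore substitution, and the leading-digit rule) can be summarized in a few lines. The snippet below is an illustrative sketch only, not Stitch's actual implementation; the function names are made up, and it assumes the allowed characters are letters, digits, and underscores. It also shows the NaN/infinity-to-NULL conversion.

    import math
    import re

    def canonicalize_column_name(name):
        # Illustrative sketch: lowercase, replace spaces and unsupported
        # characters with underscores, and prepend an underscore when the
        # name starts with a digit.
        name = name.lower()
        name = re.sub(r"[^a-z0-9_]", "_", name)
        if name and name[0].isdigit():
            name = "_" + name
        return name

    def clean_float(value):
        # Illustrative sketch: NaN and +/- infinity are loaded as NULL (None).
        if isinstance(value, float) and (math.isnan(value) or math.isinf(value)):
            return None
        return value

    print(canonicalize_column_name("1st order's shipping-address"))  # _1st_order_s_shipping_address
    print(clean_float(float("nan")))                                 # None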

Data Loading Scenarios

Summary | Scenario | Expected Behavior
Data for existing table version | Data arrives that updates existing records in the data warehouse (matched on Primary Key). | Data is appended to the end of the table.
Data with a new table version | A full set of data for a table arrives at the data warehouse and the table is set to Full Table Replication. | The table is truncated and the new data is loaded in a single atomic action.
Timestamps out of range | Data arrives with a date that is out of range for the data warehouse to handle. | BigQuery accepts dates as extreme as those represented by Long.MIN_VALUE (292269055-12-02T16:47:04.192-00:00) and Long.MAX_VALUE (292278994-08-17T07:12:55.807-00:00).
VARCHAR max width | Data arrives in a text column that exceeds the BigQuery size maximum. The VARCHAR limit is 2MB per row. | Stitch will stop processing data for that table.
Decimal range | Data arrives in a decimal column that exceeds the BigQuery size limit. | BigQuery accepts the data, but at a loss of precision. All non-integer values are stored as DOUBLE.
Integer range | Data arrives in an integer column that exceeds the BigQuery size limit. | BigQuery rejects all the data in the batch.
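
The two timestamp extremes above correspond to the smallest and largest values of a signed 64-bit integer (Java's Long.MIN_VALUE and Long.MAX_VALUE) interpreted as milliseconds since the Unix epoch. A hypothetical range check, not Stitch's code, might look like this:

    # Hypothetical sketch: treat the documented extremes as the bounds of a
    # signed 64-bit count of milliseconds since the Unix epoch.
    LONG_MIN_MILLIS = -(2 ** 63)     # 292269055-12-02T16:47:04.192-00:00
    LONG_MAX_MILLIS = 2 ** 63 - 1    # 292278994-08-17T07:12:55.807-00:00

    def timestamp_in_range(epoch_millis):
        # True if the timestamp fits within the documented extremes.
        return LONG_MIN_MILLIS <= epoch_millis <= LONG_MAX_MILLIS

    print(timestamp_in_range(1_500_000_000_000))  # True: an ordinary 2017 date
    print(timestamp_in_range(2 ** 63))            # False: out of range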

Data Typing Scenarios

Summary | Scenario | Expected Behavior
Split data types (same batch) | The initial batch of data contains mixed types within a single column. For example: a column contains both STRING and INTEGER data. | If the combination is (INTEGER, FLOAT), Stitch will create a single DOUBLE column. For any other combination, BigQuery will reject the batch.
Split data types (different batches) | The initial batch of data contains single-datatype columns, but a subsequent batch contains mixed datatypes in a single column. For example: a column contains both STRING and INTEGER data. | If the existing column is FLOAT and the new value is INTEGER, Stitch will allow it. For any other combination, BigQuery will reject the batch.
VARCHARs | Data arrives in a VARCHAR or text format of varying lengths. | BigQuery stores all VARCHAR/text data as strings.
Dates with timezones | Timestamp data arrives with timezone information. | BigQuery has no support for timezones.
Decimals | Data arrives in decimal format with an arbitrary amount of precision. | BigQuery does not offer fixed precision. The only supported decimal datatype is DOUBLE.
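
In other words, (INTEGER, FLOAT) is the only mixed-type combination that is widened (to DOUBLE); every other mix is rejected. A hypothetical sketch of that decision, with made-up names, purely for illustration:

    def resolve_column_type(observed_types):
        # Illustrative only: given the set of types observed in a single
        # column, return the column type to create, or None if the batch
        # would be rejected.
        if len(observed_types) == 1:
            return next(iter(observed_types))
        if observed_types == {"INTEGER", "FLOAT"}:
            return "DOUBLE"      # the only supported widening
        return None              # any other mix: batch rejected

    print(resolve_column_type({"INTEGER", "FLOAT"}))   # DOUBLE
    print(resolve_column_type({"STRING", "INTEGER"}))  # None (rejected)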

Schema Change Scenarios

Summary | Scenario | Expected Behavior
New column | A new column is added to an existing table. | The new column is appended to the end of the table.
Column removed | A column is no longer present in the incoming data for an existing table. | Default NULL values are placed into the column.
Column added to nested table | Data arrives with a nested table that contains a field that did not exist in that nested table before. | The new field is added to the nested table.
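
For illustration, suppose the first batch of records contains id and name, and a later batch adds email but no longer sends name (made-up data, not real Stitch output). The table ends up with all three columns, with NULLs wherever a record never supplied a value:

    # Made-up records illustrating the new-column and removed-column rows above.
    batch_1 = [{"id": 1, "name": "Finn"}]
    batch_2 = [{"id": 2, "email": "finn@example.com"}]  # "name" no longer sent

    all_columns = ["id", "name", "email"]  # the new column is appended at the end
    rows = [{col: rec.get(col) for col in all_columns} for rec in batch_1 + batch_2]
    print(rows)
    # [{'id': 1, 'name': 'Finn', 'email': None},
    #  {'id': 2, 'name': None, 'email': 'finn@example.com'}]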

Nested Rows

Summary | Scenario | Expected Behavior
Nested JSON objects | Data arrives that is nested with many top-level properties. | Nesting is maintained within BigQuery. If BigQuery can't support the structure, the data will be rejected.
Nested JSON top-level arrays | Data arrives that is nested, and among the nested data are properties that contain arrays. | See above.
Nested JSON sub-level arrays | Data arrives that is nested, and among the nested data are properties that contain arrays, which themselves contain properties that are arrays. | See above.
Nested JSON new columns in arrays | The sub-level arrays in nested JSON data have a varying set of columns. | See above.
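
For example, a record shaped like the one below (made-up data) would keep its structure when loaded: the nested address object and the repeated orders array are preserved rather than flattened, provided BigQuery supports the structure.

    # Made-up example record: nested objects and arrays keep their shape in BigQuery.
    record = {
        "id": 1,
        "address": {"city": "Philadelphia", "zip": "19123"},   # nested object
        "orders": [                                            # array of nested objects
            {"sku": "A-100", "qty": 2},
            {"sku": "B-205", "qty": 1},
        ],
    }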

Direct Data Warehouse Changes

Summary | Scenario | Expected Behavior
You add columns | You add additional columns to a Stitch-generated table in BigQuery. | This is fine, as long as the new columns are nullable.
You revoke data warehouse permissions | You remove Stitch's access to the data warehouse as a whole. | Data will fail to load and you will be notified.