Stitch Documentation
has moved!

Please update your bookmarks to https://www.stitchdata.com/docs

If you're not automatically redirected after 5 seconds, click here.

Redshift: Multiple Data Types & Column Splitting

Important!
The info in this doc only applies to Redshift data warehouses.

Stitch requires that there only be one data type per column to properly type and store your data. When a single column contains multiple data types, Stitch may not be able to load the data losslessly - or perfectly intact - into your data warehouse.

To compensate for this, when Stitch detects a column that contains multiple data types, a new column is created for each data type. The addition of new columns will also change the structure of the realized table in your data warehouse.

Let’s say we have an orders table in an integration called rep_sales. During the structure sync, Stitch will create a schema called rep_sales in your data warehouse and inside this schema, a table called orders.

The table is structured as follows, excluding the _sdc columns:

Column           | Data Type
-----------------+--------------+
id [pk]          | BIGINT
rep_name         | VARCHAR(128)
order_amount     | BIGINT
order_confirmed  | BOOLEAN


During the first sync, the following rows are replicated:

id [pk] | rep_name    | order_amount | order_confirmed
--------+-------------+--------------+----------------+
1       | Marty McFly | 12           |
2       | Doc Brown   | 10           | true


During the next sync, the following rows are replicated:

id [pk] | rep_name                         | order_amount | order_confirmed
--------+----------------------------------+--------------+----------------+
3       | MartyMcFlyMessedUpTheTime        | 13.25        | true
        | ContinuumAndEliminatedHisOwn     |              | 
        | ExistenceAndNowThereAren’tAny    |              |  
        | HoverboardsInAnAwesomeHitSequal  |              | 
        | WhichIsBothWittyAndFuturistic    |              |              
4       | Doc Brown                        | 11           | yes


These rows are decidedly different than those replicated during the first sync. Specifically:

  1. The rep_name in row 3 is longer than 128 characters.
  2. order_amount can sometimes be a decimal.
  3. order_confirmed can sometimes be a string.

To accommodate the data, Stitch will:

  1. Automatically widen the rep_name field from VARCHAR(128) to VARCHAR(256).
  2. Split the mixed data types into new columns - one for each data type - and append the column name with the data type.

This would modify the table structure to the following:

Column                   | Data Type
-------------------------+---------------+
id [pk]                  | BIGINT
rep_name                 | VARCHAR(256)
order_amount__bigint     | BIGINT
order_amount__double     | DOUBLE
order_confirmed__boolean | BOOLEAN
order_confirmed__string  | VARCHAR(128)

What happens if data can’t fit into the table?

You’ll experience some missing data on your end. To troubleshoot, we recommend taking a look at some of the more common causes for dropped (or missing) data:

If you’re stumped or need some help pinpointing the issue, please reach out to support.

Was this article helpful?
0 out of 0 found this helpful

Comments

Questions or suggestions? If something in our documentation is unclear, let us know in the comments!