Worried about going over your row limit? Don't stress - there are a few things you can do to help keep your usage down:
The default Replication Frequency for the majority of integrations is 30 minutes. If you can manage without the freshest data, you can set the integration to replicate less often - for example, every hour or every 6 hours.
Keep in mind that the Replication Frequency setting applies to the entire integration, not individual tables. This is especially important if there are a lot of tables that use Full Table Replication in the integration.
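To get a feel for how much the frequency matters, here's a rough back-of-the-envelope sketch. It assumes that a table using Full Table Replication has all of its rows replicated on every run; the function name and the 10,000-row table are made up for illustration, so plug in your own numbers.

```python
def full_table_rows_per_day(table_rows: int, frequency_minutes: int) -> int:
    """Estimate rows replicated in 24 hours for one Full Table Replication table.

    Assumption: every replication run copies every row in the table.
    """
    runs_per_day = (24 * 60) // frequency_minutes
    return table_rows * runs_per_day


# Hypothetical table with 10,000 rows at three different frequencies.
for minutes in (30, 60, 360):
    rows = full_table_rows_per_day(10_000, minutes)
    print(f"every {minutes} min: {rows:,} rows/day")
# every 30 min: 480,000 rows/day
# every 60 min: 240,000 rows/day
# every 360 min: 40,000 rows/day
```

In this example, moving from every 30 minutes to every 6 hours cuts the estimate by a factor of 12, and the saving applies to every Full Table Replication table in the integration.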
If a database integration is eating up a lot of your row limit, check the Replication Methods of the tables you've set to sync. Whenever possible, we recommend using Incremental Replication, as this can significantly reduce the amount of redundant data replicated by Stitch.
Note that you cannot set Replication Methods for SaaS integrations at this time. To compensate for this, however, you can set the integration to replicate less often.
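The difference between the two Replication Methods can be sketched with a toy example. This is a simplified model, not Stitch's actual implementation: the `updated_at` column, the `bookmark` value, and both function names are assumptions made for illustration, and details such as whether the bookmark comparison is inclusive aren't modeled here.

```python
from datetime import datetime

# Hypothetical source table with a column that records the last update time.
rows = [
    {"id": 1, "updated_at": datetime(2024, 1, 1, 9, 0)},
    {"id": 2, "updated_at": datetime(2024, 1, 1, 9, 40)},
    {"id": 3, "updated_at": datetime(2024, 1, 1, 10, 10)},
]


def full_table(rows):
    """Full Table Replication (simplified): every row is replicated on every run."""
    return list(rows)


def incremental(rows, bookmark):
    """Incremental Replication (simplified): only rows changed since the bookmark."""
    return [row for row in rows if row["updated_at"] > bookmark]


# Pretend the previous run finished at 10:00.
bookmark = datetime(2024, 1, 1, 10, 0)

print(len(full_table(rows)))             # 3
print(len(incremental(rows, bookmark)))  # 1
```

In this model, the unchanged rows (ids 1 and 2) are the redundant data that Incremental Replication skips.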
This section only applies if you're using Redshift, Panoply, or Postgres as your data warehouse.
Start by asking whether the source data is nested. If you're using Redshift, Panoply, or Postgres as your data warehouse, Stitch will de-nest any arrays in the data and break them out into individual subtables for easier querying.
We go into detail about this in the Handling of Nested Data Structures doc (the read is worth the time), but it comes down to this: if the source data is nested, one record doesn't necessarily equate to a single replicated row. There could be multiple rows for one record due to the de-nesting Stitch performs.
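As a rough illustration of why one record can turn into several rows, the sketch below counts the rows a single nested record might produce. It assumes each array element becomes one row in a subtable and that nested objects are flattened into their parent row; the `count_rows` helper and the sample order are hypothetical, and the actual de-nesting rules are the ones described in the Handling of Nested Data Structures doc.

```python
def count_rows(record: dict) -> int:
    """Estimate the rows produced by one record after de-nesting.

    Assumptions: the record itself is one row, every element of a nested
    array becomes one subtable row, and nested objects add no rows of
    their own (only any arrays inside them do).
    """
    total = 1  # the top-level row
    for value in record.values():
        if isinstance(value, list):
            for item in value:
                total += count_rows(item) if isinstance(item, dict) else 1
        elif isinstance(value, dict):
            total += count_rows(value) - 1  # flattened into the parent row
    return total


# Hypothetical order record with two levels of nesting.
order = {
    "id": 1,
    "customer": {"name": "A"},
    "line_items": [
        {"sku": "a", "tax_lines": [{"rate": 0.1}]},
        {"sku": "b", "tax_lines": []},
    ],
}

print(count_rows(order))  # 4: one order row, two line item rows, one tax line row
```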
Many of the SaaS apps we integrate with use nested arrays to structure data in their APIs; Mongo data can also be nested. Refer to the Expected SaaS Data section to learn more about how SaaS data is structured and replicated.
Before setting the Replication Frequency for your SaaS integrations (or Mongo), we recommend familiarizing yourself with how that integration's data is structured. If the data is nested, Stitch will create subtables, which means a higher overall row count.
We know: this one should be obvious, but it's worth mentioning. While only you can know how much data is in your database integrations, it may be difficult to gauge what or how much is contained in your SaaS integrations. It might feel like you're flying blind.
Getting to know that particular integration's data structure and how Stitch replicates its data is important, but here's a short list of some of the major culprits of high row counts.
|Facebook Ads||Every time Facebook Ads data is replicated, the past 28 days are replicated. This is to account for updates made to insights info during the default 28-day attribution window on campaigns.|
|NetSuite||While you can select individual NetSuite tables to sync, the majority of them use Full Table Replication.|
|JIRA||While you can select individual JIRA tables to sync, the majority of them use Full Table Replication.|
|MailChimp||Every MailChimp table uses Full Table Replication and also contains nested structures. In addition, due to how MailChimp creates email activity records, large numbers of rows can be created in the source.|
|Mixpanel||While you can select individual Mixpanel tables to sync, Mixpanel in general generates massive amounts of data.|
|Shopify||Shopify heavily nests data, meaning there are many, many subtables that have to be created by Stitch. Additionally, this means that if you choose a higher Replication Frequency - every 30 minutes, for example - data from earlier in the day will be re-replicated and count towards your row quota.|
|Zendesk||Zendesk heavily nests data, meaning there are many, many subtables that have to be created by Stitch. Additionally: the bigger your account, the more data there'll be to replicate.|
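For integrations that re-replicate a lookback window on every run, like the 28 days described for Facebook Ads above, a quick estimate can show how the window and the Replication Frequency multiply together. This is only a sketch: it assumes the whole window is replicated and counted on every run, and the function name and the figure of 500 rows per day of data are invented for illustration.

```python
def lookback_rows_per_day(rows_per_day_of_data: int,
                          window_days: int,
                          frequency_minutes: int) -> int:
    """Estimate rows replicated in 24 hours when each run re-copies a lookback window.

    Assumption: every run replicates all rows for the past `window_days` days.
    """
    runs_per_day = (24 * 60) // frequency_minutes
    return rows_per_day_of_data * window_days * runs_per_day


# Hypothetical account producing 500 rows per day of data, 28-day window.
print(lookback_rows_per_day(500, 28, 30))   # 672000
print(lookback_rows_per_day(500, 28, 360))  # 56000
```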
To keep your row count down (and keep your data warehouse tidy), you can also unsync any tables or columns you don't need.
Note that this is only applicable to database integrations and the SaaS integrations that support greylisting, or the syncing of individual tables.
If all else fails, you can temporarily pause the integration to avoid going over your row limit.