What snapshotting is, and how to use it
hotglue’s transformation layer allows you to save and persist data across sync jobs for every tenant. This feature is called snapshotting. Some use cases include:
Note: this guide assumes you’ve read the writing a basic script section of the docs.
As described in the writing a basic script section, one of the standard directories for hotglue transformation scripts is the snapshots directory. Any data saved to that directory will be persisted across job runs.
Let’s walk through a quick sample using pandas:
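A minimal sketch of what such a helper file could look like is given here. The CSV format, the `<stream>.snapshot.csv` file naming, and the function signatures are assumptions made for illustration, not necessarily what hotglue's own sample uses.

```python
import os
from typing import Optional

import pandas as pd


def snapshot_path(stream: str, snapshot_dir: str) -> str:
    # Assumed naming convention: one CSV file per stream.
    return os.path.join(snapshot_dir, f"{stream}.snapshot.csv")


def get_snapshot(stream: str, snapshot_dir: str) -> Optional[pd.DataFrame]:
    """Return the saved snapshot for a stream, or None if none exists yet."""
    path = snapshot_path(stream, snapshot_dir)
    if os.path.isfile(path):
        return pd.read_csv(path)
    return None


def update_snapshot(stream: str, snapshot_dir: str, data: pd.DataFrame) -> pd.DataFrame:
    """Append the new rows to any existing snapshot and write it back."""
    existing = get_snapshot(stream, snapshot_dir)
    if existing is None:
        merged = data
    else:
        merged = pd.concat([existing, data], ignore_index=True)
    os.makedirs(snapshot_dir, exist_ok=True)
    merged.to_csv(snapshot_path(stream, snapshot_dir), index=False)
    return merged
```

This version only appends. If the same record can arrive in more than one sync, you would likely want to de-duplicate on a primary key before writing the file.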
The file above defines two functions:
update_snapshot is designed to take new data from a job sync and append it to any existing snapshot. This ensures every job run has access to the full history of data that has been synced by this tenant.
get_snapshot reads the snapshot directory to get the snapshot for the stream id passed (you can think of a stream as the name of a table).
Using the above, we could do the following in our etl.ipynb to generate a snapshot of all the Account data from an integration (like Quickbooks):
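As a hedged illustration of that notebook step, the sketch below appends one run's Account rows to the snapshot. The temporary directory, the sample rows, and the condensed `update_snapshot` are stand-ins chosen so the snippet runs on its own; in a real job the directory and the input data would come from the hotglue environment.

```python
import os
import tempfile

import pandas as pd

# Stand-in for the job's snapshots directory (hypothetical location).
SNAPSHOT_DIR = tempfile.mkdtemp()


def update_snapshot(stream, snapshot_dir, data):
    # Condensed stand-in for the helper described above: append, then persist.
    path = os.path.join(snapshot_dir, f"{stream}.snapshot.csv")
    if os.path.isfile(path):
        data = pd.concat([pd.read_csv(path), data], ignore_index=True)
    data.to_csv(path, index=False)
    return data


# Stand-in for the Account rows produced by the current sync job.
accounts = pd.DataFrame({"Id": [1, 2], "Name": ["Checking", "Savings"]})

# Merge this run's rows into the snapshot; the result holds the full history.
account_snapshot = update_snapshot("Account", SNAPSHOT_DIR, accounts)
```

On the next job run the same call would pick up the saved file and return the earlier rows together with the new ones.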
That’s all there is to it!
Additionally, you can modify the snapshots on a tenant level via the API. For example, you could programmatically save a tenant config.json in the snapshot to store some metadata such as an API key.
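On the script side, a file stored this way can be read like any other file in the snapshot directory. The sketch below assumes the file is named config.json and sits at the top level of that directory; `load_tenant_config` is a hypothetical helper, and the API call that uploads the file is not shown because its details are outside this guide.

```python
import json
import os
from typing import Any, Dict


def load_tenant_config(snapshot_dir: str) -> Dict[str, Any]:
    """Read config.json from the snapshot directory; empty dict if absent."""
    path = os.path.join(snapshot_dir, "config.json")
    if not os.path.isfile(path):
        return {}
    with open(path) as f:
        return json.load(f)
```

Returning an empty dict when the file is missing lets the script keep working for tenants that have no stored config yet.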