Snapshots
What snapshotting is, and how to use it
What is Snapshotting?
hotglue’s transformation layer lets you persist data across sync jobs for each tenant. This feature is called snapshotting. Common use cases include:
- storing tenant metadata that the transformation script needs to run (a field mapping, an API key or identifier, etc.)
- persisting the state of the data this tenant has already synced. For example, you could keep a full copy of everything the tenant has synced to detect which records are new vs. updated.
- allowing different integrations to access information about the tenant. For example, you could correlate data across Salesforce and QuickBooks to generate a unified output.
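For the second use case, here is a minimal sketch of comparing incoming records against a stored snapshot. It assumes both are pandas DataFrames that share a primary-key column (`id` is a hypothetical name). Any key already in the snapshot is treated as an update, and anything else as new.

```python
import pandas as pd


def classify_records(new_data, snapshot, pk="id"):
    """Return a copy of `new_data` with a `status` column: "updated" if the record's
    primary key already appears in `snapshot`, otherwise "new".

    `pk` is an assumed primary-key column present in both DataFrames;
    `snapshot` may be None on a tenant's first sync.
    """
    if snapshot is None:
        # Nothing has been synced before, so every record is new.
        seen = pd.Series(False, index=new_data.index)
    else:
        seen = new_data[pk].isin(snapshot[pk])
    labeled = new_data.copy()
    labeled["status"] = seen.map({True: "updated", False: "new"})
    return labeled
```

Keying on a primary key keeps the comparison cheap. Detecting whether an "updated" record's fields actually changed would need a row-by-row comparison on top of this.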
How to use Snapshots
Transformation Script
Note: this guide assumes you’ve read the Writing a basic script section of the docs.
As described in the Writing a basic script section, one of the standard directories for hotglue transformation scripts is the snapshots directory. Any data saved to that directory is persisted across job runs.
Let’s walk through a quick sample using pandas:
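One possible version of such a file is sketched here. It assumes each stream's snapshot is stored as a CSV named `<stream>.snapshot.csv` inside the snapshots directory. It also accepts an optional primary-key column (`pk`) for deduplication. The file naming, the `pk` parameter, and the helper `_snapshot_path` are illustrative assumptions, not a fixed hotglue API.

```python
import os

import pandas as pd

# Assumption: one CSV per stream, named "<stream>.snapshot.csv", inside the snapshots directory.
SNAPSHOT_DIR = "snapshots"


def _snapshot_path(stream, snapshot_dir):
    # Hypothetical helper: where a stream's snapshot file lives.
    return os.path.join(snapshot_dir, f"{stream}.snapshot.csv")


def get_snapshot(stream, snapshot_dir=SNAPSHOT_DIR):
    """Return the saved snapshot for `stream` (think: table name), or None if there isn't one yet."""
    path = _snapshot_path(stream, snapshot_dir)
    if not os.path.exists(path):
        return None
    return pd.read_csv(path)


def update_snapshot(stream, new_data, pk=None, snapshot_dir=SNAPSHOT_DIR):
    """Append `new_data` to the stream's existing snapshot, save it, and return the result.

    If `pk` (a hypothetical primary-key column) is given, only the latest row per key is kept.
    """
    os.makedirs(snapshot_dir, exist_ok=True)
    existing = get_snapshot(stream, snapshot_dir)
    if existing is None:
        combined = new_data.copy()
    else:
        combined = pd.concat([existing, new_data], ignore_index=True)
    if pk is not None:
        # Later rows come from newer syncs, so keep="last" retains the freshest version.
        combined = combined.drop_duplicates(subset=[pk], keep="last")
    combined.to_csv(_snapshot_path(stream, snapshot_dir), index=False)
    return combined
```

Passing `pk` keeps the snapshot at one row per record. Leaving it as `None` appends every synced row, preserving each version a tenant has sent.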
The sample defines two functions:
- update_snapshot takes new data from a job sync and appends it to any existing snapshot. This ensures every job run has access to the full history of data this tenant has synced.
- get_snapshot reads the snapshots directory to get the snapshot for the stream id passed in (you can think of a stream as the name of a table).
Using the above, we could do the following in our etl.ipynb to generate a snapshot of all the Account data from an integration (like QuickBooks):
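Here is a sketch of such a notebook cell. To stay self-contained it inlines the same append-and-deduplicate logic as update_snapshot instead of importing it. It assumes the job's Account data arrives as `sync-output/Account.csv` and that each row carries a unique `Id` column. Those paths, the column name, and the function `snapshot_accounts` are illustrative assumptions.

```python
import os

import pandas as pd

# Assumed directory layout; adjust to match your job's actual paths.
SYNC_OUTPUT = "sync-output"
SNAPSHOT_DIR = "snapshots"


def snapshot_accounts(sync_dir=SYNC_OUTPUT, snapshot_dir=SNAPSHOT_DIR, pk="Id"):
    """Merge this job's Account rows into the tenant's Account snapshot and return the result."""
    new_accounts = pd.read_csv(os.path.join(sync_dir, "Account.csv"))
    os.makedirs(snapshot_dir, exist_ok=True)
    snapshot_path = os.path.join(snapshot_dir, "Account.snapshot.csv")

    if os.path.exists(snapshot_path):
        # Same idea as update_snapshot: append the new rows to the saved history...
        accounts = pd.concat([pd.read_csv(snapshot_path), new_accounts], ignore_index=True)
        # ...then keep only the latest version of each account.
        accounts = accounts.drop_duplicates(subset=[pk], keep="last")
    else:
        accounts = new_accounts

    accounts.to_csv(snapshot_path, index=False)
    return accounts


# Only run when this job actually produced Account data.
if os.path.exists(os.path.join(SYNC_OUTPUT, "Account.csv")):
    accounts = snapshot_accounts()
```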
That’s all there is to it!
API
Additionally, you can modify snapshots at the tenant level via the API. For example, you could programmatically save a config.json to a tenant’s snapshot to store metadata such as an API key.
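On the script side, a file saved this way can be read like any other snapshot data. Here is a minimal sketch. It assumes the file lands at `snapshots/config.json` and contains an `api_key` field. The path, the key name, and the function `load_tenant_config` are hypothetical.

```python
import json
import os

# Assumption: the tenant's config.json was uploaded into the snapshots directory via the API.
SNAPSHOT_DIR = "snapshots"


def load_tenant_config(snapshot_dir=SNAPSHOT_DIR):
    """Return the tenant's config.json from the snapshot as a dict, or {} if it isn't there."""
    path = os.path.join(snapshot_dir, "config.json")
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


config = load_tenant_config()
api_key = config.get("api_key")  # hypothetical key name; None if the tenant has no config yet
```

Returning an empty dict when the file is missing lets the same script run for tenants that have no stored configuration.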