What is Snapshotting?
hotglue’s transformation layer allows you to save and persist data across sync jobs for every tenant. This feature is called snapshotting. Some use cases include:
- storing some metadata about the tenant necessary for the transformation script to run (mapping, an API key/identifier, etc.)
- persisting a state of the data that this tenant has already synced. For example, you could keep a full copy of all the data the tenant has synced to detect which records are new vs updated.
- allowing different integrations to access information about the tenant. For example, you could correlate data across Salesforce and Quickbooks to generate some unified output.
How to use Snapshots?
Transformation Script
Note: this guide assumes you’ve read the writing a basic script section of the docs. As described there, one of the standard directories for hotglue transformation scripts is the snapshots directory. Any data saved to that directory will be persisted across job runs.
Let’s walk through a quick sample using pandas:
util.py
- update_snapshot is designed to take new data from a job sync and append it to any existing snapshot. This ensures every job run has access to the full history of data that has been synced by this tenant.
- get_snapshot reads the snapshot directory to get the snapshot for the stream id passed (you can think of a stream as the name of a table).
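The two helpers described above might look roughly like the following. This is a minimal sketch under stated assumptions, not hotglue's actual util.py: the `<stream>.snapshot.csv` file name, the `pk` parameter, and the choice to keep only the newest row per primary key are all illustrative.

```python
import os

import pandas as pd


def get_snapshot(stream, snapshot_dir):
    """Return the saved snapshot for `stream` as a DataFrame, or None if there is none yet."""
    # Assumption: one CSV per stream, named <stream>.snapshot.csv
    path = os.path.join(snapshot_dir, f"{stream}.snapshot.csv")
    if os.path.isfile(path):
        return pd.read_csv(path)
    return None


def update_snapshot(stream, snapshot_dir, new_data, pk="id"):
    """Append `new_data` to the existing snapshot for `stream` and persist the result."""
    os.makedirs(snapshot_dir, exist_ok=True)
    snapshot = get_snapshot(stream, snapshot_dir)

    if snapshot is not None:
        merged = pd.concat([snapshot, new_data], ignore_index=True)
        # Assumption: when a primary key appears more than once, the newest row wins.
        merged = merged.drop_duplicates(subset=pk, keep="last").reset_index(drop=True)
    else:
        merged = new_data.reset_index(drop=True)

    merged.to_csv(os.path.join(snapshot_dir, f"{stream}.snapshot.csv"), index=False)
    return merged
```

Because the snapshot is re-read from CSV on every run, the primary key column should have the same type in the snapshot and in the new data (for example, integers in both); otherwise the de-duplication step will not match rows.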
These functions can then be used in etl.ipynb to generate a snapshot of all the Account data from an integration (like Quickbooks):
etl.ipynb
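As a rough illustration of such a notebook cell, the sketch below folds each job's Account rows into the tenant's snapshot and simulates two consecutive jobs in a throwaway directory. The `sync-output` input folder, the `Account.snapshot.csv` file name, the `snapshot_accounts` function, and the sample rows are assumptions made for the example; only the snapshots directory itself comes from the text above. The snapshot read/append logic is inlined so the cell runs on its own.

```python
import os
import tempfile

import pandas as pd


def snapshot_accounts(root_dir, stream="Account", pk="id"):
    """Fold this job's synced rows for `stream` into the tenant's snapshot."""
    # Assumed input location for the data synced by the current job.
    input_path = os.path.join(root_dir, "sync-output", f"{stream}.csv")
    # The snapshots directory is what gets persisted across job runs.
    snapshot_dir = os.path.join(root_dir, "snapshots")
    snapshot_path = os.path.join(snapshot_dir, f"{stream}.snapshot.csv")

    new_rows = pd.read_csv(input_path)

    # Inline equivalent of get_snapshot + update_snapshot.
    if os.path.isfile(snapshot_path):
        history = pd.concat([pd.read_csv(snapshot_path), new_rows], ignore_index=True)
        history = history.drop_duplicates(subset=pk, keep="last").reset_index(drop=True)
    else:
        history = new_rows

    os.makedirs(snapshot_dir, exist_ok=True)
    history.to_csv(snapshot_path, index=False)
    return history


# Simulate two consecutive jobs for one tenant (hypothetical sample data).
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "sync-output"))
account_csv = os.path.join(root, "sync-output", "Account.csv")

pd.DataFrame({"id": [1, 2], "Name": ["Checking", "Savings"]}).to_csv(account_csv, index=False)
after_first = snapshot_accounts(root)

pd.DataFrame({"id": [3], "Name": ["Payroll"]}).to_csv(account_csv, index=False)
after_second = snapshot_accounts(root)

print(len(after_first), len(after_second))
```

The second job only receives one new row, yet the snapshot it ends up with contains all three accounts, which is the "full history" behavior described for update_snapshot.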
API
Additionally, you can modify snapshots at the tenant level via the API. For example, you could programmatically save a tenantconfig.json in the snapshot to store metadata such as an API key.