This guide walks through writing a basic transform script using the gluestick Python package.
Your transform script has access to the following environment variables:

Environment Variable | Description |
---|---|
TENANT | ID of the tenant (e.g. `test-user`) |
FLOW | ID of the flow (e.g. `FZev7QqK-`) |
ENV_ID | ID of the hotglue environment (e.g. `prod.hg.example.com`) |
JOB_ID | ID of the job (e.g. `ZVonkl`) |
API_KEY | Environment API key (e.g. `XXXX`) |
API_URL | URL of the hotglue API (e.g. `https://client-api.hotglue.xyz`) |
JOB_ROOT | A unique key marking the root directory for the job (e.g. `cust_1/flows/FZev7QqK-/jobs/2024/07/4/25/17/ZVonkl`) |
SYNC_TYPE | Type of sync (`incremental_sync`, `auto_sync`, or `full_sync`) |
JOB_TYPE | Type of the job, in V2 flows (`write` or `read`) |
CONNECTOR_ID | ID of the connector, in V2 flows (e.g. `salesforce`) |
TAP | The connector managing the sync process, in both V1 and V2 flows (e.g. `api`) |
TARGET | The connector managing the export, in both V1 and V2 flows (e.g. `salesforce`) |
Your script can read these variables through Python's `os.environ` mapping.
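As a minimal sketch, the snippet below reads one variable directly and collects whichever of the listed variables are set into a dict. `read_job_context` and `HOTGLUE_VARS` are hypothetical names used only for illustration. Using `.get()` and membership checks means a missing variable (for example `JOB_TYPE` in a V1 flow, or any of them during a local run) doesn't raise a `KeyError`.

```python
import os

# Variable names from the table above. JOB_TYPE and CONNECTOR_ID are only
# set in V2 flows, so every lookup below tolerates a missing name.
HOTGLUE_VARS = (
    "TENANT", "FLOW", "ENV_ID", "JOB_ID", "API_KEY", "API_URL", "JOB_ROOT",
    "SYNC_TYPE", "JOB_TYPE", "CONNECTOR_ID", "TAP", "TARGET",
)


def read_job_context(environ=None):
    """Return a dict containing only the hotglue variables that are set."""
    environ = os.environ if environ is None else environ
    return {name: environ[name] for name in HOTGLUE_VARS if name in environ}


# Reading a single value directly: .get() returns None instead of raising
# KeyError when the variable is not set.
tenant_id = os.environ.get("TENANT")
context = read_job_context()
```

Treat `API_KEY` as a secret and avoid printing or logging it.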
Your script reads from and writes to the following standard directories:

Directory Path | Description |
---|---|
sync-output | Contains all input data from the data source (typically Parquet or CSV). |
snapshots | Can be used to store (snapshot) any data for the current tenant, typically as JSON, Parquet, or CSV. It also stores the `tenant-config.json` file used for tenant-specific settings. |
etl-output | Where you should put all output data, formatted as needed for the target your flow uses (typically a `data.singer` file). |
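A common way to refer to these directories in code is to build their paths from the job's root directory. The sketch below assumes the root comes from a `ROOT_DIR` environment variable, which is not listed in the table above, and falls back to the current directory for local runs. `SNAPSHOT_DIR` is a hypothetical name; `INPUT_DIR` and `OUTPUT_DIR` match the names used later in this guide.

```python
import os

# Assumption: the job's root directory is provided as ROOT_DIR; default to
# the current directory so the script can also run locally.
ROOT_DIR = os.environ.get("ROOT_DIR", ".")

INPUT_DIR = os.path.join(ROOT_DIR, "sync-output")    # input data from the source
SNAPSHOT_DIR = os.path.join(ROOT_DIR, "snapshots")   # per-tenant data and tenant-config.json
OUTPUT_DIR = os.path.join(ROOT_DIR, "etl-output")    # output data for the target
```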
Unlike `config.json` and `state.json`, the `tenant-config.json` file resides in the `snapshots` folder and is shared across any number of linked integrations.
`config.json` and `state.json` are stored in the `root_dir`, so read them from that directory.
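A minimal sketch of loading both files with the standard `json` module, assuming `root_dir` is exposed as a `ROOT_DIR` environment variable (an assumption, defaulting to the current directory). `load_json` and `load_job_files` are hypothetical helpers, and treating a missing file as an empty dict is a choice made here for local testing, not documented hotglue behavior.

```python
import json
import os

ROOT_DIR = os.environ.get("ROOT_DIR", ".")  # assumption: job root directory


def load_json(path, default=None):
    """Load a JSON file, returning `default` (or {}) if the file doesn't exist."""
    if not os.path.exists(path):
        return {} if default is None else default
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def load_job_files(root_dir=ROOT_DIR):
    """Return (config, state) read from the job's root directory."""
    config = load_json(os.path.join(root_dir, "config.json"))
    state = load_json(os.path.join(root_dir, "state.json"))
    return config, state
```

In a script, this would be called as `config, state = load_job_files()`.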
`tenant-config.json`, on the other hand, is accessed from the `snapshots` directory. If you need to read or modify this object, load it from that directory and write it back there.
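A sketch of a read-modify-write cycle on `tenant-config.json`, under the same `ROOT_DIR` assumption as above. `read_tenant_config` and `write_tenant_config` are hypothetical helpers; which keys you store is up to your integration.

```python
import json
import os

ROOT_DIR = os.environ.get("ROOT_DIR", ".")  # assumption: job root directory
SNAPSHOT_DIR = os.path.join(ROOT_DIR, "snapshots")
FILENAME = "tenant-config.json"


def read_tenant_config(snapshot_dir=SNAPSHOT_DIR):
    """Return the tenant config as a dict, or {} if the file is absent."""
    path = os.path.join(snapshot_dir, FILENAME)
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def write_tenant_config(tenant_config, snapshot_dir=SNAPSHOT_DIR):
    """Overwrite the tenant config file in the snapshots directory."""
    os.makedirs(snapshot_dir, exist_ok=True)
    with open(os.path.join(snapshot_dir, FILENAME), "w", encoding="utf-8") as f:
        json.dump(tenant_config, f, indent=2)
```

For example: `cfg = read_tenant_config()`, update `cfg`, then `write_tenant_config(cfg)`. Because the file is shared across linked integrations, avoid overwriting keys that other integrations rely on.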
Snapshots are optional and may not be needed for your scripts.
In this example, the `sync-output` folder contains a CSV file called `campaigns`. Learn more about how to get sample data in the Debugging a script section.
First, read the data in `INPUT_DIR` with gluestick's reader function.
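This guide uses gluestick's reader function for this step. To experiment where gluestick isn't installed, a rough pandas-only stand-in could load each CSV file in the input directory into a DataFrame keyed by file name, as sketched below. `read_input_dir` is hypothetical and not part of gluestick; unlike the real reader, it ignores Parquet files entirely.

```python
import os

import pandas as pd


def read_input_dir(input_dir):
    """Load every .csv file in input_dir into a DataFrame keyed by file stem.

    Hypothetical stand-in for gluestick's reader: CSV only, pandas defaults.
    """
    frames = {}
    for name in sorted(os.listdir(input_dir)):
        stem, ext = os.path.splitext(name)
        if ext.lower() == ".csv":
            frames[stem] = pd.read_csv(os.path.join(input_dir, name))
    return frames
```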
Then get the `campaigns` data from the reader. Calling `campaigns.head()` lets you preview the first rows of the data.
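The sketch below uses a small hypothetical DataFrame in place of the real `campaigns` data, plus a hypothetical `get_stream` helper that fails with a readable message when a stream is missing. `head()` returns the first five rows by default; pass a number, such as `head(10)`, to see more.

```python
import pandas as pd


def get_stream(frames, name):
    """Return one stream's DataFrame, or raise KeyError listing what exists."""
    if name not in frames:
        raise KeyError(f"stream {name!r} not found; available: {sorted(frames)}")
    return frames[name]


# Hypothetical rows standing in for data loaded from sync-output.
frames = {
    "campaigns": pd.DataFrame(
        {
            "id": [101, 102],
            "emails_sent": [250, 0],
            "create_time": ["2024-07-01T10:00:00Z", "2024-07-03T08:30:00Z"],
            "status": ["sent", "draft"],
            "type": ["regular", "regular"],
        }
    )
}

campaigns = get_stream(frames, "campaigns")
print(campaigns.head())  # up to the first 5 rows
```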
Next, select the columns `id`, `emails_sent`, `create_time`, and `status`, and then rename them.
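A sketch of the select-and-rename step in pandas. The sample rows and the new column names in `RENAME` are hypothetical; use whatever names your target expects.

```python
import pandas as pd

# Hypothetical sample standing in for the campaigns DataFrame.
campaigns = pd.DataFrame(
    {
        "id": [101, 102],
        "type": ["regular", "regular"],
        "emails_sent": [250, 0],
        "create_time": ["2024-07-01T10:00:00Z", "2024-07-03T08:30:00Z"],
        "status": ["sent", "draft"],
    }
)

KEEP = ["id", "emails_sent", "create_time", "status"]
# Hypothetical output names.
RENAME = {
    "id": "campaign_id",
    "emails_sent": "sent_count",
    "create_time": "created_at",
    "status": "campaign_status",
}

# Selecting with a list returns a new DataFrame with only those columns;
# rename() also returns a new DataFrame rather than modifying in place.
campaigns = campaigns[KEEP].rename(columns=RENAME)
print(campaigns)
```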
Finally, write the result to `OUTPUT_DIR` (the `etl-output` directory) using the pandas `to_csv` function.
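A sketch of writing the result with `to_csv`, again assuming a `ROOT_DIR` environment variable for the job root. The `write_csv` helper and the `<stream>.csv` file naming are hypothetical choices; `index=False` keeps pandas' row index out of the file.

```python
import os

import pandas as pd

ROOT_DIR = os.environ.get("ROOT_DIR", ".")  # assumption: job root directory
OUTPUT_DIR = os.path.join(ROOT_DIR, "etl-output")


def write_csv(df, stream, output_dir=OUTPUT_DIR):
    """Write df to <output_dir>/<stream>.csv and return the file path."""
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, f"{stream}.csv")
    df.to_csv(path, index=False)  # omit the row index column
    return path
```

In a script, this would be called as `write_csv(campaigns, "campaigns")`.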
To write Singer-formatted output (a `data.singer` file) for your target, use gluestick's `to_singer` function in lieu of `to_csv`.
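For context on what a `data.singer` file holds: under the Singer specification, it is a sequence of JSON messages, one per line, such as a `SCHEMA` message describing a stream followed by `RECORD` messages carrying its rows. The sketch below writes those two message types by hand with the standard library purely to illustrate the format. It is not gluestick's `to_singer` implementation, the schema here is written manually, and in a real script you would call `to_singer` instead.

```python
import json


def singer_lines(stream, records, key_properties, schema):
    """Yield a Singer SCHEMA message, then one RECORD message per record."""
    yield json.dumps(
        {
            "type": "SCHEMA",
            "stream": stream,
            "schema": schema,
            "key_properties": key_properties,
        }
    )
    for record in records:
        yield json.dumps({"type": "RECORD", "stream": stream, "record": record})


# Hand-written JSON Schema and sample record (hypothetical values).
schema = {
    "type": "object",
    "properties": {
        "campaign_id": {"type": "integer"},
        "campaign_status": {"type": ["string", "null"]},
    },
}
records = [{"campaign_id": 101, "campaign_status": "sent"}]

for line in singer_lines("campaigns", records, ["campaign_id"], schema):
    print(line)
```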