etl commands below.
ETL Download
Description
Clones the remote ETL script saved in hotglue to your local machine.
Sample
Parameters
| Option | Default | Description |
|---|---|---|
| --overwrite, -o | false | When enabled, overwrites any files that already exist locally in the download directory. |
| --downloadTo, -d | . | The directory to download the ETL to. Defaults to the current directory. |
ETL Deploy
Description
Deploys the local ETL script to hotglue.
Sample
Parameters
| Option | Default | Description |
|---|---|---|
| --sourceFolder, -s | . | The directory to upload the ETL script from. Defaults to the current directory. |
ETL Delete
Description
Deletes a deployed ETL script on hotglue.
Sample
ETL Set up Local Job Data
Description
Clones hotglue job data to your local machine and creates a .env file with the job's environment variables.
The file structure and contents are identical to the file system your ETL script ran in, with two exceptions:
the downloaded etl-output folder is renamed to etl-output-reference and the
snapshots folder is renamed to snapshots-reference.
We recommend using this command to reproduce ETL failures or to backtest against successful jobs.
Sample
Parameters
| Option | Default | Description |
|---|---|---|
--include-configs | false | When enabled, also downloads target-config.json, source-config.json, tenant-config.json and sets the API_KEY environment variable. |
| --overwrite, -o | false | When enabled, overwrites any files that already exist locally in the download directory. |
| --downloadTo, -d | . | The directory to download the job data to. Defaults to the current directory. |
Running the ETL with the job's environment variables
After running the setup-local-run command, a .env file is created containing the same environment variables that were available when the job ran in the hotglue environment. To run the ETL with those same environment variables, use one of the following methods:
For VS Code and its variants
Open the launch file {project_folder}/.vscode/launch.json and add the envFile entry to the launch configuration:
For Linux/macOS/Git bash/WSL terminal
Run the following:
ETL Local Run
Description
Runs the ETL locally, replicating the hotglue environment, and compares the etl-output files with etl-output-reference (the output from the job).
The comparator checks for extra or missing files in the etl-output folder
and also compares the matching .csv and .singer files. If they are not the same, an error is shown.
To use it, you first need to download the transformation script using the etl download command
and the job data using the etl setup-local-run command.
We recommend using this command to reproduce ETL failures or to backtest against successful jobs.
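The comparison described above can be pictured with a short Python sketch. This is not hotglue's implementation: the function name compare_outputs and the byte-for-byte content check are assumptions made for illustration, and the real comparator also applies the options from test-config.json before comparing.

```python
from pathlib import Path


def compare_outputs(output_dir, reference_dir, suffixes=(".csv", ".singer")):
    """Return a list of differences between two output folders.

    Hypothetical helper: the names and the plain byte comparison are
    illustrative assumptions, not the CLI's actual logic.
    """
    out = {p.name for p in Path(output_dir).iterdir() if p.is_file()}
    ref = {p.name for p in Path(reference_dir).iterdir() if p.is_file()}

    problems = []
    # Files that exist in only one of the two folders.
    problems += [f"extra file: {name}" for name in sorted(out - ref)]
    problems += [f"missing file: {name}" for name in sorted(ref - out)]

    # Files that exist in both folders are compared by content.
    for name in sorted(out & ref):
        if not name.endswith(suffixes):
            continue
        produced = Path(output_dir, name).read_bytes()
        expected = Path(reference_dir, name).read_bytes()
        if produced != expected:
            problems.append(f"content differs: {name}")
    return problems
```

Calling compare_outputs("etl-output", "etl-output-reference") returns an empty list when the two folders hold the same files with the same contents.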
Sample where the output matches the output from the job
Sample where the output doesn’t match the output from the job
Parameters
| Option | Default | Description |
|---|---|---|
--etlScriptFolder | . | ETL script folder (downloaded using etl download) |
--jobDataFolder | . | Job data folder (downloaded using etl setup-local-run) |
--dockerPlatform | | Docker platform (linux/amd64, linux/arm64, etc.). Leave empty to use the default platform of the Docker daemon. |
Output comparator options
The ETL file comparator has some options that can be set by creating a test-config.json file in the script folder.
The options are listed below.
1. sort_config
Specifies how rows in a stream and nested fields within rows should be sorted. Supports flat fields, nested fields, and lists of scalars.
- Flat Field Sorting: Specifies the column used to sort the rows of a stream.
- Nested Field Sorting: Uses dot notation to sort lists of dictionaries within a row.
- List of Scalars Sorting: Uses a trailing . to sort lists of scalars within a row.
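The three sorting modes can be sketched in Python as follows. The helpers sort_rows and sort_nested are hypothetical names, and the sketch assumes a single level of nesting. It illustrates the behaviour described above, not the comparator's actual code or the exact shape of the sort_config entry in test-config.json.

```python
def sort_rows(rows, key):
    """Flat field sorting: order the rows of a stream by one column."""
    return sorted(rows, key=lambda row: row[key])


def sort_nested(row, path):
    """Sort a list inside a row, in place (hypothetical helper).

    "items.id" sorts the list of dicts at row["items"] by "id".
    "tags." (trailing dot) sorts the list of scalars at row["tags"].
    """
    list_field, _, sort_key = path.partition(".")
    if sort_key:
        row[list_field].sort(key=lambda item: item[sort_key])
    else:
        row[list_field].sort()
    return row


rows = [
    {"id": 2, "tags": ["b", "a"], "items": [{"id": 9}, {"id": 3}]},
    {"id": 1, "tags": ["z", "y"], "items": [{"id": 5}, {"id": 4}]},
]
rows = sort_rows(rows, "id")
for row in rows:
    sort_nested(row, "items.id")
    sort_nested(row, "tags.")
```

Sorting both outputs the same way before comparing them means that differences in row or list order alone are not reported as mismatches.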
2. ignore_columns
Specifies fields to ignore during the comparison. Supports flat fields and nested fields using dot notation.
- Flat Fields: Directly removes the specified field from rows.
- Nested Fields: Removes specified fields within nested structures using dot notation.
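A minimal Python sketch of dot-notation removal, assuming rows are plain dictionaries and that a path applies to every element when it passes through a list. The helper drop_field is a hypothetical name used for illustration only.

```python
def drop_field(value, path):
    """Remove the field named by a dot-notation path, in place."""
    head, _, rest = path.partition(".")
    if isinstance(value, list):
        # Apply the same path to every element of a list.
        for item in value:
            drop_field(item, path)
    elif isinstance(value, dict):
        if not rest:
            # Last path segment: remove the key if it is present.
            value.pop(head, None)
        elif head in value:
            drop_field(value[head], rest)
    return value


row = {
    "id": 1,
    "updated_at": "2024-01-01",
    "address": {"city": "Austin", "zip": "78701"},
    "items": [{"sku": "a", "ts": 1}, {"sku": "b", "ts": 2}],
}
drop_field(row, "updated_at")   # flat field
drop_field(row, "address.zip")  # nested field
drop_field(row, "items.ts")     # field inside each element of a list
```

This is useful for fields such as timestamps that legitimately change between runs and would otherwise always be reported as differences.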
3. rename_config
Specifies fields to rename in the etl-output only. Supports flat and nested fields using dot notation.
- Flat Fields: Renames top-level fields in rows.
- Nested Fields: Renames fields within nested structures using dot notation.
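Renaming can be sketched in Python in the same style. The helper rename_field is a hypothetical name, and the sketch assumes rows are plain dictionaries. Per the description above, it would be applied to rows from etl-output only, so that they line up with the field names in etl-output-reference.

```python
def rename_field(value, path, new_name):
    """Rename the last segment of a dot-notation path, in place."""
    head, _, rest = path.partition(".")
    if isinstance(value, list):
        # Apply the same path to every element of a list.
        for item in value:
            rename_field(item, path, new_name)
    elif isinstance(value, dict) and head in value:
        if rest:
            rename_field(value[head], rest, new_name)
        else:
            value[new_name] = value.pop(head)
    return value


output_row = {"customer_id": 7, "address": {"zip": "78701"}}
rename_field(output_row, "customer_id", "customerId")  # flat field
rename_field(output_row, "address.zip", "postal_code")  # nested field
```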