Intermediate Formats
hotglue intermediate formats explained
Though most hotglue-compatible taps write data following the Singer spec, storing and processing raw Singer data can be inefficient.
To solve this, hotglue supports transforming your sync output into a more efficient storage format. Your ETL script then processes the data in this format.
hotglue supports four Intermediate Formats:
Format | Description |
---|---|
Singer | Leaves synced Singer data untransformed. This can speed up your jobs by skipping the intermediate transformation, but it is much less storage efficient and harder to work with in ETL. |
CSV | Comma-separated values. Produces larger files than Parquet but offers more type flexibility. |
Parquet | The Apache Parquet format. More storage efficient than CSV and offers strict type validation. |
Parquet With Chunking | Writes the Apache Parquet format in chunks, then compiles the chunks together before ETL. |
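
To illustrate why raw Singer output is less storage efficient than a tabular format, the following is a minimal sketch that serializes the same rows as Singer `RECORD` messages and as CSV, then compares the sizes. It does not show hotglue's own conversion: the `customers` stream, the sample fields, and the helper functions `to_singer_lines` and `to_csv_text` are all hypothetical and exist only for this comparison.

```python
import csv
import io
import json

# Hypothetical sample data; the fields are made up for illustration.
records = [
    {"id": i, "name": f"customer-{i}", "active": i % 2 == 0}
    for i in range(1000)
]


def to_singer_lines(stream, rows):
    """Serialize rows as Singer RECORD messages, one JSON object per line."""
    return "".join(
        json.dumps({"type": "RECORD", "stream": stream, "record": row}) + "\n"
        for row in rows
    )


def to_csv_text(rows):
    """Serialize the same rows as CSV with a single header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


singer_bytes = len(to_singer_lines("customers", records).encode("utf-8"))
csv_bytes = len(to_csv_text(records).encode("utf-8"))
print(f"Singer: {singer_bytes} bytes, CSV: {csv_bytes} bytes")
```

Each Singer message repeats the message wrapper and every field name, while CSV states the field names once in the header. That repetition is the main reason the Singer output is larger in this sketch.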