Gluestick
Data Export & Error Handling
Functions for exporting data to various formats and handling errors in ETL pipelines
to_export
Recommended: Universal export function supporting multiple output formats with schema validation and customization options.
Installation
Basic Usage
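A call might look like the following sketch. It uses only parameter names listed under Parameters below, passed as keyword arguments because their positional order is not stated here. The `import gluestick as gs` alias, the wrapper name `export_customers`, the stream name, and the key column are illustrative assumptions, not part of the documented API.

```python
def export_customers(df, output_dir):
    """Export a customers DataFrame as CSV through gluestick's to_export.

    The wrapper name, the stream name "customers", and the key column "id"
    are hypothetical; only the keyword names come from the parameter list.
    """
    # Imported inside the function so this module can still be loaded
    # in an environment where gluestick is not installed.
    import gluestick as gs

    gs.to_export(
        data=df,                # pd.DataFrame to export
        name="customers",       # output file name / stream name
        output_dir=output_dir,  # directory for output files
        keys=["id"],            # primary key fields
        export_format="csv",    # one of: singer, parquet, json, jsonl, csv
    )
    # to_export returns None; its result is the file(s) written to output_dir.
```

Calling `export_customers(df, "etl-output")` with a pandas DataFrame that has an `id` column would then write the export into `etl-output`.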
Parameters
- data (pd.DataFrame): DataFrame to export
- name (str): Output file name or stream name
- output_dir (str): Directory for output files
- keys (list): Primary key fields
- unified_model (pydantic.BaseModel): Pydantic model for schema validation
- export_format (str): Output format ('singer', 'parquet', 'json', 'jsonl', 'csv')
- output_file_prefix (str): Optional prefix for output files
- schema (dict): Custom schema for Singer format
- stringify_objects (bool): Convert complex objects to strings for Parquet
Returns
None (writes files to specified directory)
Notes
- Uses environment variables for format defaults
- Supports prefix override per stream
- Handles complex data types appropriately per format
- Validates against Pydantic models when provided
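The first two notes describe a common pattern: an explicit argument takes precedence, and an environment variable supplies the default when the argument is omitted. The sketch below illustrates that pattern with the standard library only. It is not Gluestick's implementation; the function `export_records` and the variable names `EXAMPLE_EXPORT_FORMAT` and `EXAMPLE_OUTPUT_PREFIX` are hypothetical, since this section does not name the variables Gluestick actually reads.

```python
import csv
import json
import os
from pathlib import Path

# Hypothetical variable names, chosen for this sketch only.
FORMAT_ENV_VAR = "EXAMPLE_EXPORT_FORMAT"
PREFIX_ENV_VAR = "EXAMPLE_OUTPUT_PREFIX"


def export_records(records, name, output_dir,
                   export_format=None, output_file_prefix=None):
    """Write a list of dicts to output_dir and return the path written."""
    # An explicit argument wins; otherwise fall back to the environment,
    # then to a hard-coded default.
    fmt = export_format or os.environ.get(FORMAT_ENV_VAR, "jsonl")
    prefix = output_file_prefix
    if prefix is None:
        prefix = os.environ.get(PREFIX_ENV_VAR, "")

    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{prefix}{name}.{fmt}"

    if fmt == "csv":
        fieldnames = list(records[0].keys()) if records else []
        with path.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(records)
    elif fmt == "jsonl":
        with path.open("w", encoding="utf-8") as fh:
            for row in records:
                # default=str stringifies values JSON cannot encode natively.
                fh.write(json.dumps(row, default=str) + "\n")
    elif fmt == "json":
        path.write_text(json.dumps(records, default=str), encoding="utf-8")
    else:
        raise ValueError(f"unsupported format in this sketch: {fmt!r}")
    return path
```

Passing `output_file_prefix` for a single call overrides the environment-level prefix for that stream only, which mirrors the per-stream prefix override mentioned in the notes.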
to_singer
Recommended: Specialized function for exporting data to Singer format with comprehensive type handling.
Usage
Parameters
- df (pd.DataFrame): DataFrame to export
- stream (str): Singer stream name
- output_dir (str): Output directory
- keys (list): Primary key fields
- filename (str): Output filename (default: 'data.singer')
- allow_objects (bool): Enable complex object handling
- schema (dict): Custom schema definition
- unified_model (pydantic.BaseModel): Pydantic model for validation
Notes
- Handles datetime conversions automatically
- Supports catalog-based schemas
- Validates data against provided schemas
- Creates Singer-compliant output files
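For readers unfamiliar with the format, a Singer output file is a sequence of JSON messages, one per line: a SCHEMA message describing the stream, followed by RECORD messages carrying the rows. The sketch below builds such lines with the standard library to show the shape of the output and one way datetime values can be serialized. It is an illustration under stated assumptions, not the code to_singer runs; the helper `singer_lines` and the sample data are hypothetical.

```python
import json
from datetime import datetime, timezone


def singer_lines(records, stream, keys, schema):
    """Yield Singer messages (one JSON document per line) for a stream."""
    # A SCHEMA message describes the stream before any of its records.
    yield json.dumps({
        "type": "SCHEMA",
        "stream": stream,
        "schema": schema,
        "key_properties": keys,
    })
    for row in records:
        # datetimes are not JSON-serializable, so emit ISO 8601 strings.
        clean = {
            k: v.isoformat() if isinstance(v, datetime) else v
            for k, v in row.items()
        }
        yield json.dumps({"type": "RECORD", "stream": stream, "record": clean})


# Hypothetical sample stream with an integer key and a timestamp column.
schema = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "created_at": {"type": "string", "format": "date-time"},
    },
}
rows = [{"id": 1, "created_at": datetime(2024, 1, 2, tzinfo=timezone.utc)}]
lines = list(singer_lines(rows, "customers", ["id"], schema))
```

Here `lines` holds two strings: the SCHEMA message and a single RECORD message. The `keys` argument maps onto the `key_properties` field of the SCHEMA message.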