Reader
Recommended: A class for reading ETL files from the sync-output directory. It provides consistent error handling and metadata access.
Installation
Basic Usage
Key Methods
get(stream, default=None, catalog_types=False, **kwargs)
Reads the selected file into a pandas DataFrame.
get_metadata(stream)
Retrieves metadata from parquet files.
get_pk(stream)
Gets primary key(s) from parquet file metadata.
read_csv_folder
Legacy function for reading multiple CSV files. Consider using the Reader class instead.
Usage
Parameters
- path (str): Directory containing CSV files or path to single CSV file
- converters (dict): Entity-specific column converters
- index_cols (dict): Entity-specific index columns
- ignore (list): Files to ignore
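To make the shape of these parameters concrete, here is a dependency-free sketch that mimics the lookup with the standard csv module in place of pandas. The name sketch_read_csv_folder is hypothetical, and two things are assumptions rather than documented behaviour: that the entity name is the file name without its extension, and that converters is keyed first by entity and then by column. index_cols is left out because it only makes sense for DataFrames.

```python
import csv
from pathlib import Path


def sketch_read_csv_folder(path, converters=None, ignore=None):
    """Illustrative stand-in: returns {entity_name: [row_dict, ...]}."""
    converters = converters or {}
    ignore = set(ignore or [])
    root = Path(path)
    # Accept either a directory of CSV files or a single CSV file.
    files = sorted(root.glob("*.csv")) if root.is_dir() else [root]

    results = {}
    for file in files:
        entity = file.stem  # assumption: entity name = file name without ".csv"
        if entity in ignore or file.name in ignore:
            continue
        # Assumption: converters looks like {entity: {column: callable}}.
        column_converters = converters.get(entity, {})
        rows = []
        with file.open(newline="") as handle:
            for row in csv.DictReader(handle):
                for column, convert in column_converters.items():
                    if column in row:
                        row[column] = convert(row[column])
                rows.append(row)
        results[entity] = rows
    return results
```

A call such as sketch_read_csv_folder("sync-output", converters={"invoices": {"amount": float}}, ignore=["accounts"]) would skip accounts.csv and parse the amount column of invoices.csv as floats.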
Returns
Dictionary of pandas DataFrames, keyed by entity name
read_parquet_folder
Legacy function for reading multiple parquet files. Consider using the Reader class instead.
Usage
Parameters
- path (str): Directory containing parquet files or path to single file
- ignore (list): Files to ignore
Returns
Dictionary of pandas DataFrames, keyed by entity name
get_catalog_schema
Retrieves DataFrame schema from Singer catalog file.
Usage
Returns
Dictionary containing the stream’s schema definition
Notes
- Requires catalog.json in root directory
- Raises exception if stream not found in catalog
- Filters schema to include only type and properties
- Ensures array types have items dictionary
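The four notes above can be sketched in plain Python. This is an illustrative stand-in, not the library's implementation: the function name sketch_get_catalog_schema and the root_dir parameter are hypothetical, and matching a stream on either "stream" or "tap_stream_id" is an assumption based on the usual Singer catalog layout.

```python
import json
from pathlib import Path


def sketch_get_catalog_schema(stream, root_dir="."):
    """Illustrative stand-in for the behaviour listed in the notes."""
    # Note 1: catalog.json is expected in the root directory.
    catalog = json.loads((Path(root_dir) / "catalog.json").read_text())

    # Singer catalogs list streams under "streams"; each entry has a "schema".
    entry = next(
        (
            s
            for s in catalog.get("streams", [])
            if s.get("stream") == stream or s.get("tap_stream_id") == stream
        ),
        None,
    )
    # Note 2: a missing stream is an error.
    if entry is None:
        raise Exception(f"Stream {stream!r} not found in catalog")

    # Note 3: keep only the "type" and "properties" keys of the schema.
    schema = {
        key: value
        for key, value in entry.get("schema", {}).items()
        if key in ("type", "properties")
    }

    # Note 4: give every array-typed property an "items" dictionary.
    for prop in schema.get("properties", {}).values():
        prop_type = prop.get("type", [])
        types = prop_type if isinstance(prop_type, list) else [prop_type]
        if "array" in types and not isinstance(prop.get("items"), dict):
            prop["items"] = {}
    return schema
```

Filling in an empty items dictionary is one plausible reading of the last note; the real function may populate it differently.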
Common Patterns
Iterating through multiple streams
etl.py
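A minimal sketch of this pattern follows. It relies only on the get(stream, default=None, ...) signature documented under Key Methods and assumes that get returns the default when a stream has no file. FakeReader and read_streams are hypothetical names; FakeReader exists only so the example runs without the real Reader, and in an actual script a Reader instance would be passed instead.

```python
def read_streams(reader, streams):
    """Collect one entry per stream that actually has data."""
    loaded = {}
    for stream in streams:
        # Assumption: get() hands back its default (None) for a missing stream.
        data = reader.get(stream)
        if data is None:
            continue
        loaded[stream] = data
    return loaded


class FakeReader:
    """Hypothetical stand-in that mirrors the documented get() signature."""

    def __init__(self, data):
        self._data = data

    def get(self, stream, default=None, catalog_types=False, **kwargs):
        return self._data.get(stream, default)


reader = FakeReader({"contacts": [{"id": 1}], "deals": [{"id": 7}]})
loaded = read_streams(reader, ["contacts", "deals", "tickets"])
print(sorted(loaded))  # ['contacts', 'deals']
```

Skipping streams that come back as None keeps the loop safe when a sync produced no file for a given stream.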