# Introduction to Gluestick

> A Python library for efficient ETL processes, optimized for hotglue

Gluestick is an open source ETL toolkit developed and maintained by hotglue. It is optimized for use in hotglue pre-processing scripts.

The source code is available on GitHub and is free to use under the MIT license. Use the GitHub repository to browse the complete source code, contribute, or report issues. Star us to show support!

## Getting started with gluestick

```bash
# Install from PyPI
pip install gluestick
```

Then import the utilities in your Python script with `import gluestick as gs`.

## Key Features

* Robust ETL utilities for data processing
* Singer protocol integration
* Advanced JSON and object handling
* Snapshot management for incremental loads
* Production-ready error handling

## File Reading Functions

RECOMMENDED: Class for reading sync-output data. Provides methods to read directories, get file metadata, and extract primary keys from Parquet files.

Reads multiple CSV files from a directory, organizing them by entity type based on filename. Supports custom converters and index columns per entity.

Similar to `read_csv_folder` but for Parquet files. Automatically organizes files by entity type and supports ignoring specific files.

Retrieves the DataFrame schema from the Singer catalog.

## Snapshot Management

Manages data snapshots by updating existing snapshots or creating new ones. Supports type coercion and handles both CSV and Parquet formats.

Reads snapshot data for a specific stream from either Parquet or CSV format. Supports additional pandas read options.

Removes duplicate rows based on content hashing. Maintains the state of processed data and supports update tracking.

## JSON & Object Handling

Converts JSON tuple columns into separate columns based on key-value pairs.

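The idea of turning a JSON tuple column into separate columns can be sketched in plain Python. This is an illustration only, not Gluestick's implementation: it assumes a "JSON tuple" is a list of objects that each carry a key property and a value property, and the function name `tuple_column_to_cols`, the `Name`/`Value` property names, and the sample rows are all hypothetical.

```python
import json


def tuple_column_to_cols(rows, column, key_prop="Name", value_prop="Value"):
    """Return new rows where `column` is replaced by one field per key/value pair.

    Hypothetical helper for illustration; not part of Gluestick.
    """
    result = []
    for row in rows:
        # Copy every field except the JSON column being expanded.
        flat = {k: v for k, v in row.items() if k != column}
        raw = row.get(column)
        # Accept either a JSON string or an already-parsed list.
        pairs = json.loads(raw) if isinstance(raw, str) else raw
        # Missing or null columns simply contribute no new fields.
        for pair in pairs or []:
            flat[f"{column}.{pair[key_prop]}"] = pair[value_prop]
        result.append(flat)
    return result


# Made-up sample data: one row with two key/value pairs, one row with no data.
rows = [
    {"id": 1, "Metadata": '[{"Name": "plan", "Value": "pro"}, {"Name": "seats", "Value": 5}]'},
    {"id": 2, "Metadata": None},
]
flat_rows = tuple_column_to_cols(rows, "Metadata")
print(flat_rows)
```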
Explodes an array of objects into multiple rows, with one column for each object key.

Converts JSON array columns into separate columns, with one column per array value.

Compresses exploded rows back into a single column with array data.

Converts arrays into dictionaries using specified key-value properties.

Replaces null values with None in stringified objects for further processing.

## Data Transformation

Recursively cleans None values from lists and dictionaries. Handles nested structures and datetime conversions.

Maps row values according to a specified mapping dictionary. Supports nested structures and conditional mapping.

Renames DataFrame columns using JSON format, with type conversion support.

Localizes datetime columns to the UTC timezone. Handles both naive and timezone-aware timestamps.

Recursively transforms all nested datetimes to ISO format.

Standardized error handling with file logging. Creates consistent error reporting across ETL pipelines.

## Data Export & Error Handling

Exports data to various formats (Singer, Parquet, JSON, JSONL, CSV). Supports schema validation, object stringification, and custom formatting.

Exports a DataFrame to Singer format with schema validation and type handling.
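
For readers unfamiliar with Singer output, the sketch below shows roughly what an export to Singer format produces: one `SCHEMA` message followed by one `RECORD` message per row, each serialized as a JSON line. It uses only the standard library and does not call Gluestick. The helper names `infer_schema` and `to_singer_lines`, the naive type inference, and the sample records are assumptions made for illustration.

```python
import json

# Map Python types to JSON Schema type names (a deliberately small subset).
TYPE_MAP = {bool: "boolean", int: "integer", float: "number", str: "string"}


def infer_schema(records):
    """Infer a nullable JSON Schema from the first non-null value of each field."""
    inferred = {}
    for record in records:
        for field, value in record.items():
            if value is None:
                # Remember the field, but let a later non-null value set its type.
                inferred.setdefault(field, None)
            elif inferred.get(field) is None:
                inferred[field] = TYPE_MAP.get(type(value), "string")
    return {
        "type": "object",
        "properties": {
            # Fields that were always null fall back to "string".
            field: {"type": ["null", json_type or "string"]}
            for field, json_type in inferred.items()
        },
    }


def to_singer_lines(stream, records, key_properties):
    """Build one SCHEMA message plus one RECORD message per row, as JSON lines."""
    schema_message = {
        "type": "SCHEMA",
        "stream": stream,
        "schema": infer_schema(records),
        "key_properties": key_properties,
    }
    lines = [json.dumps(schema_message)]
    for record in records:
        lines.append(json.dumps({"type": "RECORD", "stream": stream, "record": record}))
    return lines


# Made-up sample data.
records = [
    {"id": 1, "email": "a@example.com", "score": None},
    {"id": 2, "email": None, "score": 4.5},
]
lines = to_singer_lines("contacts", records, ["id"])
print("\n".join(lines))
```

A real exporter would typically validate each record against the schema and handle dates and nested objects; this sketch leaves those steps out to keep the message structure visible.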