# Introduction to Gluestick

> A Python library for efficient ETL processes, optimized for hotglue

Gluestick is an open-source ETL toolkit developed and maintained by hotglue. It is optimized for use in hotglue pre-processing scripts.

The code is available on GitHub and is free to use under the MIT license.

<Card title="GitHub Repository" icon="github" href="https://github.com/hotgluexyz/gluestick">
  Access the complete source code, contribute, or report issues through our GitHub repository. Star us to show support!
</Card>

## Getting started with gluestick

```bash theme={null}
# Install from PyPI
pip install gluestick

# Confirm the install; in your Python scripts, import the library as `gs`
python -c "import gluestick as gs"
```

## Key Features

* Robust ETL utilities for data processing
* Singer protocol integration
* Advanced JSON and object handling
* Snapshot management for incremental loads
* Production-ready error handling

## File Reading Functions

<CardGroup cols={2}>
  <Card title="Reader" icon="book-open" href="/transformation/gluestick/read-files#reader-recommended">
    RECOMMENDED: Class for reading sync-output data. Provides methods to read directories, get file metadata, and extract primary keys from Parquet files.
  </Card>

  <Card title="read_csv_folder" icon="file-csv" href="/transformation/gluestick/read-files#read-csv-folder">
    Reads multiple CSV files from a directory, organizing them by entity type based on filename. Supports custom converters and index columns per entity.
  </Card>

  <Card title="read_parquet_folder" icon="file" href="/transformation/gluestick/read-files#read-parquet-folder">
    Similar to read\_csv\_folder but for Parquet files. Automatically organizes files by entity type and supports ignoring specific files.
  </Card>

  <Card title="get_catalog_schema" icon="book" href="/transformation/gluestick/read-files#get-catalog-schema">
    Retrieves a DataFrame schema from a Singer catalog.
  </Card>
</CardGroup>
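The directory readers above group files by entity type based on their filenames. The plain-Python sketch below illustrates that idea with the standard library only; it is not gluestick's implementation. The helper name `group_csv_by_entity` and the rule that the entity name is the part of the filename before the first `-` are assumptions made for illustration.

```python
import csv
from collections import defaultdict
from pathlib import Path


def group_csv_by_entity(folder: str) -> dict[str, list[dict[str, str]]]:
    """Hypothetical helper: read every CSV in `folder` and group rows by entity.

    Assumes the entity name is the part of the filename before the first "-"
    (e.g. "contacts-1.csv" -> "contacts"); gluestick's naming rules may differ.
    """
    entities: dict[str, list[dict[str, str]]] = defaultdict(list)
    # Sort paths so rows from multi-part files are appended in a stable order
    for path in sorted(Path(folder).glob("*.csv")):
        entity = path.stem.split("-", 1)[0]
        with path.open(newline="") as f:
            entities[entity].extend(csv.DictReader(f))
    return dict(entities)
```

In a real hotglue script you would reach for `Reader` or `read_csv_folder` instead, since they also cover converters, index columns, and Parquet input.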

## Snapshot Management

<CardGroup cols={2}>
  <Card title="snapshot_records" icon="camera" href="/transformation/gluestick/snapshots#snapshot-records">
    Manages data snapshots by updating existing snapshots or creating new ones. Supports type coercion and handles both CSV and Parquet formats.
  </Card>

  <Card title="read_snapshots" icon="clone" href="/transformation/gluestick/snapshots#read-snapshots">
    Reads snapshot data for a specific stream from either Parquet or CSV format. Supports additional pandas read options.
  </Card>

  <Card title="drop_redundant" icon="broom" href="/transformation/gluestick/snapshots#drop-redundant">
    Removes duplicate rows based on content hashing. Maintains state of processed data and supports update tracking.
  </Card>
</CardGroup>
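Two ideas behind these snapshot helpers are upserting new rows into an existing snapshot by primary key, and skipping rows whose content has already been seen by comparing hashes. The sketch below shows both in memory with the standard library. The helper names are hypothetical, and the `seen` set stands in for the state that a real pipeline would persist to disk (for example as CSV or Parquet) between runs.

```python
import hashlib
import json


def row_hash(row: dict) -> str:
    """Stable content hash of a row: JSON with sorted keys, then SHA-256."""
    payload = json.dumps(row, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def drop_redundant_rows(rows: list[dict], seen: set[str]) -> list[dict]:
    """Hypothetical helper: keep only rows whose content hash is not in `seen`.

    `seen` plays the role of persisted state and is updated in place.
    """
    fresh = []
    for row in rows:
        h = row_hash(row)
        if h not in seen:
            seen.add(h)
            fresh.append(row)
    return fresh


def merge_snapshot(snapshot: list[dict], new_rows: list[dict], pk: str) -> list[dict]:
    """Hypothetical helper: upsert `new_rows` into `snapshot` by primary key `pk`."""
    merged = {row[pk]: row for row in snapshot}
    for row in new_rows:
        merged[row[pk]] = row  # newer rows replace older ones with the same key
    return list(merged.values())
```

Hashing a canonical JSON form (sorted keys) means two rows with identical content produce the same hash regardless of key order.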

## JSON & Object Handling

<CardGroup cols={2}>
  <Card title="json_tuple_to_cols" icon="arrows-split-up-and-left" href="/transformation/gluestick/json#json-tuple-to-cols">
    Converts JSON tuple columns into separate columns based on key-value pairs.
  </Card>

  <Card title="explode_json_to_rows" icon="table-rows" href="/transformation/gluestick/json#explode-json-to-rows">
    Explodes an array of objects into multiple rows, with a column for each object key.
  </Card>

  <Card title="explode_json_to_cols" icon="table-columns" href="/transformation/gluestick/json#explode-json-to-cols">
    Converts JSON array columns into separate columns, with one column per array value.
  </Card>

  <Card title="compress_rows_to_col" icon="compress" href="/transformation/gluestick/json#compress-rows-to-col">
    Compresses exploded rows back into a single column with array data.
  </Card>

  <Card title="array_to_dict_reducer" icon="arrow-down-to-square" href="/transformation/gluestick/json#array-to-dict-reducer">
    Converts arrays into dictionaries using specified key-value properties.
  </Card>

  <Card title="clean_obj_null_values" icon="eraser" href="/transformation/gluestick/json#clean-obj-null-values">
    Replaces null values with None in stringified objects for further processing.
  </Card>
</CardGroup>
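To make "exploding" and "reducing" arrays concrete, here is a standard-library sketch on lists of dicts rather than DataFrames. It is not gluestick's code: the helper names, the `<column>.<key>` naming of the new columns, and the handling of arrays stored as JSON strings are all assumptions for illustration.

```python
import json


def explode_to_rows(rows: list[dict], column: str) -> list[dict]:
    """Hypothetical helper: emit one output row per object in row[column].

    Each object's keys become columns named "<column>.<key>"; rows whose array
    is empty or missing are kept once, without the exploded columns.
    """
    out = []
    for row in rows:
        base = {k: v for k, v in row.items() if k != column}
        items = row.get(column) or []
        if isinstance(items, str):  # arrays often arrive as JSON strings
            items = json.loads(items)
        if not items:
            out.append(base)
            continue
        for item in items:
            out.append({**base, **{f"{column}.{k}": v for k, v in item.items()}})
    return out


def array_to_dict(items: list[dict], key_prop: str, value_prop: str) -> dict:
    """Hypothetical helper: [{"name": "a", "value": 1}] -> {"a": 1}."""
    return {item[key_prop]: item[value_prop] for item in items}
```

For example, a row `{"id": 1, "lines": [{"sku": "A"}, {"sku": "B"}]}` becomes two rows, each carrying `id` plus a `lines.sku` column.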

## Data Transformation

<CardGroup cols={2}>
  <Card title="clean_convert" icon="broom" href="/transformation/gluestick/data-transformation#clean-convert">
    Recursively cleans None values from lists and dictionaries. Handles nested structures and datetime conversions.
  </Card>

  <Card title="map_fields" icon="arrows-split-up-and-left" href="/transformation/gluestick/data-transformation#map-fields">
    Maps row values according to a specified mapping dictionary. Supports nested structures and conditional mapping.
  </Card>

  <Card title="rename" icon="pen" href="/transformation/gluestick/data-transformation#rename">
    Renames DataFrame columns using JSON format with type conversion support.
  </Card>

  <Card title="localize_datetime" icon="clock" href="/transformation/gluestick/data-transformation#localize-datetime">
    Localizes datetime columns to the UTC timezone. Handles both naive and timezone-aware timestamps.
  </Card>

  <Card title="deep_convert_datetimes" icon="calendar" href="/transformation/gluestick/data-transformation#deep-convert-datetimes">
    Transforms all nested datetimes to ISO format recursively.
  </Card>

  <Card title="exception" icon="triangle-exclamation" href="/transformation/gluestick/data-transformation#exception">
    Standardized error handling with file logging. Creates consistent error reporting across ETL pipelines.
  </Card>
</CardGroup>
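Several of these transformations walk nested structures recursively: dropping `None` values, normalizing timestamps to UTC, and rendering datetimes as ISO 8601 strings. The sketch below combines those steps in one standard-library function. It is an illustration only; the name `clean_and_convert` is hypothetical, and treating naive datetimes as already being in UTC is an assumption.

```python
from datetime import datetime, timezone
from typing import Any


def clean_and_convert(value: Any) -> Any:
    """Hypothetical helper: recursively drop None values and ISO-format datetimes."""
    if isinstance(value, dict):
        return {k: clean_and_convert(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [clean_and_convert(v) for v in value if v is not None]
    if isinstance(value, datetime):
        if value.tzinfo is None:  # assumption: naive timestamps are already UTC
            value = value.replace(tzinfo=timezone.utc)
        # Aware timestamps are converted to UTC before formatting
        return value.astimezone(timezone.utc).isoformat()
    return value
```

Recursing through dicts and lists first means the datetime branch is reached at any nesting depth.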

## Data Export & Error Handling

<CardGroup cols={2}>
  <Card title="to_export" icon="file-export" href="/transformation/gluestick/export#to-export">
    Exports data to various formats (Singer, Parquet, JSON, JSONL, CSV). Supports schema validation, object stringification, and custom formatting.
  </Card>

  <Card title="to_singer" icon="music" href="/transformation/gluestick/export#to-singer">
    Exports a DataFrame to Singer format with schema validation and type handling.
  </Card>
</CardGroup>
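Singer output is a stream of newline-delimited JSON messages: a `SCHEMA` message describing the stream, followed by one `RECORD` message per row. The sketch below writes that message sequence with the standard library. It is not `to_singer` itself: the helper name `write_singer` is hypothetical, and it performs no schema validation or type coercion.

```python
import json
import sys
from typing import Iterable, Optional, TextIO


def write_singer(stream: str, schema: dict, key_properties: list[str],
                 records: Iterable[dict], out: Optional[TextIO] = None) -> None:
    """Hypothetical helper: emit one SCHEMA message, then one RECORD message per row."""
    if out is None:
        out = sys.stdout
    schema_msg = {"type": "SCHEMA", "stream": stream, "schema": schema,
                  "key_properties": key_properties}
    out.write(json.dumps(schema_msg) + "\n")
    for record in records:
        # default=str keeps non-JSON values (e.g. datetimes) serializable
        out.write(json.dumps({"type": "RECORD", "stream": stream,
                              "record": record}, default=str) + "\n")
```

Writing the `SCHEMA` message first lets a Singer target know the stream's shape and key properties before any records arrive.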
