# Data Reading Functions

## Reader

**Recommended**: `Reader` is a class for reading ETL files from the `sync-output` directory. It is the recommended way to read ETL files because it provides consistent error handling and access to file metadata.

### Import

```python
from gluestick.reader import Reader
```

### Basic Usage

```python
# Initialize with default directories
reader = Reader()  # Uses ROOT_DIR/sync-output

# Get available streams
print(reader)  # Shows list of available streams
```
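`print(reader)` displays the stream names as a Python list literal, which is what the loop under Common Patterns relies on. If you need the names as an actual list, a minimal sketch is to parse that string with `ast.literal_eval`, which accepts only literals and never executes code. `stream_names` and `FakeReader` are hypothetical names used for illustration; `FakeReader` only mimics the printed output so the snippet runs without gluestick installed.

```python
import ast


def stream_names(reader) -> list:
    """Return the stream names shown by print(reader) as a real list.

    Assumes str(reader) is a list literal such as "['users', 'orders']".
    ast.literal_eval only parses literals, so unlike eval() it cannot run code.
    """
    names = ast.literal_eval(str(reader))
    if not isinstance(names, list):
        raise TypeError(f"expected a list literal, got {type(names).__name__}")
    return names


# Hypothetical stand-in for gluestick's Reader, used only to make this runnable.
class FakeReader:
    def __str__(self) -> str:
        return str(["users", "orders"])


print(stream_names(FakeReader()))  # ['users', 'orders']
```

With a real reader you would call `stream_names(reader)` on the object created above.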

### Key Methods

***

#### get(stream, default=None, catalog\_types=False, \*\*kwargs)

Reads the file for the given stream into a pandas DataFrame. Set `catalog_types=True` to apply the column types defined in the catalog.

```python
# Basic reading
df = reader.get("users")

# Using catalog types
df = reader.get("users", catalog_types=True)
```

***

#### get\_metadata(stream)

Retrieves the key-value metadata stored in a stream's Parquet file.

```python
metadata = reader.get_metadata("users")
# Returns dict of metadata key-value pairs
```

***

#### get\_pk(stream)

Gets the primary key(s) for a stream from its Parquet file metadata.

```python
primary_keys = reader.get_pk("users")
# Returns list of primary key column names
```
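`get_pk` returns a list of column names, which the pattern at the end of this page passes to `gs.to_export` as `keys`. If a stream's metadata and its data could ever disagree, a quick check that every key is really a column fails early with a clear message. A minimal sketch follows; `check_keys` is a hypothetical helper, not part of gluestick, and it takes plain column names (for example `list(df.columns)`) so it runs with the standard library alone.

```python
def check_keys(columns, keys) -> list:
    """Raise KeyError if any primary key is not one of the given columns.

    `columns` is any iterable of column names, e.g. list(df.columns);
    `keys` is a list such as the one returned by reader.get_pk(stream).
    """
    column_set = set(columns)  # build once for O(1) membership checks
    missing = [key for key in keys if key not in column_set]
    if missing:
        raise KeyError(f"primary key column(s) not found: {missing}")
    return list(keys)


print(check_keys(["id", "email"], ["id"]))  # ['id']
```

An empty key list passes through unchanged, so streams without primary keys are not rejected.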

***

## get\_catalog\_schema

Retrieves a stream's Singer schema from the catalog file (`catalog.json`).

### Usage

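```python
import os

from gluestick.reader import Reader
from gluestick.singer import get_catalog_schema, to_singer

# Assumed output location; adjust to match your job's setup.
OUTPUT_DIR = f"{os.environ.get('ROOT_DIR', '.')}/etl-output"

# Read the stream's data
df = Reader().get("users")

# Get schema for specific stream
schema = get_catalog_schema("users")

# Use schema with Singer export
to_singer(df, "users", OUTPUT_DIR, schema=schema)
```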

### Returns

A dictionary containing the stream's schema definition.

### Notes

* Requires `catalog.json` in the root directory
* Raises an exception if the stream is not found in the catalog
* Filters the schema to include only the `type` and `properties` keys
* Ensures that array types have an `items` dictionary
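To make the notes above concrete, the sketch below re-implements the described filtering with the standard library so you can see what the returned dictionary looks like. It is not gluestick's source. It assumes the Singer catalog layout `{"streams": [{"stream": ..., "tap_stream_id": ..., "schema": {...}}]}`; the names `catalog_schema`, `_ensure_items`, and `_types` are hypothetical, and matching on either `stream` or `tap_stream_id`, raising `ValueError`, and recursing into nested properties and array items are choices made here, not documented behavior.

```python
def _types(prop: dict) -> list:
    """Normalize a JSON Schema `type` (string or list) to a list."""
    declared = prop.get("type", [])
    return [declared] if isinstance(declared, str) else list(declared)


def _ensure_items(prop: dict) -> dict:
    """Return a copy of `prop` in which every array has an `items` dictionary."""
    prop = dict(prop)  # shallow copy; nested dicts are replaced, not mutated
    if "array" in _types(prop) and not isinstance(prop.get("items"), dict):
        prop["items"] = {}
    if isinstance(prop.get("properties"), dict):
        prop["properties"] = {
            name: _ensure_items(sub) for name, sub in prop["properties"].items()
        }
    if isinstance(prop.get("items"), dict):
        prop["items"] = _ensure_items(prop["items"])
    return prop


def catalog_schema(catalog: dict, stream: str) -> dict:
    """Look up `stream` in a Singer catalog and return its filtered schema."""
    for entry in catalog.get("streams", []):
        if stream in (entry.get("stream"), entry.get("tap_stream_id")):
            schema = entry["schema"]
            # Keep only the top-level `type` and `properties` keys.
            filtered = {key: schema[key] for key in ("type", "properties") if key in schema}
            filtered["properties"] = {
                name: _ensure_items(prop)
                for name, prop in filtered.get("properties", {}).items()
            }
            return filtered
    raise ValueError(f"stream {stream!r} not found in catalog")


EXAMPLE_CATALOG = {
    "streams": [
        {
            "tap_stream_id": "users",
            "stream": "users",
            "schema": {
                "type": "object",
                "additionalProperties": False,
                "properties": {
                    "id": {"type": "integer"},
                    "tags": {"type": ["null", "array"]},
                    "address": {"type": "object", "properties": {"lines": {"type": "array"}}},
                },
            },
            "metadata": [],
        }
    ]
}

print(catalog_schema(EXAMPLE_CATALOG, "users"))
```

In the example, `additionalProperties` is dropped and both `tags` and the nested `address.lines` gain `"items": {}`. `get_catalog_schema` itself takes only the stream name and reads `catalog.json` for you.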

## Common Patterns

### Iterating through multiple streams

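```python etl.py
import ast
import os

import gluestick as gs

# Assumed environment-based paths; adjust to match your job's setup.
ROOT_DIR = os.environ.get("ROOT_DIR", ".")
OUTPUT_DIR = f"{ROOT_DIR}/etl-output"
# Assumption: the tenant ID is provided in the TENANT environment variable.
TENANT_ID = os.environ.get("TENANT", "")

# iterate through each stream in the input directory
reader = gs.Reader()
# str(reader) is a list literal of stream names; literal_eval parses it without executing code
for key in ast.literal_eval(str(reader)):

    # read the stream into a dataframe and apply transformations
    input_df = reader.get(key, catalog_types=True)
    input_df["tenant"] = TENANT_ID

    # read the primary keys from the file metadata (assuming the data is Parquet)
    key_properties = reader.get_pk(key)

    # write the data out to the output directory
    gs.to_export(input_df, key, OUTPUT_DIR, keys=key_properties)
```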
