Reader

Reader is a class for reading ETL input files from the sync-output directory. It is the recommended way to read these files because it provides consistent error handling and metadata access.

Basic Usage

import * as gs from '@hotglue/gluestick-ts';

const reader = new gs.Reader();

Key Methods


get(stream: string, options): DataFrame

Reads a specific stream as a Polars DataFrame.

Usage

// Basic reading
const df = reader.get("users");

// Type the DataFrame according to the catalog
const typedDf = reader.get("users", {
    catalogTypes: true
});

Parameters

  • stream (string): Name of the stream to read
  • options (object)
    • catalogTypes (boolean): Use the catalog for automatic type inference
    • Other options are passed through to Polars when reading. See the ReadCSV and ReadParquet options for more information

Returns

  • DataFrame: The stream's data as a Polars DataFrame

getPk(stream: string): string[]

Gets the primary key(s) for a stream from the catalog.

Usage

const primaryKeys = reader.getPk('users');
console.log(primaryKeys);
// ["user_id", "region_id"]

Parameters

  • stream (string): Name of stream to get primary keys for

Returns

  • string[]: Array of primary key column names
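
One common use for the returned key names is dropping duplicate records. The following is a minimal sketch that works on plain row objects rather than a Polars DataFrame, so it runs without the library. The `Row` type and the `dedupeByPk` helper are hypothetical and not part of gluestick-ts; the hard-coded key list stands in for the result of `reader.getPk('users')`.

```typescript
type Row = Record<string, unknown>;

// Hypothetical helper (not part of gluestick-ts): keep the last row seen
// for each primary-key combination.
function dedupeByPk(rows: Row[], primaryKeys: string[]): Row[] {
  const latest = new Map<string, Row>();
  for (const row of rows) {
    // JSON-encode the key values so composite keys cannot collide.
    const key = JSON.stringify(primaryKeys.map((pk) => row[pk]));
    latest.set(key, row);
  }
  return Array.from(latest.values());
}

// Assumption: in a real script this array would come from reader.getPk('users').
const primaryKeys = ["user_id", "region_id"];

const rows: Row[] = [
  { user_id: 1, region_id: "eu", name: "Ada" },
  { user_id: 1, region_id: "eu", name: "Ada L." },
  { user_id: 2, region_id: "us", name: "Grace" },
];

const unique = dedupeByPk(rows, primaryKeys);
console.log(unique.length); // 2
```

The first two rows share the same composite key, so only the later one survives.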

keys(): string[]

Gets all available stream names from the input files.

Usage

const availableStreams = reader.keys();
console.log(availableStreams);
// ["users", "orders", "products"]

Returns

  • string[]: Array of all available stream names
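
Because keys() lists what is actually present, it can be used to check for required streams before reading them. The sketch below assumes only the documented `keys(): string[]` signature. The `StreamSource` interface, the `missingStreams` helper, and the stand-in object are hypothetical names used for illustration; in a real script you would pass a `gs.Reader` instance instead.

```typescript
// Minimal structural type covering only the documented keys() method.
interface StreamSource {
  keys(): string[];
}

// Hypothetical helper (not part of gluestick-ts): returns the required
// stream names that are absent from the input.
function missingStreams(source: StreamSource, required: string[]): string[] {
  const available = new Set(source.keys());
  return required.filter((streamName) => !available.has(streamName));
}

// Stand-in object for illustration; replace with `new gs.Reader()` in a real script.
const fakeReader: StreamSource = {
  keys: () => ["users", "orders", "products"],
};

const missing = missingStreams(fakeReader, ["users", "invoices"]);
console.log(missing); // ["invoices"]
```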

Common Patterns

Iterating through multiple streams

etl.ts
import * as gs from '@hotglue/gluestick-ts';

// iterate through each stream in the input directory
const input = new gs.Reader();
const availableStreams = input.keys();

for (const key of availableStreams) {
    // read each stream, typed according to the catalog
    const inputDf = input.get(key, { catalogTypes: true });
    // transform or export inputDf here
}
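
Collecting primary keys for every stream

A related pattern combines keys() and getPk() to build a lookup of primary keys per stream. This is a sketch that assumes only the two documented signatures. The `CatalogSource` interface, the `primaryKeysByStream` helper, and the in-memory stand-in data are hypothetical; in a real script you would pass a `gs.Reader` instance.

```typescript
// Structural type covering the two documented methods used here.
interface CatalogSource {
  keys(): string[];
  getPk(stream: string): string[];
}

// Hypothetical helper (not part of gluestick-ts): map each available
// stream name to its primary key columns.
function primaryKeysByStream(source: CatalogSource): Record<string, string[]> {
  const result: Record<string, string[]> = {};
  for (const stream of source.keys()) {
    result[stream] = source.getPk(stream);
  }
  return result;
}

// Stand-in data for illustration only; replace with `new gs.Reader()`.
const fakeCatalog: Record<string, string[]> = {
  users: ["user_id", "region_id"],
  orders: ["order_id"],
};
const fakeSource: CatalogSource = {
  keys: () => Object.keys(fakeCatalog),
  getPk: (stream) => fakeCatalog[stream] ?? [],
};

const pkMap = primaryKeysByStream(fakeSource);
console.log(pkMap);
```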