to_export

Recommended: Universal export function supporting multiple output formats with schema validation and customization options.

Import

import gluestick as gs

Basic Usage

# Basic export to default format (Singer)
gs.to_export(df, "users", OUTPUT_DIR, keys=['id'])

# Export to specific format
gs.to_export(
    df,
    "users",
    OUTPUT_DIR,
    export_format="parquet",
    stringify_objects=True
)

Parameters

  • data (pd.DataFrame): DataFrame to export
  • name (str): Output file name/stream name
  • output_dir (str): Directory for output files
  • keys (list): Primary key fields
  • unified_model (pydantic.BaseModel): Pydantic model for schema validation
  • export_format (str): Output format: 'singer' (the default), 'parquet', 'json', 'jsonl', or 'csv'
  • output_file_prefix (str): Optional prefix for output files
  • schema (dict): Custom schema for Singer format
  • stringify_objects (bool): Convert complex objects to strings for Parquet
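
The stringify_objects option can be pictured with a small standard-library sketch. This is not gluestick's implementation; the helper name stringify_nested and the sample rows are hypothetical, and the sketch only assumes that nested dict and list values are serialized to JSON strings so that each column holds one simple type before it is written to Parquet.

```python
import json


def stringify_nested(records):
    """Return copies of the records with dict/list values serialized to JSON strings.

    Hypothetical helper for illustration only; gluestick's own logic may differ.
    """
    converted = []
    for record in records:
        converted.append({
            key: json.dumps(value, sort_keys=True)
            if isinstance(value, (dict, list))
            else value
            for key, value in record.items()
        })
    return converted


# Sample rows (made up) with nested values that a flat column cannot hold as-is.
rows = [
    {"id": 1, "tags": ["a", "b"], "address": {"city": "Oslo"}},
    {"id": 2, "tags": [], "address": None},
]

flat = stringify_nested(rows)
print(flat[0]["tags"])     # the list is now a JSON string
print(flat[0]["address"])  # the dict is now a JSON string
```

Scalar values and None pass through unchanged, and the input rows are not modified.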

Returns

None (writes files to specified directory)

Notes

  • Uses environment variables for format defaults
  • Supports prefix override per stream
  • Handles complex data types appropriately per format
  • Validates against Pydantic models when provided
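
The first note, environment-driven format defaults, could work roughly as sketched below. The variable name EXAMPLE_EXPORT_FORMAT and the function resolve_export_format are assumptions made for illustration; check the gluestick source for the actual variable names it reads. The only facts taken from this page are the list of supported formats and the Singer default.

```python
import os

# Formats listed in the Parameters section above.
SUPPORTED_FORMATS = {"singer", "parquet", "json", "jsonl", "csv"}


def resolve_export_format(explicit=None, env=None, env_var="EXAMPLE_EXPORT_FORMAT"):
    """Pick the export format: explicit argument, then environment, then 'singer'.

    env_var is a hypothetical name, not one documented by gluestick.
    """
    env = os.environ if env is None else env
    chosen = (explicit or env.get(env_var) or "singer").lower()
    if chosen not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported export format: {chosen!r}")
    return chosen


# An explicit argument wins over the environment; the environment wins over the default.
print(resolve_export_format(env={}))
print(resolve_export_format(env={"EXAMPLE_EXPORT_FORMAT": "parquet"}))
print(resolve_export_format("csv", env={"EXAMPLE_EXPORT_FORMAT": "parquet"}))
```

Passing the environment as a plain dict keeps the lookup easy to test without touching the real process environment.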

to_singer

Recommended: Specialized function for exporting data to Singer format with comprehensive type handling.

Usage

import gluestick as gs

# Basic Singer export
gs.to_singer(df, "users", OUTPUT_DIR, keys=['id'])

# With a custom schema and object columns allowed
# (custom_schema is a dict you define elsewhere)
gs.to_singer(
    df,
    "users",
    OUTPUT_DIR,
    keys=['id'],
    allow_objects=True,
    schema=custom_schema
)

Parameters

  • df (pd.DataFrame): DataFrame to export
  • stream (str): Singer stream name
  • output_dir (str): Output directory
  • keys (list): Primary key fields
  • filename (str): Output filename (default: 'data.singer')
  • allow_objects (bool): Enable complex object handling
  • schema (dict): Custom schema definition
  • unified_model (pydantic.BaseModel): Pydantic model for validation

Notes

  • Handles datetime conversions automatically
  • Supports catalog-based schemas
  • Validates data against provided schemas
  • Creates Singer-compliant output files
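
For orientation, a Singer output file is a sequence of JSON lines: a SCHEMA message describing the stream, followed by one RECORD message per row. The sketch below builds such lines with the standard library only. It is not how to_singer is implemented; the function to_singer_lines and the sample schema and rows are hypothetical, and datetimes are assumed to be written as ISO 8601 strings.

```python
import json
from datetime import datetime, timezone


def to_singer_lines(records, stream, schema, keys):
    """Build Singer SCHEMA and RECORD messages as JSON strings (illustrative only)."""

    def encode(value):
        # Datetimes are not JSON serializable, so emit them as ISO 8601 strings.
        if isinstance(value, datetime):
            return value.isoformat()
        raise TypeError(f"Cannot serialize {type(value).__name__}")

    lines = [json.dumps({
        "type": "SCHEMA",
        "stream": stream,
        "schema": schema,
        "key_properties": keys,
    })]
    for record in records:
        lines.append(json.dumps(
            {"type": "RECORD", "stream": stream, "record": record},
            default=encode,
        ))
    return lines


# Made-up schema and data for a "users" stream keyed on "id".
schema = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "created_at": {"type": "string", "format": "date-time"},
    },
}
rows = [{"id": 1, "created_at": datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc)}]

lines = to_singer_lines(rows, "users", schema, ["id"])
for line in lines:
    print(line)
```

The keys argument maps to key_properties in the SCHEMA message, which is how downstream targets learn the primary key of the stream.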

Common Patterns

Handling Complex Exports

import gluestick as gs

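# Multiple format export
# OUTPUT_DIR and ROOT_DIR are assumed to be defined earlier in the script.
def multi_format_export(df, name):
    """Export df once per format, logging failures instead of aborting."""
    # `fmt` avoids shadowing the built-in format() function.
    for fmt in ['singer', 'parquet', 'json']:
        try:
            gs.to_export(
                df,
                name,
                OUTPUT_DIR,
                export_format=fmt
            )
        except Exception as e:
            # Record the failure and continue with the next format.
            gs.exception(
                e,
                ROOT_DIR,
                f"Failed {fmt} export for {name}"
            )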