explode_json_to_cols

Converts JSON array columns into separate columns, with one column per array value.

Usage

from gluestick.pandas_utils import explode_json_to_cols

# Basic column explosion
df = explode_json_to_cols(df, 'array_column')

# With options
df = explode_json_to_cols(
    df,
    'array_column',
    drop=True,
    inplace=False
)

Parameters

  • df (pd.DataFrame): Input DataFrame
  • column_name (str): Column containing JSON arrays
  • drop (bool): Whether to drop original column
  • inplace (bool): Modify DataFrame in place
  • parser (function): Custom parsing function (optional)

Returns

DataFrame with array values expanded into separate columns
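
To make "one column per array value" concrete, here is a minimal plain-Python sketch of the transformation on a list of records. It is not gluestick's implementation: the helper name explode_array_to_cols and the "column.index" naming of the new keys are assumptions made for illustration, and the real output column names may differ.

```python
import json

def explode_array_to_cols(rows, column_name, drop=True):
    """Hypothetical helper: spread a JSON array into indexed keys."""
    out = []
    for row in rows:
        new_row = dict(row)
        # Remove the source column when drop=True, otherwise keep it
        raw = new_row.pop(column_name) if drop else new_row[column_name]
        # Accept either a JSON string or an already-parsed list
        values = json.loads(raw) if isinstance(raw, str) else (raw or [])
        for i, value in enumerate(values):
            # Assumed naming scheme: "<column>.<position>"
            new_row[f"{column_name}.{i}"] = value
        out.append(new_row)
    return out

rows = [
    {"id": 1, "tags": '["a", "b"]'},
    {"id": 2, "tags": '["c"]'},
]
exploded = explode_array_to_cols(rows, "tags")
```

Records with shorter arrays simply end up with fewer keys in this sketch; in a DataFrame the corresponding cells would be empty.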


explode_json_to_rows

Takes a column containing an array of objects and expands it into multiple rows, creating columns for each object property.

Usage

from gluestick.pandas_utils import explode_json_to_rows

# Basic explosion
df = explode_json_to_rows(df, 'array_column')

# With nesting control
df = explode_json_to_rows(
    df,
    'array_column',
    drop=True,
    max_level=2
)

Parameters

  • df (pd.DataFrame): Input DataFrame
  • column_name (str): Column containing JSON arrays
  • drop (bool): Whether to drop original column
  • max_level (int): Maximum nesting level for flattening
  • parser (function): Custom parsing function (optional)

Returns

DataFrame with array objects expanded into separate rows
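
The row explosion described above can be sketched in plain Python as follows. This is an illustration of the idea only, not the library's code: the helper name explode_array_to_rows, the "column.property" key naming, and the decision to keep parent records whose array is empty are all assumptions.

```python
import json

def explode_array_to_rows(rows, column_name, drop=True):
    """Hypothetical helper: one output record per object in the array."""
    out = []
    for row in rows:
        # Copy the parent fields, leaving out the array column if drop=True
        base = {k: v for k, v in row.items() if not (drop and k == column_name)}
        raw = row.get(column_name)
        items = json.loads(raw) if isinstance(raw, str) else (raw or [])
        if not items:
            out.append(base)  # sketch choice: keep parents with no items
            continue
        for item in items:
            record = dict(base)
            # Assumed naming scheme: "<column>.<property>"
            record.update({f"{column_name}.{k}": v for k, v in item.items()})
            out.append(record)
    return out

invoices = [
    {"invoice_id": "A1",
     "line_items": '[{"sku": "X", "qty": 2}, {"sku": "Y", "qty": 1}]'},
    {"invoice_id": "B2", "line_items": '[{"sku": "Z", "qty": 5}]'},
]
lines = explode_array_to_rows(invoices, "line_items")
```

Two invoices with three line items in total produce three records, each repeating the parent invoice_id.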


json_tuple_to_cols

Converts JSON tuple columns (arrays of key-value objects) into separate columns, one per key. Useful for flattening nested JSON data into a DataFrame.

Import

from gluestick.pandas_utils import json_tuple_to_cols

Basic Usage

# Basic conversion with default settings
df = json_tuple_to_cols(df, 'json_column')

# Custom key-value mapping
config = {
    'cols': {'key_prop': 'CategoryName', 'value_prop': 'CategoryValue'},
    'look_up': {'key_prop': 'name', 'value_prop': 'value'}
}
df = json_tuple_to_cols(df, 'json_column', col_config=config)

Parameters

  • df (pd.DataFrame): Input DataFrame
  • column_name (str): Column containing JSON tuples
  • col_config (dict): Configuration for key-value mapping
    • cols: Output column names
    • look_up: Input property names

Returns

DataFrame with JSON tuple column split into separate columns
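
As a rough illustration of what the look_up properties select, the sketch below turns an array of name/value objects into one key per name. It mirrors only the look_up part of the configuration; the helper name tuples_to_cols and the "column.key" naming are assumptions, and the cols setting is not modelled here.

```python
import json

def tuples_to_cols(rows, column_name, key_prop="name", value_prop="value"):
    """Hypothetical helper: one new key per key-value object in the array."""
    out = []
    for row in rows:
        # Keep every field except the JSON tuple column itself
        new_row = {k: v for k, v in row.items() if k != column_name}
        raw = row.get(column_name)
        pairs = json.loads(raw) if isinstance(raw, str) else (raw or [])
        for pair in pairs:
            # Assumed naming scheme: "<column>.<key>"
            new_row[f"{column_name}.{pair[key_prop]}"] = pair[value_prop]
        out.append(new_row)
    return out

rows = [
    {"id": 1,
     "custom_fields": '[{"name": "Color", "value": "Red"}, '
                      '{"name": "Size", "value": "M"}]'},
]
flat = tuples_to_cols(rows, "custom_fields")
```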


compress_rows_to_col

Compresses previously exploded rows back into a single column containing array data.

Usage

from gluestick.pandas_utils import compress_rows_to_col

# Basic compression
df = compress_rows_to_col(df, 'line_items', 'invoice_id')

# With CSV storage
df = compress_rows_to_col(
    df,
    'line_items',
    'invoice_id',
    use_csv=True
)

Parameters

  • df (pd.DataFrame): Input DataFrame
  • column_prefix (str): Prefix of columns to compress
  • pk (str): Primary key for grouping rows
  • use_csv (bool): Use CSV format for storage

Returns

DataFrame with specified columns compressed into array format
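
The compression step is the inverse of a row explosion: records that share a primary key are grouped, and the prefixed fields are collected into an array. The plain-Python sketch below shows that idea under stated assumptions; the helper name compress_rows and the "prefix." separator are hypothetical, and the use_csv option is not modelled.

```python
def compress_rows(records, column_prefix, pk):
    """Hypothetical helper: group by pk and gather prefixed fields."""
    prefix = f"{column_prefix}."  # assumed separator between prefix and field
    grouped = {}
    for record in records:
        # The first record seen for a key supplies the parent fields
        parent = grouped.setdefault(
            record[pk],
            {k: v for k, v in record.items() if not k.startswith(prefix)},
        )
        # Strip the prefix to rebuild the original object
        item = {k[len(prefix):]: v
                for k, v in record.items() if k.startswith(prefix)}
        parent.setdefault(column_prefix, []).append(item)
    return list(grouped.values())

exploded_lines = [
    {"invoice_id": "A1", "line_items.sku": "X", "line_items.qty": 2},
    {"invoice_id": "A1", "line_items.sku": "Y", "line_items.qty": 1},
    {"invoice_id": "B2", "line_items.sku": "Z", "line_items.qty": 5},
]
compressed = compress_rows(exploded_lines, "line_items", "invoice_id")
```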


array_to_dict_reducer

Creates a reducer function that converts arrays into dictionaries using specified key-value properties.

Usage

from gluestick.pandas_utils import array_to_dict_reducer
from functools import reduce

# Create reducer
reducer = array_to_dict_reducer(
    key_prop='name',
    value_prop='value'
)

# Apply to array
result = reduce(reducer, array_data, {})

Parameters

  • key_prop (str): Property to use as dictionary key
  • value_prop (str): Property to use as dictionary value

Returns

Function that reduces arrays to dictionaries

Notes

  • Returns a reducer function to be used with functools.reduce
  • Raises AttributeError if values aren’t dictionaries
  • Handles both specified key-value pairs and full dictionary merging
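
The behaviour listed in these notes can be sketched with a small closure. This is a minimal stand-in written for illustration, not the library's source: make_reducer is a hypothetical name, and treating missing key_prop/value_prop as "merge the whole dictionary" is an assumption based on the last note.

```python
from functools import reduce

def make_reducer(key_prop=None, value_prop=None):
    """Hypothetical stand-in for a reducer factory."""
    def reducer(accumulator, item):
        if not isinstance(item, dict):
            raise AttributeError("array items must be dictionaries")
        if key_prop and value_prop:
            # Pick one property as the key and another as the value
            accumulator[item[key_prop]] = item[value_prop]
        else:
            # No properties given: merge the whole dictionary
            accumulator.update(item)
        return accumulator
    return reducer

array_data = [
    {"name": "Color", "value": "Red"},
    {"name": "Size", "value": "M"},
]
paired = reduce(make_reducer("name", "value"), array_data, {})
merged = reduce(make_reducer(), [{"a": 1}, {"b": 2}], {})
```

Passing a fresh {} as the initial value matters here, because the sketch mutates and returns the accumulator it is given.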

clean_obj_null_values

Replaces ‘null’ with ‘None’ in a stringified object so that later steps can process it.

Usage

from gluestick.etl_utils import clean_obj_null_values

# Clean stringified object
cleaned = clean_obj_null_values(json_string)

Parameters

  • obj (str): Stringified dictionary/list with null values

Returns

String with ‘null’ values replaced with ‘None’

Notes

  • Returns an empty dict ({}) for pandas NA values
  • Particularly useful before using explode functions
  • Preserves original object structure for non-null values
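
A short sketch of why the replacement helps: a stringified object that contains null is not a valid Python literal, while the same string with None is. The helper replace_nulls is a hypothetical stand-in, None is used in place of a pandas NA value, and ast.literal_eval is used only to demonstrate the point; whether gluestick parses values this way is an assumption.

```python
import ast

def replace_nulls(obj):
    """Hypothetical stand-in: swap 'null' for 'None' in a stringified object."""
    if obj is None:  # stand-in for a pandas NA check
        return {}
    # Plain substring replacement, so 'null' inside string values changes too
    return obj.replace("null", "None")

raw = "{'sku': 'X', 'discount': null}"
cleaned = replace_nulls(raw)

# The cleaned string can now be evaluated as a Python literal
parsed = ast.literal_eval(cleaned)
```

In this sketch the replacement is a plain substring match, so a value such as 'nullable' would also be altered; that trade-off belongs to the sketch and should be checked against the library before relying on it.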

Common Patterns

Processing Nested JSON Data

import gluestick as gs
import pandas as pd

# Example with nested invoice data
def process_invoice_lines(df):
    # Clean nulls first
    df['line_items'] = df['line_items'].apply(gs.clean_obj_null_values)
    
    # Explode to rows
    df_lines = gs.explode_json_to_rows(df, 'line_items')
    
    # Process and transform
    # ... apply transformations ...
    
    # Compress back to column if needed
    df = gs.compress_rows_to_col(df_lines, 'line_items', 'invoice_id')
    
    return df