Because Singer data has storage and efficiency drawbacks, hotglue can automatically convert the Singer output of a standard tap into Parquet, which is easier for transformation scripts to consume. See intermediary formats for more information.

target-parquet

target-parquet is a Singer target that takes in Singer data and writes it out as Parquet. When developing locally with the Singer SDK, we recommend using target-parquet to easily inspect your data.

Installation

To install target-parquet, first create a virtual environment. We recommend creating this environment globally, so that you can access it everywhere:
python -m venv ~/env/target-parquet
Next, enter that virtual environment:
source ~/env/target-parquet/bin/activate
Finally, install target-parquet in your virtual environment:
pip install git+https://github.com/hotgluexyz/target-parquet.git

Usage

Suppose you have a file called data.singer in your current directory that you want to convert. First, enter your virtual environment:
source ~/env/target-parquet/bin/activate
Then simply run:
cat data.singer | target-parquet
If your Singer data is malformed, you should see an error message explaining the issue. If the Parquet conversion succeeds, you should see a Parquet file for each stream in your Singer data.
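
If you do not have tap output on hand, the sketch below writes a small data.singer file you can use to try the command above. The stream name users and its id and name fields are made up for illustration. The message layout (one SCHEMA message followed by RECORD messages, one JSON object per line) follows the Singer specification.

```shell
# Write a minimal, hypothetical Singer file into the current directory.
# The quoted 'EOF' keeps the shell from expanding anything inside the JSON.
cat > data.singer <<'EOF'
{"type": "SCHEMA", "stream": "users", "schema": {"type": "object", "properties": {"id": {"type": "integer"}, "name": {"type": "string"}}}, "key_properties": ["id"]}
{"type": "RECORD", "stream": "users", "record": {"id": 1, "name": "Ada"}}
{"type": "RECORD", "stream": "users", "record": {"id": 2, "name": "Grace"}}
EOF
```

Because this sample contains a single stream, a successful conversion should leave you with one Parquet file, for users.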