What are transformation scripts?

hotglue features a transformation layer that converts the raw output from data sources into a format your backend (or the targets you're piping the data to) can ingest.

In hotglue, all transformation scripts are written in Python and can use any open source Python modules you'd like. Since hotglue formats output from data sources as CSV files, our sample scripts rely heavily on our gluestick package and pandas.
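Conceptually, a transformation script reads the CSVs a source produced, reshapes them, and writes new CSVs for your backend. The minimal sketch below shows that shape using only the standard library `csv` module so it runs without extra packages; real hotglue scripts typically do this with gluestick and pandas instead. The `sync-output`/`etl-output` directory names, the `ROOT_DIR` environment variable, and `COLUMN_MAP` are illustrative assumptions, not confirmed hotglue conventions.

```python
from __future__ import annotations

import csv
import os
from pathlib import Path

# Assumed layout (illustrative, not confirmed hotglue paths): raw CSVs from the
# source are read from INPUT_DIR and reformatted CSVs are written to OUTPUT_DIR.
ROOT_DIR = Path(os.environ.get("ROOT_DIR", "."))
INPUT_DIR = ROOT_DIR / "sync-output"
OUTPUT_DIR = ROOT_DIR / "etl-output"

# Hypothetical mapping from source column names to the names your backend expects.
COLUMN_MAP = {"id": "external_id", "email_address": "email"}


def transform_file(src: Path, dst: Path, column_map: dict[str, str]) -> int:
    """Copy one CSV, renaming mapped columns; returns the number of rows written."""
    with src.open(newline="") as fin:
        reader = csv.DictReader(fin)
        # Rename only the columns present in the mapping; keep the rest unchanged.
        fieldnames = [column_map.get(name, name) for name in (reader.fieldnames or [])]
        rows = [{column_map.get(k, k): v for k, v in row.items()} for row in reader]
    dst.parent.mkdir(parents=True, exist_ok=True)
    with dst.open("w", newline="") as fout:
        writer = csv.DictWriter(fout, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


def run(input_dir: Path = INPUT_DIR, output_dir: Path = OUTPUT_DIR) -> dict[str, int]:
    """Transform every CSV in input_dir and report the rows written per file."""
    counts = {}
    for src in sorted(input_dir.glob("*.csv")):
        counts[src.name] = transform_file(src, output_dir / src.name, COLUMN_MAP)
    return counts
```

Calling `run()` processes every `*.csv` in the input folder; keeping the per-file logic in `transform_file` makes it easy to test one stream at a time in a notebook cell.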

Start the JupyterLab workspace

To edit the transformation script for a data source, you can start a JupyterLab workspace directly from hotglue.

Note

Generally, when updating the transformation script, you should launch the JupyterLab workspace from the admin view, not the tenant view. Launching JupyterLab as a tenant creates a custom forked script for that tenant.

Start by opening the settings for the source you'd like to update:

Open source settings

From here, launch the JupyterLab workspace by selecting the Python icon:

Launch JupyterLab workspace

hotglue will provision a hosted JupyterLab workspace for you to connect to; this may take a few minutes. Note that JupyterLab workspaces time out after 30 minutes of inactivity, at which point you'll need to start a new workspace.

Once the JupyterLab workspace is provisioned, you can connect:

Connect to JupyterLab workspace

You've launched JupyterLab and should see a workspace like the one below! 🎉

JupyterLab

Define dependencies

The requirements.txt file in the root directory is where you can specify any Python modules you wish to use as dependencies. In this example, my requirements.txt contains the following:

gluestick==1.0.4
numpy==1.16.2
pandas==0.25.3
requests==2.24.0
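If you want to double-check that the kernel you're working in actually has the versions pinned above, you could compare them against `importlib.metadata` from the standard library. This is a minimal sketch, not a hotglue tool: it only understands exact `==` pins like the ones in this example and does not normalize package names itself.

```python
from __future__ import annotations

from importlib import metadata


def parse_pins(text: str) -> dict[str, str]:
    """Parse 'name==version' lines from requirements.txt content.

    Comments, blank lines, and non-'==' specifiers are skipped.
    """
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop inline comments
        if "==" not in line:
            continue
        name, version = (part.strip() for part in line.split("==", 1))
        pins[name] = version
    return pins


def check_pins(pins: dict[str, str]) -> dict[str, tuple[str, str | None]]:
    """Return {package: (pinned, installed)} for every pin that doesn't match.

    `installed` is None when the package isn't installed in this environment.
    """
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches
```

For example, `check_pins(parse_pins(Path("requirements.txt").read_text()))` (with `from pathlib import Path`) returns an empty dict when the environment matches every pin.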

As mentioned above, the gluestick package is developed by the hotglue team and provides utility functions for reading and manipulating CSV files with pandas. For details, read the gluestick wiki on GitHub.
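To illustrate the kind of job these reading utilities handle, the hand-rolled sketch below loads every CSV in a folder into a dictionary keyed by stream name. It is not gluestick's implementation or API: `load_streams` is a hypothetical helper, it uses the standard library `csv` module instead of pandas, and it assumes the stream name is simply the file name without its extension.

```python
from __future__ import annotations

import csv
from pathlib import Path


def load_streams(folder: str | Path) -> dict[str, list[dict[str, str]]]:
    """Load every CSV in `folder` into {stream_name: [row, ...]}.

    Hypothetical stand-in for a gluestick-style reader: the stream name is the
    file name without its extension, and every value is kept as a string.
    """
    streams = {}
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="") as f:
            streams[path.stem] = list(csv.DictReader(f))
    return streams
```

Keeping all values as strings mirrors how CSV stores them; converting types is left to the transformation step itself.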

The requests package can be used to make API requests directly from your transformation script. The requirements.txt we created for testing in Jupyter is shown below:

Testing requirements.txt in JupyterLab
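As a sketch of calling an API from a transformation script, the example below builds a JSON POST for a batch of transformed records. The page recommends requests (where this would roughly be `requests.post(url, json=payload, headers=headers, timeout=30)`); to keep the sketch runnable without third-party installs, it swaps in the standard library's `urllib.request`. The endpoint URL, the `{"records": [...]}` payload shape, and bearer-token auth are all hypothetical.

```python
from __future__ import annotations

import json
import urllib.request


def build_post(url: str, records: list[dict], token: str | None = None) -> urllib.request.Request:
    """Build (but don't send) a JSON POST request carrying a batch of records.

    The payload shape and bearer-token header are hypothetical examples.
    """
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps({"records": records}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def send(request: urllib.request.Request, timeout: float = 30.0) -> int:
    """Send the request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Separating `build_post` from `send` lets you inspect the exact request in a notebook before anything leaves the workspace.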

Edit the script

Inside the etl folder, you will find a file named etl.ipynb containing a default transformation script:

Default transformation script

You can use this as a base to understand how to read data with the gluestick package and manipulate it using pandas. More sample scripts are available on GitHub.
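A common manipulation is enriching one stream with another, for example attaching company details to each contact. In pandas this would typically be `contacts.merge(companies, left_on="company_id", right_on="id", how="left")`. The sketch below shows the same left-join logic with plain lists of dicts so the mechanics are visible; the `contacts`/`companies` streams, their column names, and the `left_join` helper are hypothetical.

```python
from __future__ import annotations


def left_join(
    left: list[dict[str, str]],
    right: list[dict[str, str]],
    left_key: str,
    right_key: str,
    prefix: str,
) -> list[dict[str, str | None]]:
    """Attach matching `right` rows to each `left` row, like a SQL LEFT JOIN.

    Right-hand columns (except the join key) are prefixed to avoid name clashes;
    left rows with no match get None for those columns. Assumes `right_key`
    values are unique in `right`.
    """
    index = {row[right_key]: row for row in right}
    right_columns = [c for c in (right[0] if right else {}) if c != right_key]
    joined = []
    for row in left:
        match = index.get(row.get(left_key))
        out = dict(row)  # copy so the input rows are not modified
        for col in right_columns:
            out[prefix + col] = match[col] if match else None
        joined.append(out)
    return joined
```

For example, `left_join(contacts, companies, "company_id", "id", "company_")` adds a `company_name` column to every contact, with None where no company matches.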

Deploy the script

Once you have written your transformation script, you must deploy it using the hotglue tab in Jupyter.

Deploy the transformation script

Jupyter will prompt you to confirm the deployment as pictured below:

Confirm transformation script deployment