Overview
What are transformation scripts?
hotglue features a transformation layer that can be used to format the raw output from connectors into a format ingestible by your backend (or wherever you’re piping the data to).
In hotglue, all transformation scripts are written in Python and can use any open source Python modules you’d like. Since hotglue formats connector output as CSV files, our sample scripts rely heavily on our gluestick package and pandas.
If you do not deploy your script, your changes will be saved but not used. This is so you don’t accidentally break your integration while you’re editing it. To learn how to deploy a script, read Deploy a script.
Start the JupyterLab workspace
To edit the transformation script for a connector, you can start a JupyterLab workspace directly from hotglue.
Start by opening the settings for the connector you’d like to update. From here, launch the JupyterLab workspace by selecting the Python icon:
Generally, when updating the transformation script you should launch the JupyterLab workspace from the admin view – not the tenant view. Launching JupyterLab as a tenant creates a custom forked script for that tenant.
Connector configuration screen
hotglue will provision a hosted JupyterLab workspace for you to connect to – this may take a few minutes. Once the JupyterLab workspace is provisioned, click the launch button to open your workspace.
Connect to JupyterLab workspace
You’ve launched JupyterLab and should see something like the below! 🎉
JupyterLab
Note that JupyterLab workspaces will time out after 30 minutes of inactivity, at which point you’ll need to start a new workspace. When your session times out, your script is autosaved.
Define dependencies
The requirements.txt file in the root directory is where you can specify any Python modules you wish to use as dependencies. In this example, my requirements.txt contains the following:
As mentioned above, the gluestick package is developed by the hotglue team and provides utility functions for reading and manipulating CSV files with pandas. Read the wiki on GitHub.
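Before relying on a dependency in the script, it can help to confirm that everything listed in requirements.txt is actually installed in the workspace. The following is a minimal sketch using only the Python standard library. The helper names parse_requirement_names and missing_packages are hypothetical, and the sample file contents are illustrative rather than the real requirements.txt from this example.

```python
import re
from importlib import metadata


def parse_requirement_names(text):
    """Return the package names from requirements.txt-style text."""
    names = []
    for line in text.splitlines():
        # Drop inline comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        # Skip blank lines and pip options such as "-r other.txt".
        if not line or line.startswith("-"):
            continue
        # The name is everything before a version specifier or extras.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(text):
    """Return the listed packages that are not installed."""
    missing = []
    for name in parse_requirement_names(text):
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Illustrative contents only (assumption, not the real file).
    sample = "gluestick\nrequests\npandas\n"
    print("Not installed:", missing_packages(sample))
```

In practice you would pass the real file contents instead, for example by reading requirements.txt with pathlib.Path("requirements.txt").read_text().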
The requests package can be used to make API requests directly from your transformation script. See the testing requirements.txt we created in Jupyter below:
Edit the script
Inside the etl folder, you will find a file titled etl.ipynb containing a default transformation script.
Default transformation script
You can use this as a base to understand how to read data with the gluestick package and manipulate it using pandas. You can also find more sample scripts available on GitHub.
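To give a rough idea of the shape such a script takes, here is a minimal sketch that uses pandas alone. It is not the default script and does not use the gluestick API: load_csv_folder is a hypothetical stand-in for a folder reader, and the contacts file, its FullName and Email columns, and the directory names are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_csv_folder(folder):
    """Read every CSV in `folder` into a dict keyed by file name (no extension)."""
    return {p.stem: pd.read_csv(p) for p in sorted(Path(folder).glob("*.csv"))}


def transform(contacts):
    """Rename columns, normalize emails, and drop rows without an email."""
    out = contacts.rename(columns={"FullName": "name", "Email": "email"})
    out["email"] = out["email"].str.strip().str.lower()
    return out.dropna(subset=["email"])[["name", "email"]]


def run(input_dir, output_dir):
    """Read the raw CSVs, transform the contacts table, and write the result."""
    data = load_csv_folder(input_dir)
    result = transform(data["contacts"])
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    target = Path(output_dir) / "contacts.csv"
    result.to_csv(target, index=False)
    return target


if __name__ == "__main__":
    # Demo on throwaway data so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / "input"
        raw.mkdir()
        (raw / "contacts.csv").write_text("FullName,Email\nAda, ADA@Example.com \nBob,\n")
        print(run(raw, Path(tmp) / "output").read_text())
```

Keeping the transformation in its own function, separate from the file reading and writing, makes it easier to test against a small DataFrame before deploying.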
Deploy the script
Once you have written your transformation script, you must deploy it using the hotglue tab in Jupyter.
Deploy the transformation script
Jupyter will prompt you to confirm the deployment as pictured below:
Confirm transformation script deployment
If you do not deploy your script, your changes will be autosaved but will not be used. Autosaved transformation scripts can be found in the cached folder.
Note that the CLI only downloads the deployed version of the script. If you have a script that is not deployed, you will need to open a Jupyter workspace to deploy your cached edits.