Building taps
Learn how to run a custom tap built with the Singer SDK
Background
A tap reads and processes data from an API or database. This involves two separate processes:
- Discovery: Generating a catalog of the available streams (also known as tables), including metadata such as primary keys and available fields
- Syncing: Reading data from streams and writing the resulting records according to the Singer spec
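To show what "writing records according to the Singer spec" looks like, here is a minimal sketch of the three core message types a tap prints, one JSON object per line: SCHEMA, RECORD, and STATE. The `accounts` stream, its fields, and the bookmark layout are hypothetical examples. Taps built with the Singer SDK generate these messages for you.

```python
from __future__ import annotations

import json
import sys
from typing import Any, TextIO


def write_message(message: dict[str, Any], out: TextIO = sys.stdout) -> None:
    """Write one Singer message as a single line of JSON."""
    out.write(json.dumps(message) + "\n")


def emit_example_stream(out: TextIO = sys.stdout) -> None:
    # SCHEMA: announces the stream, its JSON Schema, and its primary key(s).
    write_message(
        {
            "type": "SCHEMA",
            "stream": "accounts",  # hypothetical stream name
            "schema": {
                "type": "object",
                "properties": {
                    "id": {"type": "string"},
                    "name": {"type": ["null", "string"]},
                },
            },
            "key_properties": ["id"],
        },
        out,
    )
    # RECORD: one message per row read from the source.
    for row in ({"id": "1", "name": "Acme"}, {"id": "2", "name": None}):
        write_message({"type": "RECORD", "stream": "accounts", "record": row}, out)
    # STATE: a bookmark so the next sync can resume; its shape is tap-specific.
    write_message(
        {"type": "STATE", "value": {"bookmarks": {"accounts": {"last_id": "2"}}}},
        out,
    )


if __name__ == "__main__":
    emit_example_stream()
```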
Setting up a local tap environment
Virtual Environments
Dependencies can differ from tap to tap, so it is best to use a virtual environment to isolate the tap’s Python dependencies.
You can create a virtual environment named `.venv` in your tap workspace with `python -m venv .venv`.
You can then activate it with `source .venv/bin/activate` (on Windows, `.venv\Scripts\activate`).
Finally, install the tap's dependencies with the command that matches how they are specified:
| Dependency file | Command |
|---|---|
| requirements.txt | `pip install -r requirements.txt` |
| setup.py | `pip install -e .` |
| pyproject.toml | `pip install -e .` |
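If you set up many taps, the table can be applied automatically. The sketch below picks the install command from whichever dependency file a tap ships. It runs pip as `python -m pip` through `sys.executable`, which is equivalent to the commands above but ensures pip installs into the same interpreter (for example, your activated `.venv`) that runs the script. The precedence when a tap ships more than one of these files is an assumption of this sketch; the table does not specify it.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path


def install_command(tap_dir: str | Path) -> list[str]:
    """Map a tap's dependency file to the matching pip command from the table."""
    root = Path(tap_dir)
    pip = [sys.executable, "-m", "pip", "install"]
    # Assumption: prefer an editable install when pyproject.toml or setup.py exists.
    if (root / "pyproject.toml").exists() or (root / "setup.py").exists():
        return pip + ["-e", "."]
    if (root / "requirements.txt").exists():
        return pip + ["-r", "requirements.txt"]
    raise FileNotFoundError(f"no requirements.txt, setup.py, or pyproject.toml in {root}")


def install_dependencies(tap_dir: str | Path) -> None:
    # Run from the tap directory so "." and "requirements.txt" resolve correctly.
    subprocess.run(install_command(tap_dir), cwd=tap_dir, check=True)
```

With the virtual environment activated, calling `install_dependencies(".")` from the tap's directory runs the same command you would type by hand.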
Config
Both the discover and sync processes require a valid config, which typically includes fields such as API keys, OAuth credentials, and configuration flags.
Most taps document their required config values in their READMEs.
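A config is a JSON object. The sketch below loads a `config.json` and fails early when required keys are missing, which is easier to debug than an error deep inside a discover or sync. The key names in `REQUIRED_KEYS` and `EXAMPLE_CONFIG` are hypothetical placeholders for an OAuth-based tap; substitute the ones listed in your tap's README.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any

# Hypothetical keys for an OAuth-based tap; check your tap's README for the real ones.
REQUIRED_KEYS = ("client_id", "client_secret", "refresh_token", "start_date")

EXAMPLE_CONFIG = {
    "client_id": "<your-client-id>",
    "client_secret": "<your-client-secret>",
    "refresh_token": "<your-refresh-token>",
    "start_date": "2024-01-01T00:00:00Z",
}


def load_config(path: str | Path, required: tuple[str, ...] = REQUIRED_KEYS) -> dict[str, Any]:
    """Load a tap config.json and raise if any required key is absent."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(config, dict):
        raise ValueError("config.json must contain a JSON object")
    missing = [key for key in required if key not in config]
    if missing:
        raise ValueError(f"config.json is missing required keys: {', '.join(missing)}")
    return config
```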
To reproduce the sync output of a particular Hotglue job, you can download the job folder from your Hotglue job page.
💡 To keep your workspace clean, create a `.secrets` folder to store your `config.json`. Run all tap commands from this folder so that your catalogs and output data stay separate from your tap's code.
Running a Discover
Within your virtual environment, you can invoke the tap from your command line.
For example, if your tap is called `tap-Salesforce`, you could run a discover with `tap-Salesforce --config config.json --discover > catalog.json`.
If the discover runs successfully, this will generate a file called `catalog.json` with information about the available streams and fields.
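A Singer catalog lists each stream with its JSON Schema and a `metadata` array keyed by breadcrumbs. This sketch condenses a catalog into each stream's fields and primary keys, which is a quick way to sanity-check a discover. It reads key properties from the stream-level `table-key-properties` metadata and falls back to a top-level `key_properties` entry; the function name and output shape are choices of this sketch.

```python
from __future__ import annotations

from typing import Any


def summarize_catalog(catalog: dict[str, Any]) -> dict[str, dict[str, list[str]]]:
    """Return {stream_id: {"fields": [...], "key_properties": [...]}} for a Singer catalog."""
    summary: dict[str, dict[str, list[str]]] = {}
    for stream in catalog.get("streams", []):
        stream_id = stream.get("tap_stream_id") or stream["stream"]
        fields = sorted(stream.get("schema", {}).get("properties", {}))
        keys = stream.get("key_properties", [])
        for entry in stream.get("metadata", []):
            # The entry with an empty breadcrumb holds stream-level metadata.
            if entry.get("breadcrumb") == []:
                keys = entry.get("metadata", {}).get("table-key-properties", keys)
        summary[stream_id] = {"fields": fields, "key_properties": list(keys)}
    return summary
```

For example, `summarize_catalog(json.loads(Path("catalog.json").read_text()))` run from your `.secrets` folder returns one entry per discovered stream.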
If you prefer to work with VS Code's debugger, you can also run a discover from a launch configuration in `.vscode/launch.json`.
Selecting Streams and Fields
The next step of the syncing process is to select which streams and fields you want to sync.
We recommend doing this with the singer-discover command-line utility.
Once installed, you can generate a `catalog-selected.json` with `singer-discover --input catalog.json --output catalog-selected.json`.
You will then be prompted to select your streams. The resulting `catalog-selected.json` is a new catalog that includes your selection metadata.
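If you would rather script the selection, for example to apply the same choices every time you rerun a discover, you can set the standard Singer selection metadata yourself. This sketch is an alternative to singer-discover, not a description of how that tool works internally: it sets `"selected"` on a stream's empty-breadcrumb metadata entry and on each field's `["properties", <field>]` entry. The `selections` argument format is an assumption of this sketch. Fields a tap marks with `"inclusion": "automatic"` (typically primary keys) are usually synced regardless of this flag.

```python
from __future__ import annotations

import copy
from typing import Any


def apply_selection(
    catalog: dict[str, Any], selections: dict[str, list[str] | None]
) -> dict[str, Any]:
    """Return a copy of `catalog` with Singer `selected` metadata applied.

    `selections` maps a stream id to the fields to sync; None means all fields.
    Streams not listed are deselected.
    """
    result = copy.deepcopy(catalog)
    for stream in result.get("streams", []):
        stream_id = stream.get("tap_stream_id") or stream["stream"]
        stream_selected = stream_id in selections
        wanted = selections.get(stream_id)
        metadata = stream.setdefault("metadata", [])
        by_breadcrumb = {tuple(entry["breadcrumb"]): entry for entry in metadata}

        def entry_for(breadcrumb: tuple[str, ...]) -> dict[str, Any]:
            # Reuse the existing metadata entry for this breadcrumb, or add one.
            if breadcrumb not in by_breadcrumb:
                entry = {"breadcrumb": list(breadcrumb), "metadata": {}}
                metadata.append(entry)
                by_breadcrumb[breadcrumb] = entry
            return by_breadcrumb[breadcrumb].setdefault("metadata", {})

        entry_for(())["selected"] = stream_selected
        for field in stream.get("schema", {}).get("properties", {}):
            field_selected = stream_selected and (wanted is None or field in wanted)
            entry_for(("properties", field))["selected"] = field_selected
    return result
```

Writing the result with `json.dump` to `catalog-selected.json` gives you a catalog you can pass to the sync step, assuming stream ids such as `accounts` match your tap's `tap_stream_id` values.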
Running a Sync
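You can now sync the streams selected in your `catalog-selected.json`. If your tap is named `tap-Salesforce`, you could run a sync with `tap-Salesforce --config config.json --catalog catalog-selected.json > data.txt` (some older Singer taps expect `--properties` instead of `--catalog`).
This will write the selected streams to a `data.txt` file according to the Singer spec.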
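Because the output is one JSON message per line, a quick sanity check after a sync is to count RECORD messages per stream. A minimal sketch, assuming the tap wrote only Singer messages to stdout (logs normally go to stderr, so they do not end up in `data.txt`):

```python
from __future__ import annotations

import json
from collections import Counter
from pathlib import Path


def count_records(path: str | Path) -> Counter[str]:
    """Count RECORD messages per stream in a file of Singer output."""
    counts: Counter[str] = Counter()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue  # tolerate blank lines
            message = json.loads(line)
            if message.get("type") == "RECORD":
                counts[message["stream"]] += 1
    return counts
```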
If you prefer to work with VS Code's debugger, you can also run a sync from a launch configuration in `.vscode/launch.json`.
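As a sketch, the script below writes a `.vscode/launch.json` with one configuration for the discover step and one for the sync step. It assumes the tap can be started as `python -m tap_salesforce` (a hypothetical module name; use your tap's package) and that `config.json` and the catalogs live in the `.secrets` folder suggested in the Config section. `"type": "debugpy"` matches the current VS Code Python debugger, while older setups use `"type": "python"`. This sketch omits shell redirection such as `> catalog.json`, so the tap's output appears in the debug session instead of a file.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any

TAP_MODULE = "tap_salesforce"  # hypothetical: the module that `python -m` should run


def launch_config(name: str, args: list[str]) -> dict[str, Any]:
    """One VS Code debug configuration that runs the tap module with `args`."""
    return {
        "name": name,
        "type": "debugpy",  # older VS Code Python extensions use "python"
        "request": "launch",
        "module": TAP_MODULE,
        "args": args,
        "cwd": "${workspaceFolder}/.secrets",  # keep catalogs and output out of the code tree
        "justMyCode": False,  # allow stepping into library code such as the Singer SDK
    }


def write_launch_json(workspace: str | Path = ".") -> Path:
    """Write .vscode/launch.json with discover and sync configurations."""
    launch = {
        "version": "0.2.0",
        "configurations": [
            launch_config("Tap: discover", ["--config", "config.json", "--discover"]),
            launch_config(
                "Tap: sync",
                ["--config", "config.json", "--catalog", "catalog-selected.json"],
            ),
        ],
    }
    path = Path(workspace) / ".vscode" / "launch.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(launch, indent=2), encoding="utf-8")
    return path
```

Writing the file from a script is only a convenience; you can paste the same JSON into `.vscode/launch.json` by hand.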