Background
A tap reads and processes data from an API or Database. This involves two separate processes:- Discovery: Generating a catalog of the available streams (aka tables), including metadata such as primary keys and available fields
- Syncing: Reading data from streams and writing the resulting records according to the Singer spec
Setting up a local tap environment
Virtual Environments
Dependencies can differ from tap to tap, so it is best to use a virtual environment to isolate the tap’s Python dependencies. You can create a virtual environment named.venv
in your tap workspace with:
Dependency File | Command |
---|---|
requirements.txt | pip install -r requirements.txt |
setup.py | pip install -e . |
pyproject.toml | pip install -e . |
Config
Both the Discover and Sync processes require a valid config, which will include fields like API keys, OAuth credentials, and configuration flags. For example:
💡 To keep your workspace clean, create a.secrets
folder to store yourconfig.json
. Run all tap commands out of this folder to make sure your catalogs and output data stay separate from relevant code.
Running a Discover
Within your virtual environment, you can invoke the tap from your command line. For example, if your tap was calledtap-Salesforce
, you could run a discover with:
catalog.json
with information about the available streams and fields:
Selecting Streams and Fields
The next step of the syncing process is to select which streams and fields you want to sync. We recommend doing this with the Singer-Discover Command Line Utility.catalog-selected.json
with:

catalog-selected.json
that includes your selection metadata.
Running a Sync
You can now sync the streams selected in yourcatalog-selected.json
. If your tap is named tap-Salesforce
, you could run a sync with:
data.txt
file according to the Singer spec.
If you prefer to work with VSCode’s Debugger, you can run a sync with the following configuration: