
Background

A tap reads and processes data from an API or database. This involves two separate processes:
  1. Discovery: Generating a catalog of the available streams (also known as tables), including metadata such as primary keys and available fields
  2. Syncing: Reading data from streams and writing the resulting records according to the Singer spec
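To make the Syncing step concrete: under the Singer spec, a tap writes newline-delimited JSON messages to stdout, chiefly SCHEMA, RECORD, and STATE. The minimal sketch below emits those messages by hand for a hypothetical `accounts` stream. Real taps usually rely on a library such as singer-python or the Singer SDK instead of writing JSON directly, so treat this only as an illustration of the output format.

```python
import json
import sys


def write_message(message, out=None):
    """Write one Singer message as a single line of JSON (stdout by default)."""
    out = out if out is not None else sys.stdout
    out.write(json.dumps(message) + "\n")


def emit_stream(stream, schema, key_properties, records, out=None):
    # Under the Singer spec, a SCHEMA message precedes the stream's RECORD messages
    write_message({"type": "SCHEMA", "stream": stream, "schema": schema,
                   "key_properties": key_properties}, out)
    for record in records:
        write_message({"type": "RECORD", "stream": stream, "record": record}, out)


if __name__ == "__main__":
    # Hypothetical stream, schema, and record, for illustration only
    schema = {"type": "object",
              "properties": {"Id": {"type": "string"}, "Name": {"type": "string"}}}
    emit_stream("accounts", schema, ["Id"], [{"Id": "1", "Name": "Acme"}])
    # A STATE message carries tap-defined progress (e.g. bookmarks) for the next run
    write_message({"type": "STATE",
                   "value": {"bookmarks": {"accounts": {"LastModifiedDate": "2025-01-25T00:00:00Z"}}}})
```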

Setting up a local tap environment

Virtual Environments

Dependencies can differ from tap to tap, so it is best to use a virtual environment to isolate the tap’s Python dependencies. You can create a virtual environment named .venv in your tap workspace with:
python -m venv .venv
You can then enter the virtual environment with:
source .venv/bin/activate
Finally, install the tap’s dependencies with the command that matches its dependency file:
Dependency file | Command
--- | ---
requirements.txt | pip install -r requirements.txt
setup.py | pip install -e .
pyproject.toml | pip install -e .
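If you set up many taps, the choice of install command can be scripted. The helper below is a hypothetical sketch, not part of any tap or Hotglue tooling; the precedence it applies (an explicit requirements.txt wins over setup.py or pyproject.toml) is an assumption. It invokes pip through the current interpreter so the packages land in whichever virtual environment is active.

```python
import sys
from pathlib import Path


def install_command(tap_dir="."):
    """Return the pip command matching the tap's dependency file, or None."""
    root = Path(tap_dir)
    pip = [sys.executable, "-m", "pip"]
    # Assumed precedence: an explicit requirements.txt wins over package metadata
    if (root / "requirements.txt").is_file():
        return pip + ["install", "-r", "requirements.txt"]
    # setup.py and pyproject.toml both describe an installable package
    if (root / "setup.py").is_file() or (root / "pyproject.toml").is_file():
        return pip + ["install", "-e", "."]
    return None
```

From inside the activated virtual environment, `subprocess.run(install_command("path/to/tap"), cwd="path/to/tap", check=True)` would then run the matching command.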

Config

Both the Discover and Sync processes require a valid config, which typically includes fields such as API keys, OAuth credentials, and configuration flags. For example:
{
    "client_id": "...",
    "client_secret": "...",
    "refresh_token": "...",
    "access_token": "...",
    "start_date": "2025-01-25T00:00:00Z"
}
Most taps document their required config values in their READMEs. To reproduce the sync output of a particular Hotglue job, you can download the job folder from that job’s page in Hotglue.
💡 To keep your workspace clean, create a .secrets folder to store your config.json. Run all tap commands from this folder so that your catalogs and output data stay separate from the tap’s source code.
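Before running a discover, it can help to confirm that config.json parses as JSON and contains the values the tap needs. The sketch below assumes the .secrets layout from the tip above; the `REQUIRED_KEYS` list is hypothetical and mirrors the example config, so replace it with the keys from your tap’s README.

```python
import json
from datetime import datetime
from pathlib import Path

# Hypothetical required keys based on the example config; check the tap's README
REQUIRED_KEYS = ("client_id", "client_secret", "refresh_token", "start_date")


def load_config(path=".secrets/config.json"):
    """Parse the config and fail early if required values are missing or malformed."""
    config = json.loads(Path(path).read_text())  # raises on invalid JSON, e.g. trailing commas
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError(f"config is missing required keys: {', '.join(missing)}")
    # Replace "Z" so Python versions before 3.11 can parse the ISO 8601 timestamp
    datetime.fromisoformat(config["start_date"].replace("Z", "+00:00"))
    return config
```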

Using the Hotglue Access Token Endpoint

For OAuth connectors, you can have the tap obtain its access token from the Hotglue access token endpoint instead of refreshing the token locally by:
1. Adding this flag to your config.json:
{
    "_refresh_token_via_hg_api": true,
    ...
}
2. Setting the following environment variables:
Variable | Description
--- | ---
TENANT | Your tenant ID
API_KEY | Your Hotglue API key
FLOW | The flow ID
ENV_ID | The environment ID
TAP | The tap ID (e.g. exact)
export TENANT="<tenant id>"
export API_KEY="<hotglue api key>"
export FLOW="<flow id>"
export ENV_ID="<environment id>"
export TAP="<tap id>"
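A quick pre-flight check can catch a variable that was set in the shell but never exported, since child processes such as the tap only inherit exported variables. The helper below is a hypothetical sketch, not part of any Hotglue tooling; it reports which of the variables from the table above are missing or empty in the environment the tap would see.

```python
import os

# Variable names taken from the table above
HOTGLUE_ENV_VARS = ("TENANT", "API_KEY", "FLOW", "ENV_ID", "TAP")


def missing_hotglue_env(environ=None):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in HOTGLUE_ENV_VARS if not environ.get(name)]
```

Calling `missing_hotglue_env()` from a script launched in the same shell should return an empty list before you run the tap.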

Running a Discover

Within your virtual environment, you can invoke the tap from the command line. For example, if your tap is called tap-salesforce, you can run a discover with:
tap-salesforce --config PATH_TO_CONFIG/config.json --discover > catalog.json
If the discover runs successfully, this will generate a file called catalog.json with information about the available streams and fields:
{
  "streams": [
    {
      "tap_stream_id": "...",
      "replication_key": "LastModifiedDate",
      "replication_method": "INCREMENTAL",
      "key_properties": [
        "ID"
      ],
      "schema": {
        ...
      }
    },
    ...
  ]
}
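To check what a discover produced without scrolling through the whole file, you can summarize catalog.json. This sketch reads only the fields shown in the example catalog; taps that store the replication method in stream metadata instead of at the stream level will show None for it.

```python
import json
from pathlib import Path


def summarize_catalog(path="catalog.json"):
    """Return (stream id, primary keys, replication method, field count) per stream."""
    catalog = json.loads(Path(path).read_text())
    summary = []
    for stream in catalog.get("streams", []):
        fields = stream.get("schema", {}).get("properties", {})
        summary.append((
            stream["tap_stream_id"],
            stream.get("key_properties", []),
            stream.get("replication_method"),
            len(fields),
        ))
    return summary
```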
If you prefer to work with VS Code’s debugger, you can run a discover with the following launch configuration:
{
    "name": "tap discover",
    "type": "python",
    "request": "launch",
    "program": ".../PATH_TO_ENTRY_SCRIPT", // e.g., .../tap_salesforce/tap.py
    "cwd": "ABSOLUTE_PATH_TO_DIRECTORY_TO_RUN_COMMAND_FROM",
    "args": [
        "--config",
        "config.json",
        "--discover",
        ">",
        "catalog.json"
    ],
    "justMyCode": false,
    "python": "ABSOLUTE_PATH_TO_TAP_DIRECTORY/.venv/bin/python",
    "console": "integratedTerminal"
}
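If you would rather drive the discover from a Python script (for example, to rerun it after editing config.json), the shell redirect can be reproduced with the standard library’s subprocess module. The `run_tap` helper below is a hypothetical sketch; the tap name and config path in the example are placeholders.

```python
import subprocess
from pathlib import Path


def run_tap(args, output_path):
    """Run a tap command and write its stdout to output_path, like `> output_path`."""
    with open(output_path, "w") as out:
        # stderr is not captured, so the tap's logs still appear in the terminal;
        # check=True raises CalledProcessError if the tap exits with an error
        subprocess.run(args, stdout=out, check=True)
    return Path(output_path)


# Example (placeholders):
# run_tap(["tap-salesforce", "--config", "config.json", "--discover"], "catalog.json")
```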

Selecting Streams and Fields

The next step of the syncing process is to select which streams and fields you want to sync. We recommend doing this with the singer-discover command-line utility, which you can install with:
pip install https://github.com/hotgluexyz/singer-discover/archive/master.zip
Once installed, you can generate a catalog-selected.json with:
singer-discover --input catalog.json --output catalog-selected.json
You will then be prompted to select your streams. This generates a new catalog, catalog-selected.json, that includes your selection metadata.
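For repeatable runs, selection can also be written directly. Singer catalogs record selection in each stream’s `metadata` list: an entry with an empty breadcrumb applies to the stream, and `["properties", "<field>"]` applies to a field. The sketch below selects whole streams by ID, assuming the tap honors the standard `selected` metadata key; singer-discover’s own output may contain additional metadata, and `select_streams` and `write_selected` are hypothetical helpers.

```python
import json


def select_streams(catalog, stream_ids):
    """Mark the given streams, and all of their fields, as selected in Singer metadata."""
    for stream in catalog["streams"]:
        if stream["tap_stream_id"] not in stream_ids:
            continue
        entries = stream.setdefault("metadata", [])
        by_breadcrumb = {tuple(e["breadcrumb"]): e for e in entries}
        # The empty breadcrumb holds stream-level metadata; the rest are fields
        fields = stream.get("schema", {}).get("properties", {})
        crumbs = [()] + [("properties", name) for name in fields]
        for crumb in crumbs:
            entry = by_breadcrumb.get(crumb)
            if entry is None:
                entry = {"breadcrumb": list(crumb), "metadata": {}}
                entries.append(entry)
                by_breadcrumb[crumb] = entry
            entry.setdefault("metadata", {})["selected"] = True
    return catalog


def write_selected(stream_ids, src="catalog.json", dest="catalog-selected.json"):
    """Read a discovered catalog and write a copy with the chosen streams selected."""
    with open(src) as f:
        catalog = json.load(f)
    with open(dest, "w") as f:
        json.dump(select_streams(catalog, set(stream_ids)), f, indent=2)
```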

Running a Sync

You can now sync the streams selected in your catalog-selected.json. If your tap is named tap-salesforce, you can run a sync with:
tap-salesforce --config config.json --catalog catalog-selected.json > data.singer
This writes the selected streams to a data.singer file as messages that follow the Singer spec. If you prefer to work with VS Code’s debugger, you can run a sync with the following launch configuration:
{
    "name": "tap sync",
    "type": "python",
    "request": "launch",
    "program": ".../PATH_TO_ENTRY_SCRIPT", // e.g., .../tap_salesforce/tap.py
    "cwd": "ABSOLUTE_PATH_TO_DIRECTORY_TO_RUN_COMMAND_FROM",
    "args": [
        "--config",
        "config.json",
        "--catalog",
        "catalog-selected.json",
        ">",
        "data.singer"
    ],
    "justMyCode": false,
    "python": "ABSOLUTE_PATH_TO_TAP_DIRECTORY/.venv/bin/python",
    "console": "integratedTerminal"
}
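Once the sync finishes, a quick summary of data.singer helps confirm that each selected stream produced records. This sketch counts RECORD messages per stream and returns the last STATE value. It expects every line to be a JSON message, so a `json.JSONDecodeError` usually means something other than Singer output (for example, stray print statements) was written to stdout.

```python
import json
from collections import Counter


def count_records(path="data.singer"):
    """Count RECORD messages per stream and return the last STATE value seen."""
    counts = Counter()
    last_state = None
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            message = json.loads(line)
            if message.get("type") == "RECORD":
                counts[message["stream"]] += 1
            elif message.get("type") == "STATE":
                last_state = message.get("value")
    return counts, last_state
```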