Building targets
Learn how to build custom targets using the Singer SDK
Background
A target processes data and writes it to an API or database. It includes the logic to authenticate, handle errors, and output a summary of the processed records.
Initial Setup
Install the cookiecutter
Python package via pip
Download the target hotglue sdk by running this command
Fill in destination_name with the target name (capitalized) and provide the author name. Then select “Per record” as the serialization method and choose the authorization method (OAuth2 or Bearer token).
Enter the newly created folder and install the package locally in editable mode, so your changes are reflected immediately for testing.
Target structure
The folders and file structure of the target should look like this:
target.py
The target class is the entry point of the target execution. Target name, available sinks and config fields should be defined here.
Define the config.json schema
In target.py, define all the values required in the config file, such as username, password, api_key, client_id, etc.
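As a rough sketch of the idea, the required config values can be described with a JSON Schema style dict. The attribute name `config_jsonschema` follows the Singer SDK convention; the property names and the `missing_required` helper below are hypothetical and only illustrate the concept.

```python
# Hypothetical config schema for illustration; list the fields your API needs.
config_jsonschema = {
    "type": "object",
    "properties": {
        "api_key": {"type": "string"},
        "client_id": {"type": "string"},
        "client_secret": {"type": "string"},
    },
    "required": ["api_key"],
}


def missing_required(config: dict, schema: dict) -> list:
    """Return the required keys that are absent from the config (hypothetical helper)."""
    return [key for key in schema.get("required", []) if key not in config]


print(missing_required({"client_id": "abc"}, config_jsonschema))  # ['api_key']
```

In a real target the SDK would typically perform this validation itself; the helper is only there to show what "required" means in practice.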
List the available sinks
Import all sinks (see below for how to create a sink).
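A minimal sketch of how the sinks might be listed on the target class. The classes here are self-contained stand-ins so the example runs on its own; in a real target the sinks would be imported from sinks.py and the target would inherit from the SDK's target class. The attribute name `SINK_TYPES`, the `find_sink` helper, and all names below are assumptions for illustration.

```python
class ContactsSink:
    """Stand-in for a sink defined in sinks.py."""
    name = "Contacts"


class CompaniesSink:
    """Stand-in for a second sink defined in sinks.py."""
    name = "Companies"


class TargetExample:
    """Stand-in for the target class (a real one would inherit from the SDK)."""
    name = "target-example"
    # Assumed attribute name: the list of sinks this target exposes.
    SINK_TYPES = [ContactsSink, CompaniesSink]

    def find_sink(self, stream_name: str):
        """Hypothetical helper: pick the sink whose name matches the stream."""
        for sink_class in self.SINK_TYPES:
            if sink_class.name == stream_name:
                return sink_class
        raise ValueError(f"No sink defined for stream '{stream_name}'")


print(TargetExample().find_sink("Contacts").__name__)  # ContactsSink
```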
client.py
client.py contains all the common functions and properties, such as the API base URL and the authentication logic, within a base sink.
You can override any of the prebuilt functions here, and the overrides will apply to all sinks.
The base sink should be a child class of one of these Hotglue sinks:
- For the Per record serialization method, it should inherit from HotglueSink
- For the Per batch serialization method, it should inherit from HotglueBatchSink
Common Attributes
- base_url: The base URL of the API.
- http_headers: Any headers that should be sent when making the request.
- MAX_SIZE_DEFAULT: When building a target with the per batch serialization method, this attribute defines the number of records that will be processed and passed into the make_batch_request function.
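The attributes above could be laid out roughly as follows. This is a self-contained sketch: `BaseSinkSketch` stands in for a base sink that would really inherit from HotglueSink or HotglueBatchSink, the URL is a placeholder, and `chunk_records` is a hypothetical helper that only illustrates how a batch size limit splits the records.

```python
class BaseSinkSketch:
    """Stand-in for the base sink in client.py (does not use the real SDK)."""

    base_url = "https://api.example.com/v1"  # placeholder URL
    MAX_SIZE_DEFAULT = 2  # records per batch, kept small for the demo

    def __init__(self, config: dict):
        self.config = config

    @property
    def http_headers(self) -> dict:
        # Headers sent with every request made by any sink.
        return {"Content-Type": "application/json"}


def chunk_records(records: list, size: int) -> list:
    """Hypothetical helper: split records into batches of at most `size`."""
    return [records[i:i + size] for i in range(0, len(records), size)]


batches = chunk_records([1, 2, 3, 4, 5], BaseSinkSketch.MAX_SIZE_DEFAULT)
print(batches)  # [[1, 2], [3, 4], [5]]
```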
Authentication
The authentication should be defined in the base sink. If the logic is simple, it can be added there directly; in cases like OAuth, it is good practice to create an auth.py file containing the logic.
These are some of the common authentication methods used and how they can be added to the base sink.
Basic Authentication
The basic auth logic can be added as a property or function in the base sink and passed in the headers.
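A minimal sketch of building a Basic auth header with the standard library. The function name is hypothetical; in a base sink this logic would typically live in a property that reads the username and password from the config.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build an HTTP Basic Authorization header from a username and password."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


print(basic_auth_header("user", "pass"))  # {'Authorization': 'Basic dXNlcjpwYXNz'}
```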
API Key authentication
The API key can be passed directly in the headers.
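For illustration, a header dict carrying an API key might be built like this. The header name `X-API-Key` and the function name are assumptions; each API documents its own header (some expect `Authorization: Bearer <key>` instead).

```python
def api_key_headers(config: dict) -> dict:
    """Return request headers carrying the API key from the config."""
    return {
        "X-API-Key": config["api_key"],  # assumed header name, check the API docs
        "Content-Type": "application/json",
    }


print(api_key_headers({"api_key": "secret"})["X-API-Key"])  # secret
```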
OAuth Authentication
In the case of OAuth, an authentication class can be created in a separate file and then imported into the base sink.
auth.py
This class is responsible for generating an access token to make requests to the API. The auth class should be called in the base sink class.
Notes
- In the case of OAuth with a refresh token, the target only handles the logic to refresh the token. The authorization process should be completed beforehand, and the refresh_token should already be in the config file.
- In hotglue, the authorization process is handled out of the box.
Main properties
- auth_headers: Here we can check whether the token is valid, call the function to request a new token, and structure the access token. It should return a dict with the Authorization key and the access_token to be sent in the requests made to the API.
- oauth_request_body: It builds the payload to be sent when making the request to obtain a new access token.
Main functions
- is_token_valid: It checks whether the token is still valid for use. It should return a boolean.
- update_access_token: It makes a request to generate a new access token.
In cases of rotating refresh tokens or if we need to save the new access token in the config file, we need to add an init statement in the Auth class, Target class and the Base Sink class.
Sample auth.py
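A hedged sketch of what such an auth class could look like, using the property and function names described above. It is not the SDK's implementation: the constructor signature, the injected `post` callable (so the example runs without a network), the 60-second expiry margin, and the config keys are all assumptions.

```python
import time
from typing import Callable, Optional


class OAuthAuthenticator:
    """Sketch of an OAuth helper that refreshes access tokens with a refresh_token."""

    def __init__(self, config: dict, auth_endpoint: str,
                 post: Callable[[str, dict], dict]):
        self._config = config
        self._auth_endpoint = auth_endpoint
        # Hypothetical: a callable that POSTs form data and returns parsed JSON.
        self._post = post
        self.access_token: Optional[str] = config.get("access_token")
        self.expires_at: float = config.get("expires_at", 0)

    @property
    def oauth_request_body(self) -> dict:
        # Payload for the token refresh request.
        return {
            "grant_type": "refresh_token",
            "refresh_token": self._config["refresh_token"],
            "client_id": self._config["client_id"],
            "client_secret": self._config["client_secret"],
        }

    def is_token_valid(self) -> bool:
        # Treat the token as expired 60 seconds early to avoid edge cases.
        return bool(self.access_token) and time.time() < self.expires_at - 60

    def update_access_token(self) -> None:
        response = self._post(self._auth_endpoint, self.oauth_request_body)
        self.access_token = response["access_token"]
        self.expires_at = time.time() + response.get("expires_in", 3600)
        # Keep a rotated refresh token if the API returns one.
        if "refresh_token" in response:
            self._config["refresh_token"] = response["refresh_token"]

    @property
    def auth_headers(self) -> dict:
        if not self.is_token_valid():
            self.update_access_token()
        return {"Authorization": f"Bearer {self.access_token}"}


def fake_post(url: str, data: dict) -> dict:
    """Stand-in for a real HTTP call, used only for this demo."""
    return {"access_token": "new-token", "expires_in": 3600}


demo_config = {"refresh_token": "r", "client_id": "c", "client_secret": "s"}
auth = OAuthAuthenticator(demo_config, "https://example.com/oauth/token", fake_post)
print(auth.auth_headers)  # {'Authorization': 'Bearer new-token'}
```

In a real target, `post` would wrap an HTTP client call, and the base sink would merge `auth_headers` into its request headers.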
sinks.py
Create one sink per endpoint in sinks.py
Main functions
Most of the target logic is defined in the target-hotglue SDK; in most cases, only two functions need to be defined.
- preprocess_record: This function reads each record from the data.singer and builds the payload that will be sent to the API. This is where data can be mapped or custom logic to build the payload can be added. It should return the payload to be sent to the API.
- upsert_record: It receives the payload returned by the preprocess_record function and sends it to the API. It should return id, status, state.
  - id: the API-provided id of the created or updated record
  - status: a boolean that tells us if the record was sent successfully
  - state: data to be written in the target-state.json, commonly an empty dict.
If the record was updated instead of created we can add it to the state:
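One way this might look, as a sketch: the returned state carries a flag when the record was updated. The key name `is_updated` and the helper function are assumptions, so check the SDK for the exact key it reads.

```python
def build_upsert_result(response: dict, was_update: bool) -> tuple:
    """Hypothetical helper: build the (id, status, state) tuple for upsert_record."""
    state = {}
    if was_update:
        # Assumed key name; signals that the record was updated, not created.
        state["is_updated"] = True
    return response["id"], True, state


print(build_upsert_result({"id": "123"}, was_update=True))  # ('123', True, {'is_updated': True})
```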
Note: any exception or error in the process_record function won’t stop the target. The error will be written to the target-state.json file together with the externalId provided in the data.singer, and the target will continue and read the next record.
Samples
You can add the method preprocess_record to the ContactsSink class. It takes the input data and transforms it into the payload we pass to the endpoint.
Add the method upsert_record to the ContactsSink class. Here the payload created by preprocess_record is sent to the API.
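A self-contained sketch of both methods on a ContactsSink. The class below is a stand-in that does not inherit from the real base sink, and the `send` callable, the endpoint, the field names, the `(record, context)` signatures, and the `is_updated` key are assumptions made so the example can run without the SDK or a live API.

```python
class ContactsSink:
    """Stand-in sink; a real one would inherit from the base sink in client.py."""

    name = "Contacts"
    endpoint = "/contacts"  # hypothetical endpoint

    def __init__(self, send):
        # Hypothetical: callable(method, path, payload) -> parsed JSON response.
        self._send = send

    def preprocess_record(self, record: dict, context: dict) -> dict:
        """Map the incoming Singer record to the payload the API expects."""
        payload = {
            "id": record.get("id"),
            "firstName": record.get("first_name"),
            "lastName": record.get("last_name"),
            "email": record.get("email"),
        }
        # Drop empty values so they are not sent to the API.
        return {key: value for key, value in payload.items() if value is not None}

    def upsert_record(self, record: dict, context: dict) -> tuple:
        """Send the payload and return (id, status, state)."""
        state = {}
        if record.get("id"):
            response = self._send("PUT", f"{self.endpoint}/{record['id']}", record)
            state["is_updated"] = True  # assumed key name
        else:
            response = self._send("POST", self.endpoint, record)
        return response["id"], True, state


def fake_send(method: str, path: str, payload: dict) -> dict:
    """Stand-in for the HTTP request, used only for this demo."""
    return {"id": payload.get("id", "new-1")}


sink = ContactsSink(fake_send)
payload = sink.preprocess_record({"first_name": "Ada", "email": "ada@example.com"}, {})
print(sink.upsert_record(payload, {}))  # ('new-1', True, {})
```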
target-state.json
The target state stores the bookmarks, an array of all the outputs that come from upsert_record.
It holds the id of each newly created or updated record or, if an error occurred, the reason for the error and the externalId needed to identify which record failed.
It also stores the summary, which is a count of 4 types of records:
- success: Count of created and updated records (updated records are counted here when updated is not defined in the upsert_record function).
- failed: Count of records that failed to be created or updated.
- existing: Count of any duplicated records that were read; the target uses the externalId to determine this.
- updated: Count of updated records. This count only works if defined in upsert_record.
Example:
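A hedged sketch of what such a state could look like, built in Python and printed as JSON. The exact layout (nesting by sink name, the `success`, `error`, and `is_updated` keys) and the `summarize` helper are assumptions for illustration; the file written by the SDK may differ in detail.

```python
import json

# Hypothetical bookmark entries, one per record processed by the Contacts sink.
bookmarks = [
    {"externalId": "1", "id": "abc", "success": True},
    {"externalId": "2", "id": "def", "success": True, "is_updated": True},
    {"externalId": "3", "success": False, "error": "Email is required"},
]


def summarize(entries: list) -> dict:
    """Hypothetical helper: tally the entries into the four summary counters."""
    summary = {"success": 0, "failed": 0, "existing": 0, "updated": 0}
    for entry in entries:
        if not entry.get("success"):
            summary["failed"] += 1
        elif entry.get("is_updated"):
            summary["updated"] += 1
        else:
            summary["success"] += 1
    return summary


state = {
    "bookmarks": {"Contacts": bookmarks},
    "summary": {"Contacts": summarize(bookmarks)},
}
print(json.dumps(state, indent=2))
```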
Testing the target locally
Input Files
config.json
This file stores all the credentials needed to send data to the API. Flags or additional data can be saved here for customized logic
Example:
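For illustration only, a config with OAuth credentials and one custom flag might be produced like this. Every key and value below is a placeholder, and `skip_duplicates` is a hypothetical flag name.

```python
import json

# Placeholder credentials; real values come from the API you are targeting.
config = {
    "client_id": "my-client-id",
    "client_secret": "my-client-secret",
    "refresh_token": "my-refresh-token",
    "skip_duplicates": True,  # hypothetical flag for custom logic
}

config_text = json.dumps(config, indent=2)
print(config_text)  # save this output as config.json
```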
data.singer
All Singer SDK targets accept a jsonl (JSON lines) input. We typically call it data.singer, and it should contain Singer-formatted records.
Note: Each record should have an externalId field, which serves as a primary key for updating the target-state if errors happen.
Example:
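A minimal sketch that builds Singer-formatted lines in Python: one SCHEMA message followed by RECORD messages, each serialized as a single JSON line. The stream name and fields are hypothetical; the message shapes follow the Singer specification.

```python
import json

# SCHEMA message describing the hypothetical Contacts stream.
schema_message = {
    "type": "SCHEMA",
    "stream": "Contacts",
    "schema": {
        "type": "object",
        "properties": {
            "externalId": {"type": "string"},
            "first_name": {"type": "string"},
            "email": {"type": "string"},
        },
    },
    "key_properties": ["externalId"],
}

# RECORD messages; each one carries an externalId.
record_messages = [
    {"type": "RECORD", "stream": "Contacts",
     "record": {"externalId": "1", "first_name": "Ada", "email": "ada@example.com"}},
    {"type": "RECORD", "stream": "Contacts",
     "record": {"externalId": "2", "first_name": "Alan", "email": "alan@example.com"}},
]

# One JSON document per line, as expected for JSON lines input.
data_singer = "\n".join(
    json.dumps(message) for message in [schema_message, *record_messages]
) + "\n"
print(data_singer, end="")
```

Writing `data_singer` to a file named data.singer gives an input the target can read.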
Running the target
Now to run the target you can do this:
You can also set this up as a debug config in VS Code by adding a definition like this to launch.json: