Creating taps with the SDK requires overriding just two or three classes:
- The Tap class. This class governs configuration, validation, and stream discovery.
- The stream class. You have different options for your base class depending on the type of data source you are working with:
  - Stream - The generic base class for streams.
  - RESTStream - The base class for REST-type streams.
  - GraphQLStream - The base class for GraphQL-type streams. This class inherits from RESTStream, since GraphQL is built upon REST.
- An optional authenticator class. You can omit this class entirely if you do not require authentication or if you prefer to write custom authentication logic. The supported authenticator classes are:
  - SimpleAuthenticator - This class is functionally equivalent to overriding the http_headers property in the stream class.
  - OAuthAuthenticator - This class performs an OAuth 2.0 authentication flow.
  - OAuthJWTAuthenticator - This class performs a JWT (JSON Web Token) authentication flow.
Creating targets with the SDK requires overriding just two classes:
- The Target class. This class governs configuration, validation, and stream discovery.
- The Sink class. You have two options depending on whether your target prefers writing one record at a time or writing in batches:
  - RecordSink writes one record at a time, via the process_record() method.
  - BatchSink writes one batch at a time. Important class members include:
    - start_batch() to (optionally) initialize a new batch.
    - process_record() to enqueue a record to be written.
    - process_batch() to write any queued records and clean up local resources.
Note: The Sink class can receive records from one stream or from many. See the Sink documentation for more information on the differences between a target's Sink class and a tap's Stream class.
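The batch lifecycle described above can be illustrated with a small standalone sketch. The class below is a toy and not the SDK's BatchSink: it only mirrors the three method names from the list (start_batch(), process_record(), process_batch()). The max_size threshold, the layout of the context dictionary, the add() and drain() helpers, and the written list are all assumptions made for this illustration.

```python
from typing import Any, Dict, List


class ToyBatchSink:
    """Standalone illustration of a batch lifecycle; NOT the SDK's BatchSink."""

    def __init__(self, max_size: int = 2) -> None:
        self.max_size = max_size  # hypothetical batch-size threshold
        self.written: List[List[Dict[str, Any]]] = []  # stands in for the destination
        self._context: Dict[str, Any] = {}

    def start_batch(self, context: Dict[str, Any]) -> None:
        # (Optionally) initialize a new batch.
        context["records"] = []

    def process_record(self, record: Dict[str, Any], context: Dict[str, Any]) -> None:
        # Enqueue a record; nothing is written yet.
        context["records"].append(record)

    def process_batch(self, context: Dict[str, Any]) -> None:
        # Write the queued records, then clean up local resources.
        self.written.append(list(context["records"]))
        context.clear()

    def add(self, record: Dict[str, Any]) -> None:
        # Assumed driver logic: open a batch if needed, enqueue, flush when full.
        if "records" not in self._context:
            self.start_batch(self._context)
        self.process_record(record, self._context)
        if len(self._context["records"]) >= self.max_size:
            self.process_batch(self._context)

    def drain(self) -> None:
        # Flush a partially filled batch at the end of input.
        if self._context.get("records"):
            self.process_batch(self._context)


sink = ToyBatchSink(max_size=2)
for i in range(1, 4):
    sink.add({"id": i})
sink.drain()
print(sink.written)
```

With three records and a batch size of two, the sketch writes one full batch followed by one partial batch. In a real target the SDK drives these calls for you, so only the three lifecycle methods would be yours to implement.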
First, install cookiecutter, Poetry, and optionally Tox:
# Install pipx if you haven't already
pip install pipx
pipx ensurepath
# Restart your terminal here, if needed, to get the updated PATH
pipx install cookiecutter
pipx install poetry
# Optional: Install Tox if you want to use it to run auto-formatters, linters, tests, etc.
pipx install tox
Now you can initialize your new project with the Cookiecutter template for taps:
cookiecutter https://github.com/meltano/sdk --directory="cookiecutter/tap-template"
...or for targets:
cookiecutter https://github.com/meltano/sdk --directory="cookiecutter/target-template"
Note that you do not need to create the directory for the tap yourself. If you want the project to live at /projects/tap-mytap, run cookiecutter from /projects and the tap-mytap project directory will be created for you.
Once you've answered the cookiecutter prompts, follow the instructions in the generated README.md file to complete your new tap or target. You can also reference the Meltano Tutorial for a more detailed guide.
In some cases, there may already be a library that connects to the API and all you need the SDK for is to reformat the data into the Singer specification. The SDK is still a great choice for this. The Peloton tap is an example of this.
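As a rough sketch of this pattern, the snippet below wraps a pre-existing client library and reshapes its rows into flat record dictionaries. Everything here is hypothetical: FakeWorkoutClient stands in for whatever third-party library you already have, and WorkoutsStreamSketch is a plain Python class rather than an SDK class. It only mirrors the shape of a get_records(context) method that yields one dictionary per record, which is where this kind of wrapping would normally live in a subclass of the generic Stream base class.

```python
from typing import Any, Dict, Iterable, List, Optional


class FakeWorkoutClient:
    """Hypothetical stand-in for an existing API client library."""

    def list_workouts(self) -> List[Dict[str, Any]]:
        return [
            {"workout_id": 1, "fitness_discipline": "cycling"},
            {"workout_id": 2, "fitness_discipline": "running"},
        ]


class WorkoutsStreamSketch:
    """Toy class mirroring the shape of a generic stream; not an SDK class."""

    name = "workouts"

    def __init__(self, client: FakeWorkoutClient) -> None:
        self.client = client

    def get_records(self, context: Optional[dict]) -> Iterable[Dict[str, Any]]:
        # The library does the API work; this method only reshapes each row.
        for row in self.client.list_workouts():
            yield {"id": row["workout_id"], "discipline": row["fitness_discipline"]}


records = list(WorkoutsStreamSketch(FakeWorkoutClient()).get_records(None))
print(records)
```

The design point is that the client library keeps ownership of authentication, pagination, and HTTP details, while the stream is reduced to a thin mapping layer.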
By default, the Singer SDK for REST streams assumes the API responds with a JSON array of records, but you can easily override this behaviour by specifying the records_jsonpath expression in your RESTStream or GraphQLStream implementation:
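from singer_sdk.streams import RESTStream


class EntityStream(RESTStream):
    """Entity stream from a generic REST API."""

    # Select each element of the array nested under data.records
    records_jsonpath = "$.data.records[*]"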
You can test your JSONPath expressions with the JSONPath Online Evaluator.
Many APIs return the records in an array nested inside a JSON object key.
- Response:
  { "data": { "records": [ { "id": 1, "value": "abc" }, { "id": 2, "value": "def" } ] } }
- Expression:
  $.data.records[*]
- Result:
  [ { "id": 1, "value": "abc" }, { "id": 2, "value": "def" } ]
Some APIs instead return the records as values inside an object where each key is some form of identifier.
- Response:
  { "data": { "1": { "id": 1, "value": "abc" }, "2": { "id": 2, "value": "def" } } }
- Expression:
  $.data.*
- Result:
  [ { "id": 1, "value": "abc" }, { "id": 2, "value": "def" } ]
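To make clear what the two expressions select, the same selections can be written in plain Python using only the standard library. This is an illustration of the expressions' meaning and not how the SDK evaluates them internally; the variable names are chosen for this example.

```python
import json

# The two response shapes from the examples above.
nested_array = json.loads(
    '{ "data": { "records": [ { "id": 1, "value": "abc" }, { "id": 2, "value": "def" } ] } }'
)
keyed_object = json.loads(
    '{ "data": { "1": { "id": 1, "value": "abc" }, "2": { "id": 2, "value": "def" } } }'
)

# Equivalent of $.data.records[*]: every element of the nested array.
records_from_array = list(nested_array["data"]["records"])

# Equivalent of $.data.*: every value of the object, ignoring its keys.
records_from_object = list(keyed_object["data"].values())

print(records_from_array)
print(records_from_object)
```

Both selections yield the same two records, which is why either response shape can feed the same stream schema once the matching records_jsonpath is set.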
For a detailed reference, please see the SDK Reference Guide.
For more information about the SDK's Singer implementation details, please see the SDK Implementation Details section.
For a list of code samples solving a variety of different scenarios, please see our Code Samples page.
For a list of sample CLI commands you can run, click here.
We've collected some Python tips which may be helpful for new SDK users.
Ensure the interpreter you're using in VSCode is set to the one from your Poetry environment. You can change this by using the command palette to open the interpreter settings. Doing this will also help with autocompletion.
In order to launch your plugin via its CLI with the built-in debugger, VSCode requires a launch configuration.
An example launch configuration, added to your launch.json, might be as follows:
{
  // launch.json
  "version": "0.2.0",
  "configurations": [
    {
      "name": "tap-snowflake discovery",
      "type": "python",
      "request": "launch",
      "module": "tap_snowflake.tap",
      "args": ["--config", "config.json", "--discover"],
      "python": "${command:python.interpreterPath}",
      // Set to true to debug third-party library code
      "justMyCode": false,
    }
  ]
}
The above module value relies on an equivalent to the following snippet being added to the end of your tap.py or target.py file:
if __name__ == "__main__":
    TapSnowflake.cli()
This is automatically included in the most recent version of the tap and target cookiecutters.
We've had success using viztracer to create flame graphs for SDK-based packages and find out whether there are any serious performance bottlenecks.
You can start doing the same in your package. Start by installing viztracer.
$ poetry add --dev viztracer
Then simply run your package's CLI as normal, preceded by the viztracer command:
$ poetry run viztracer my-tap
That command will produce a result.json file which you can explore with the vizviewer tool.
$ poetry run vizviewer result.json
The output should look like this
Note: Chrome seems to work best for running the vizviewer app.