transformations-cli GitHub Action

transformations-cli also provides a GitHub Action, which can be used to deploy transformations.

  1. Create transformation manifests and place them into a folder structure in your repository.

Transformation manifests

Important notes:

  • If the manifest for a scheduled transformation omits the schedule, deploying it deletes the existing schedule.

  • If the manifest for a transformation being updated omits notifications that already exist, those notifications are deleted.

  • Values written as ${VALUE} are read from the environment variable VALUE, while a bare VALUE is used directly as the actual value.

  • Old jetfire-cli style manifests can be used by adding legacy: true inside the old manifest.

  • The manifest directory is scanned recursively for *.yml and *.yaml files, so you can organize your transformations into separate subdirectories.
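
The environment variable rule and the recursive directory scan described above can be illustrated with a short Python sketch. This is not the CLI's actual implementation: the helper names `find_manifests` and `resolve_value` are hypothetical, and the sketch assumes a value counts as an environment variable reference only when the whole string has the form `${NAME}`.

```python
import os
import re
from pathlib import Path

# Assumption: a reference is the whole value, written exactly as ${NAME}.
_ENV_REF = re.compile(r"^\$\{([A-Za-z_][A-Za-z0-9_]*)\}$")


def find_manifests(root):
    """Return all *.yml and *.yaml files under root, searched recursively."""
    root = Path(root)
    found = list(root.rglob("*.yml")) + list(root.rglob("*.yaml"))
    return sorted(found)


def resolve_value(raw, env=None):
    """Resolve ${NAME} from the environment; return any other value unchanged."""
    env = os.environ if env is None else env
    match = _ENV_REF.match(raw)
    if match is None:
        return raw  # plain VALUE: used directly as the actual value
    name = match.group(1)
    if name not in env:
        raise KeyError(f"environment variable {name} is not set")
    return env[name]
```

With this sketch, `resolve_value("${API_KEY}", {"API_KEY": "secret"})` returns `"secret"`, while `resolve_value("API_KEY", {"API_KEY": "secret"})` returns the literal string `"API_KEY"`.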

Manifest with OIDC Credentials

# Required
externalId: "test-cli-transform-oidc"
# Required
name: "test-cli-transform-oidc"

# Required
# Valid values are: "assets", "timeseries", "asset_hierarchy", "events", "datapoints",
# "string_datapoints", "sequences", "files", "labels", "relationships",
# "raw", "data_sets", "sequence_rows", "nodes", "edges"
destination: 
  type: "assets"

# destination: "assets"

# When writing to RAW tables, use the following syntax:
# destination:
#   type: raw
#   database: some_database
#   table: some_table

# When writing to sequence rows, use the following syntax:
# destination:
#   type: sequence_rows
#   externalId: some_sequence

# When writing to nodes in your data model, use the following syntax:
# NOTE: view is optional; it is not needed when writing nodes without a view.
# NOTE: instanceSpace is optional. If not set, instanceSpace must be provided as a property (column) in the data.
# destination:
#   type: nodes
#   instanceSpace: InstanceSpace
#   view:  
#     space: TypeSpace
#     externalId: TypeExternalId
#     version: version

# When writing to edges (also known as connection definitions) in your data model, use the following syntax:
# NOTE: instanceSpace is optional. If not set, instanceSpace must be provided as a property (column) in the data.
# destination:
#   type: edges
#   instanceSpace: InstanceSpace
#   edgeType:
#     space: TypeSpace
#     externalId: TypeExternalId

# When writing to edges with a view in your data model, use the following syntax:
# NOTE: instanceSpace is optional. If not set, instanceSpace must be provided as a property (column) in the data.
# destination:
#   type: edges
#   instanceSpace: InstanceSpace
#   view:
#     space: TypeSpace
#     externalId: TypeExternalId
#     version: version
    
# Optional, default: true
shared: true

# Optional, default: upsert
# Valid values are:
#   upsert: Create new items, or update existing items if their id or externalId
#           already exists.
#   create: Create new items. The transformation will fail if there are id or
#           externalId conflicts.
#   update: Update existing items. The transformation will fail if an id or 
#           externalId does not exist.
#   delete: Delete items by internal id.
action: "upsert"

# Required
query: "select 'My Assets Transformation' as name, 'asset1' as externalId"

# Or the path to a file containing the SQL query for this transformation.
# query:
#   file: query.sql

# Optional, default: null
# If null, the transformation will not be scheduled.
schedule: "* * * * *"
# Or you can pause the schedules.
# schedule:
#   interval: "* * * * *"
#   isPaused: true

# Optional, default: true
ignoreNullFields: false

# Optional, default: null
# List of email addresses to notify on transformation errors
notifications:
  - [email protected]
  - [email protected]

# Optional, default: null
# Skipping this field or providing null clears
# the data set ID on updating the transformation
dataSetId: 1

# Or you can provide data set external ID instead,
# Optional, default: null
# Skipping this field or providing null clears
# the data set ID on updating the transformation
dataSetExternalId: test-dataset

# Optional: You can tag your transformations with max 5 tags.
tags:
  - mytag1
  - mytag2

# The client credentials to be used in the transformation
authentication:
  clientId: ${CLIENT_ID}
  clientSecret: ${CLIENT_SECRET}
  tokenUrl: ${TOKEN_URL}
  scopes: 
    - ${SCOPES}
  cdfProjectName: ${CDF_PROJECT_NAME}
  # audience: ""

# If you need to specify read/write credentials separately
# authentication:
#   read:
#     clientId: ${CLIENT_ID}
#     clientSecret: ${CLIENT_SECRET}
#     tokenUrl: ${TOKEN_URL}
#     scopes: 
#       - ${SCOPES}
#     cdfProjectName: ${CDF_PROJECT_NAME}
#     # audience: ""
#   write:
#     clientId: ${CLIENT_ID}
#     clientSecret: ${CLIENT_SECRET}
#     tokenUrl: ${TOKEN_URL}
#     scopes: 
#       - ${SCOPES}
#     cdfProjectName: ${CDF_PROJECT_NAME}
#     # audience: ""

Manifest with API keys

externalId: "test-cli-transform"
name: "test-cli-transform"
destination:
  type: "assets"
shared: true
action: "upsert"
query: "select 'My Assets Transformation' as name, 'asset1' as externalId"
# query:
#   file: query.sql
schedule: "* * * * *"
ignoreNullFields: false
notifications:
  - [email protected]
  - [email protected]

# Optional, default: null
# Skipping this field or providing null clears
# the data set ID on updating the transformation
dataSetId: 1

# Or you can provide data set external ID instead,
# Optional, default: null
# Skipping this field or providing null clears
# the data set ID on updating the transformation
dataSetExternalId: test-dataset

# Optional: You can tag your transformations with max 5 tags.
tags:
  - mytag1
  - mytag2

authentication:
  apiKey: ${API_KEY}

# If you need to specify read/write authentication separately
# authentication:
#   read:
#     apiKey: ${API_KEY}
#   write:
#     apiKey: ${API_KEY}

  2. To deploy a set of transformations in a GitHub workflow, add a step that references the Action in your job.

Deploy step with OIDC credentials

When using OIDC, the Action needs the client credentials instead of an API key:

- name: Deploy transformations
  uses: cognitedata/transformations-cli@main
  env:
      # Credentials to be used when running your transformations,
      # as referenced in your manifests:
      COGNITE_CLIENT_ID: my-cognite-client-id
      COGNITE_CLIENT_SECRET: ${{ secrets.cognite_client_secret }}
  with:
      # Credentials used for deployment
      path: transformations  # Transformation manifest folder, relative to GITHUB_WORKSPACE
      client-id: my-jetfire-client-id
      client-secret: ${{ secrets.jetfire_client_secret }}
      token-url: https://login.microsoftonline.com/<my-azure-tenant-id>/oauth2/v2.0/token
      cdf-project-name: my-project-name
      # If you need to provide multiple scopes, the format: "scope1 scope2 scope3"
      scopes: https://<my-cluster>.cognitedata.com/.default
      # audience: "" # Optional

This GitHub Action takes the following inputs:

| Name | Description |
| --- | --- |
| path | (Required) The path to a directory containing transformation manifests. This is relative to $GITHUB_WORKSPACE, which will be the root of the repository when using actions/checkout with default settings. |
| client-id | (Required) The client ID used for authenticating with transformations. Equivalent to setting the TRANSFORMATIONS_CLIENT_ID environment variable. |
| client-secret | (Required) The client secret used for authenticating with transformations. Equivalent to setting the TRANSFORMATIONS_CLIENT_SECRET environment variable. |
| token-url | (Required) The token URL used for requesting a token to authenticate with transformations. Equivalent to setting the TRANSFORMATIONS_TOKEN_URL environment variable. |
| cdf-project-name | (Required) Equivalent to setting the TRANSFORMATIONS_PROJECT environment variable. |
| scopes | (Optional) The scopes used for authenticating with transformations. Equivalent to setting the TRANSFORMATIONS_SCOPES environment variable. Space separated if multiple scopes are needed. |
| audience | (Optional) The audience used for authenticating with transformations. Equivalent to setting the TRANSFORMATIONS_AUDIENCE environment variable. |
| cluster | (Optional) The name of the cluster where Transformations is hosted. Equivalent to setting the TRANSFORMATIONS_CLUSTER environment variable. |
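
The correspondence between inputs and environment variables in the table above can be written down as a small Python sketch. It may be useful when preparing the same settings outside a workflow. The helper `inputs_to_env` and the constants are hypothetical names introduced for illustration. The `path` input is left out because the table lists no environment variable for it.

```python
# Input name -> environment variable, as listed in the inputs table.
INPUT_TO_ENV = {
    "client-id": "TRANSFORMATIONS_CLIENT_ID",
    "client-secret": "TRANSFORMATIONS_CLIENT_SECRET",
    "token-url": "TRANSFORMATIONS_TOKEN_URL",
    "cdf-project-name": "TRANSFORMATIONS_PROJECT",
    "scopes": "TRANSFORMATIONS_SCOPES",
    "audience": "TRANSFORMATIONS_AUDIENCE",
    "cluster": "TRANSFORMATIONS_CLUSTER",
}

# Inputs marked (Required) in the table, excluding path.
REQUIRED_INPUTS = ("client-id", "client-secret", "token-url", "cdf-project-name")


def inputs_to_env(inputs):
    """Translate Action-style inputs into the equivalent environment variables."""
    missing = [name for name in REQUIRED_INPUTS if not inputs.get(name)]
    if missing:
        raise ValueError("missing required inputs: " + ", ".join(missing))
    env = {}
    for name, value in inputs.items():
        if name not in INPUT_TO_ENV:
            raise ValueError(f"unknown input: {name}")
        if name == "scopes" and not isinstance(value, str):
            value = " ".join(value)  # multiple scopes are space separated
        env[INPUT_TO_ENV[name]] = value
    return env
```

Passing `scopes` as a list such as `["scope1", "scope2"]` yields the single space-separated string `"scope1 scope2"`, matching the format the `scopes` input expects.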

Additionally, you must set every environment variable referenced in your transformation manifests, including those that hold credentials, in the step's env block.
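
A pre-flight check run before the deploy step can help catch a forgotten variable. The following Python sketch is one possible approach, not part of transformations-cli: the function names are hypothetical, and it assumes references always take the form `${NAME}`. It scans the manifest text directly and skips full-line YAML comments, so commented-out authentication blocks are not reported.

```python
import os
import re
from pathlib import Path

# Matches ${NAME} references anywhere on a line of manifest text.
_ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def missing_env_vars(manifest_text, env=None):
    """Return the sorted names referenced as ${NAME} that are absent from env."""
    env = os.environ if env is None else env
    referenced = set()
    for line in manifest_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # ignore full-line YAML comments
        referenced.update(_ENV_REF.findall(line))
    return sorted(name for name in referenced if name not in env)


def check_manifest_dir(root, env=None):
    """Map each manifest path to its missing variables; omit complete ones."""
    report = {}
    root = Path(root)
    for pattern in ("*.yml", "*.yaml"):
        for path in root.rglob(pattern):
            missing = missing_env_vars(path.read_text(encoding="utf-8"), env)
            if missing:
                report[str(path)] = missing
    return report
```

An empty dictionary from `check_manifest_dir("transformations")` means every referenced variable is present in the current environment.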

Deploy step with API keys

- name: Deploy transformations
  uses: cognitedata/transformations-cli@main
  with:
    path: transformations # Transformation manifest folder, relative to GITHUB_WORKSPACE
    api-key: ${{ secrets.TRANSFORMATIONS_API_KEY }}
    # If not using the main europe-west1-1 cluster:
    # cluster: greenfield
    # cdf-project-name: my-project # to suppress python SDK warning (not required for API keys).
  env:
    # API key to be used when running your transformations,
    # As referenced in your transformation manifests
    SOME_API_KEY: ${{ secrets.SOME_API_KEY }}

This GitHub Action takes the following inputs:

| Name | Description |
| --- | --- |
| path | (Required) The path to a directory containing transformation manifests. This is relative to $GITHUB_WORKSPACE, which will be the root of the repository when using actions/checkout with default settings. |
| api-key | (Required) The API key used for authenticating with transformations. Equivalent to setting the TRANSFORMATIONS_API_KEY environment variable. |
| cluster | (Optional) The name of the cluster where Transformations is hosted. Equivalent to setting the TRANSFORMATIONS_CLUSTER environment variable. |

Additionally, you must set every environment variable referenced in your transformation manifests, including those that hold API keys, in the step's env block.