An intentional plantation of trees or shrubs that is maintained for food production.
Orchard is an orchestration service that manages data pipelines, compute workflows, and associated ETL/ELT activities, as well as the lifecycle of the underlying resources (provisioning, monitoring, termination). Inspired by AWS Data Pipeline, Orchard is designed for enterprise use cases that demand security, extreme concurrency, granular control over the resource lifecycle, and flexible integration with a cloud-based microservice architecture.
Like Apache Spark, Orchard is written in functional Scala. This gives Orchard access to Scala's well-developed concurrency features, in particular the Actor pattern as provided by the Apache Pekko library.
Orchard is designed to be deployed into a cloud environment as a service, but can alternatively be set up locally for exploration and development. To set it up locally, follow these steps.
Install Apps (OSX)
# clone orchard into a local directory
git clone git@github.com:salesforce/orchard.git
# use sdkman to install Scala Build Tool (SBT) (if needed)
# you can also read https://sdkman.io/usage for a more personalized sdk
# experience
curl -s "https://get.sdkman.io" | bash
sdk install sbt
# if you don't have the right java installed, run the following command
sdk env install
# switch to use the correct java version defined in .sdkmanrc
sdk env
# use brew to install postman (for API calls) and docker (if needed)
brew install --cask postman
brew install --cask docker
Configure Postgres Database
Orchard uses a Postgres database in the docker-compose stack to store the state of each active task. Set the password for this database by adding a .env file to the project's root containing ORCHARD_PG_SECRET=orchardsecret, substituting your own secret for orchardsecret. Alternatively, set it directly in the environment with:
export ORCHARD_PG_SECRET=orchardsecret
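If you script the setup, the .env step can be made repeatable. The following is a minimal sketch and not part of Orchard: write_pg_secret is a hypothetical helper name, and it assumes the .env file holds plain KEY=value lines.

```shell
# write_pg_secret ENV_FILE SECRET
# Hypothetical helper (not part of Orchard): appends ORCHARD_PG_SECRET to
# ENV_FILE unless the file already defines it, so re-running is harmless.
write_pg_secret() {
  env_file="$1"
  secret="$2"
  if [ -f "$env_file" ] && grep -q '^ORCHARD_PG_SECRET=' "$env_file"; then
    echo "ORCHARD_PG_SECRET already present in $env_file"
  else
    printf 'ORCHARD_PG_SECRET=%s\n' "$secret" >> "$env_file"
    # Restrict the file to the current user, since it now holds a password.
    chmod 600 "$env_file"
    echo "wrote ORCHARD_PG_SECRET to $env_file"
  fi
}

# Example, run from the project's root:
#   write_pg_secret .env orchardsecret
```

An existing value is left untouched rather than overwritten, which avoids silently changing the password of a database that has already been provisioned.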
Start the Docker Compose stack
docker-compose up
This will start the database container, provision the required tables, and start the Orchard web service.
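The stack can take a little while to come up, so a script that calls the API right after docker-compose up may want to wait first. A minimal sketch, assuming a generic retry loop is acceptable: retry is a hypothetical helper, not something Orchard ships, and the port 9001 in the usage comment is taken from the API examples in this README.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper (not part of Orchard): runs COMMAND until it succeeds
# or until ATTEMPTS runs have failed, sleeping DELAY_SECONDS between runs.
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while ! "$@"; do
    if [ "$i" -ge "$attempts" ]; then
      return 1
    fi
    i=$((i + 1))
    sleep "$delay"
  done
}

# Example: wait up to about a minute for something to answer on port 9001.
# curl exits non-zero while the connection is refused and zero once any
# HTTP response comes back, whatever its status code.
#   retry 30 2 curl -s -o /dev/null http://localhost:9001/
```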
Authentication
By default, Orchard runs a development configuration in which authentication is disabled. To enable API authentication, set orchard.auth.enabled = true in application.conf. Orchard will then read the keys specified in
hashed-keys = {
  user = [ ${?MCE_ENV_X_API_USER1} , ${?MCE_ENV_X_API_USER2} ]
  admin = [ ${?MCE_ENV_X_API_ADMIN1} , ${?MCE_ENV_X_API_ADMIN2} ]
}
which must match the key provided in the header of any inbound API requests.
Once the setup is complete, Orchard is ready to receive a number of different instructions via API request.
If deployed into a cloud environment like AWS, Orchard will need a role with a set of permissions appropriate for the activities it runs.
Orchard allows the definition and execution of workflows, where each workflow consists of a number of activities. Activities can be dependent on other activities, forming a directed acyclic graph (DAG). Orchard will execute activities concurrently whenever possible.
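To make the DAG idea concrete, the sketch below groups activities into "waves": every activity in a wave has all of its dependencies in earlier waves, so the members of one wave could run at the same time. This only illustrates the concept and is not Orchard's scheduler; dag_waves and the activity names are hypothetical.

```shell
# dag_waves: reads "dependency activity" pairs on stdin, one per line (a line
# with a single name declares an activity without dependencies), and prints
# "wave activity" lines. Hypothetical helper, not part of Orchard.
dag_waves() {
  out=$(awk '
    NF == 1 { nodes[$1]; next }
    NF >= 2 { nodes[$1]; nodes[$2]; deps[$2] = deps[$2] " " $1 }
    END {
      remaining = 0
      for (n in nodes) remaining++
      wave = 0
      while (remaining > 0) {
        wave++
        count = 0
        # Collect every unfinished activity whose dependencies are all done.
        for (n in nodes) {
          if (n in done) continue
          ready = 1
          k = split(deps[n], d, " ")
          for (i = 1; i <= k; i++) if (!(d[i] in done)) ready = 0
          if (ready) batch[++count] = n
        }
        # Nothing is ready but work remains: the graph is not acyclic.
        if (count == 0) { print "cycle detected" > "/dev/stderr"; exit 1 }
        # Mark the wave as done only after the scan, so dependents wait a wave.
        for (i = 1; i <= count; i++) {
          done[batch[i]] = 1
          remaining--
          print wave, batch[i]
        }
      }
    }
  ') || return 1
  printf "%s\n" "$out" | sort -k1,1n -k2,2
}

# Example:
#   printf 'extract transform\nextract validate\ntransform load\nvalidate load\n' | dag_waves
# prints:
#   1 extract
#   2 transform
#   2 validate
#   3 load
```

Here transform and validate share wave 2 because both depend only on extract, while load has to wait for both of them.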
You can generate an example workflow that executes a number of activities in an AWS VPC environment with the following commands:
cd example/data
mustache sample_workflow_view.json sample_workflow.json.mustache > sample_workflow.json
This uses mustache to substitute values specific to a given AWS account into the final payload. You can install mustache with npm install -g mustache.
For example, you can create a file example/data/sample_workflow_view.json to define subnetId, s3bucket, etc. Change these to match your specific needs:
{
  "resources": {
    "subnetId": "subnet-xxxxxxxx"
  },
  "s3bucket": "my-bucket-name",
  "sparkConfig": {
    "env_key": "REGION_ENV_KEY",
    "env_val": "uswest-cloud-trust"
  },
  "snsArn": "your-sns-arn-for-action"
}
sample_workflow.json.mustache contains a mustache template that accepts these substitutions.
Once generated, your sample_workflow.json can be used to create a workflow:
POST http://localhost:9001/v1/workflow
OR
curl -X POST \
-H "Content-type: application/json" \
-d "$(jq -c . < path/to/sample_workflow.json)" \
"http://localhost:9001/v1/workflow"
This returns a workflow_id, for example: wf-f231a08f-60e4-480a-b845-e53e06918f77
Once defined, activate a workflow using the workflow id like so:
PUT http://localhost:9001/v1/workflow/wf-f231a08f-60e4-480a-b845-e53e06918f77/activate
OR
curl -X PUT \
-H "Content-type: application/json" \
"http://localhost:9001/v1/workflow/wf-f231a08f-60e4-480a-b845-e53e06918f77/activate"
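When chaining the create and activate calls in a script, it can help to check the workflow id before building the second URL. A minimal sketch: activate_url is a hypothetical helper, and the assumption that every id has the form wf- followed by a lowercase UUID is inferred only from the single example above.

```shell
# activate_url BASE_URL WORKFLOW_ID
# Hypothetical helper (not part of Orchard): prints the activation URL for a
# workflow id, or fails if the id does not look like "wf-<uuid>".
# The id pattern is an assumption based on the example id in this README.
activate_url() {
  base="$1"
  id="$2"
  if printf '%s' "$id" | grep -Eq '^wf-[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$'; then
    printf '%s/v1/workflow/%s/activate\n' "$base" "$id"
  else
    echo "unexpected workflow id: $id" >&2
    return 1
  fi
}

# Example:
#   url=$(activate_url http://localhost:9001 wf-f231a08f-60e4-480a-b845-e53e06918f77) \
#     && curl -X PUT -H "Content-type: application/json" "$url"
```

Failing early on a malformed id keeps an empty or truncated response from the create call from turning into a PUT against the wrong path.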
Resource and Activity Types
In the above example workflow, the activities and resources used are stubs. In an actual deployment, Orchard will use resources and activities specific to the chosen cloud provider's environment, like AWS EC2 or EMR. Each activity has its own activitySpec, which contains the configuration needed to carry out that activity.
Currently, Orchard supports:
- AWS EC2 activities / resources
- AWS EMR activities / resources
- AWS S3 resources
- AWS SSM resources
- Shell script activity
- Shell command activity
The project is actively seeking contributions for other activity and resource types, including those relevant to GCP and Azure cloud. A guide to adding new resources and activities will be linked here at a later date for those interested in contributing.
To contribute to the project, please check issues, fork, and submit a pull request.
Orchard is an open-source project licensed under the BSD 3-Clause "New" or "Revised" License.
Go here to read the full text of Orchard's license.