HySDS - ARIA - Azure scripts

Scripts to make your life easier when deploying HySDS and ARIA on the Microsoft Azure cloud

  • ⭕️ HySDS Core
  • 📥 Installers (e.g. Puppet, Terraform)
  • 🧩 Cluster Node-specific Packages (e.g. mozart, figaro)
  • 🧰 Operational packages (e.g. hysds-dockerfiles)
  • 🛰 PGE
  • 🛠 Helper Scripts
  • 📖 Documentation
  • ❓ Miscellaneous / Uncategorized

This repository complements the official HySDS repos developed by JPL.

Directories

terraform/ - Terraform version of the deployment scripts

This directory contains .tf files written in the HashiCorp Configuration Language (HCL) that describe a HySDS cluster. For more information on how to use it, refer to the README file in that directory.
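
As a quick orientation, the usual Terraform workflow for this directory looks roughly as follows (a hedged sketch; the actual variables, authentication setup and outputs are covered in that README):

    cd terraform
    terraform init     # download the Azure provider plugins
    terraform plan     # preview the resources that would be created
    terraform apply    # create the HySDS cluster resources on Azure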

terraform_old/ - Deprecated Terraform version of the deployment scripts

This version of the Terraform deployment scripts is deprecated.

shell/ - Standard UNIX shell version of the deployment scripts

NOTE: the shell version is currently deprecated!

The shell script version is written in standard sh and provides a semi-automated way of deploying a HySDS cluster. The rationale for using shell scripts instead of a more powerful interpreted language like Python is twofold: to reduce the amount of code needed and to improve cross-platform compatibility. Occasionally, however, Python helpers are used for more advanced functionality such as JSON parsing.
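
As an illustration of that pattern, here is a hedged sketch of piping az output into Python's built-in json module (the resource names are placeholders and this is not one of the actual helpers):

    az vm show --resource-group [AZURE_RESOURCE_GROUP] --name [VM_NAME] \
      | python -c "import json, sys; print(json.load(sys.stdin)['location'])"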

helpers/ - Helper scripts for the ARIA adaptation

Shell scripts meant to make life easier for the user. Further documentation is provided in another README.md file in that directory.

docs/ - Documentation

Miscellaneous documentation written in Markdown for resolving common problems you may encounter with the system, as well as operational guides.

Currently available documentation:

Usage

Please refer to the README files inside each directory for detailed usage instructions during installation.

Post deployment - HySDS provisioning

Further configuration is still required after you run either the Terraform or the shell version of the deployment scripts. Some of the tasks you need to do include (a condensed command sketch follows this list):

  1. Set up CI by navigating to http://[CI_FQDN]:8080, proceed as admin, and retrieve the Jenkins API key and the current administrative user's username. Optionally, you might want to update Jenkins before setting it up by downloading the latest Jenkins .war file and replacing the old version at /usr/local/bin/jenkins.war.
  2. sds configure to set up the environment constants used by HySDS on Mozart; one of these is the JENKINS_API_KEY retrieved in the previous step.
  3. Verify that Elasticsearch is running on the Mozart, Metrics and GRQ instances by running $ systemctl status elasticsearch on those instances. If it's not up, run # systemctl start elasticsearch. Optionally, enable Elasticsearch on startup with # systemctl enable elasticsearch.
  4. sds update all -f to update all components of HySDS.
  5. sds start all -f to start all components of HySDS, and use sds status all to verify that all components are up and running.
  6. (Recommended) Improve Docker performance on Factotum by using this guide.
  7. (Optional) Set up real HTTPS certificates instead of using self-signed ones by running sh /home/ops/https_autoconfig.sh on the servers that need them (mainly Mozart, GRQ and Metrics). The current script only supports DNS verification through Cloudflare, meaning that your DNS must be managed by Cloudflare before you attempt to use this script.
  8. (Optional) Set up the ARIA adaptation using the instructions in the next section.
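
Condensed into commands, the sketch referenced above looks roughly like this (run on Mozart; the Elasticsearch checks must be repeated on the Metrics and GRQ instances):

    sds configure                        # enter JENKINS_API_KEY and the other constants
    systemctl status elasticsearch       # repeat on Metrics and GRQ
    sudo systemctl start elasticsearch   # only if it is not already running
    sudo systemctl enable elasticsearch  # optional: start Elasticsearch on boot
    sds update all -f                    # update all HySDS components
    sds start all -f                     # start all HySDS components
    sds status all                       # verify that everything is up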

Post deployment - ARIA adaptation

This section is adapted from hysds/ariamh. A condensed command sketch follows the numbered steps.

  1. Stop the cluster with $ sds stop all -f
  2. Backup your SDS configuration directory ~/.sds with $ mv ~/.sds ~/.sds.bak
  3. Download the custom SDS configuration on Mozart by running $ cd ~; wget https://bitbucket.org/nvgdevteam/sds-config-aria/downloads/sds-config-aria-azure.tbz2
  4. Unpack the new template for the ARIA adaptation with $ tar xvf sds-config-aria-azure.tbz2; mv sds-config-aria ~/.sds
  5. Restore your original configuration file with $ cp ~/.sds.bak/config ~/.sds/
  6. Copy the ARIA adaptation repositories to ~/mozart/ops by running $ cp -rf ~/.sds/repos/* ~/mozart/ops/
  7. Update the cluster with $ sds update all -f
  8. (Optional) Increase the number of workers on Factotum for faster concurrent downloads by modifying ~/.sds/files/supervisord.conf.factotum
  9. Push the ARIA adaptation configuration files with $ fab -f ~/.sds/cluster.py -R factotum,verdi update_aria_packages, or alternatively run the helpers/fab_push_aria_pkgs.sh script
  10. Ship the latest code configuration bundle for autoscaling workers with $ sds ship
  11. Either reset or start the cluster with $ sds reset all -f or $ sds start all -f
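
The condensed sketch referenced above, covering steps 1-7 and 9-11 (run on Mozart; the optional step 8 and the reset alternative are omitted):

    sds stop all -f                        # 1. stop the cluster
    mv ~/.sds ~/.sds.bak                   # 2. back up the SDS configuration
    cd ~ && wget https://bitbucket.org/nvgdevteam/sds-config-aria/downloads/sds-config-aria-azure.tbz2
    tar xvf sds-config-aria-azure.tbz2 && mv sds-config-aria ~/.sds
    cp ~/.sds.bak/config ~/.sds/           # 5. restore the original config file
    cp -rf ~/.sds/repos/* ~/mozart/ops/    # 6. copy the ARIA repositories
    sds update all -f                      # 7. update the cluster
    fab -f ~/.sds/cluster.py -R factotum,verdi update_aria_packages   # 9. push ARIA packages
    sds ship                               # 10. ship the autoscaling bundle
    sds start all -f                       # 11. start the cluster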

Post deployment - Autoscaling workers

Part 1

Part 1 configures the Verdi image creator VM to be ready for imaging. A condensed command sketch follows the numbered steps.

  1. A Verdi image creator VM has already been set up for you through Terraform with its Puppet module installed. However, that VM still requires additional configuration before it can be turned into an image

  2. Push Azure configuration files from Mozart to Verdi with the following script, replacing [SSH_KEY_NAME] and [VERDI_IP] with the correct strings

    #!/bin/bash

    export PRIVATE_KEY_NAME=[SSH_KEY_NAME]
    export VERDI_IP=[VERDI_IP]

    # Note: ~ is not expanded inside double quotes, so use $HOME for the identity file path.
    scp -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i "$HOME/.ssh/$PRIVATE_KEY_NAME" ~/.azure/azure_credentials.json ops@$VERDI_IP:~/.azure
    scp -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i "$HOME/.ssh/$PRIVATE_KEY_NAME" ~/.azure/config ops@$VERDI_IP:~/.azure
  3. Change VERDI_PVT_IP in Mozart's ~/.sds/config to the correct private IP of the Verdi image creator VM.

  4. Update Verdi by running sds update verdi -f on Mozart

  5. Push ARIA packages to Verdi by running fab -f ~/.sds/cluster.py -R verdi update_aria_packages on Mozart

  6. Install Python packages on Verdi by running source /home/ops/verdi/bin/activate; pip install azure msrest msrestazure ConfigParser on Verdi

  7. Ship the Verdi configuration with sds ship on Mozart

  8. Deprovision the Verdi instance with sudo waagent -deprovision -force. This command can only be run ONCE and there is no going back!
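
The condensed sketch referenced above, covering steps 3-8 (edit VERDI_PVT_IP in ~/.sds/config by hand first, then run each command on the host indicated in the comment):

    sds update verdi -f                                      # on Mozart
    fab -f ~/.sds/cluster.py -R verdi update_aria_packages   # on Mozart
    source /home/ops/verdi/bin/activate && pip install azure msrest msrestazure ConfigParser   # on Verdi
    sds ship                                                 # on Mozart
    sudo waagent -deprovision -force                         # on Verdi; irreversible, run last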

Part 2

Part 2 creates the image and the scale set. Run these commands on a machine with the Azure CLI installed and logged in to your account. A consolidated sketch of steps 1-4 follows the numbered steps.

  1. Deallocate the VM with az vm deallocate --resource-group [AZURE_RESOURCE_GROUP] --name [VERDI_VM_NAME]

  2. Generalize the VM with az vm generalize --resource-group [AZURE_RESOURCE_GROUP] --name [VERDI_VM_NAME]

  3. Create the VM image with az image create --resource-group [AZURE_RESOURCE_GROUP] --name "HySDS_Verdi_YYYY-MM-DD-rcX" --source [VERDI_VM_NAME], where the image name encodes the year, month, day and release candidate number

  4. Create a file called bundleurl.txt by running echo BUNDLE_URL=azure://[AZURE_STORAGE_ACCOUNT_NAME].blob.core.windows.net/code/aria-ops.tbz2 > bundleurl.txt, replacing [AZURE_STORAGE_ACCOUNT_NAME] with the name of your storage account

  5. Create the scale set with

    az vmss create \
      --custom-data bundleurl.txt \
      --location southeastasia \
      --name [VMSS_NAME] \
      --vm-sku Standard_F32s_v2 \
      --admin-username ops \
      --instance-count 0 \
      --single-placement-group true \
      --lb-sku standard \
      --priority low \
      --authentication-type ssh \
      --ssh-key-value "[YOUR_SSH_KEY_VALUE]" \
      --vnet-name [AZURE_VNET] \
      --subnet [SUBNET_NAME] \
      --image [VERDI_IMAGE_NAME] \
      --resource-group [AZURE_RESOURCE_GROUP] \
      --public-ip-per-vm \
      --nsg [NSG_NAME] \
      --eviction-policy delete
  6. Proceed to configure the scaling rules on the Azure Portal

  7. If you ever need to reconfigure an existing scale set with a different image, use the following script:

    AZ_RESOURCE_GROUP=[AZURE_RESOURCE_GROUP]
    IMAGE=[VERDI_IMAGE_NAME]
    VMSS=[VMSS_NAME]
    SUBSCRIPTION_ID=[AZURE_SUBSCRIPTION_ID]
    
    az vmss update --resource-group $AZ_RESOURCE_GROUP --name $VMSS --set virtualMachineProfile.storageProfile.imageReference.id=/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$AZ_RESOURCE_GROUP/providers/Microsoft.Compute/images/$IMAGE
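
For convenience, steps 1-4 can be collected into a short script (a sketch; replace the bracketed placeholders as before):

    AZ_RESOURCE_GROUP=[AZURE_RESOURCE_GROUP]
    VERDI_VM_NAME=[VERDI_VM_NAME]
    IMAGE_NAME="HySDS_Verdi_YYYY-MM-DD-rcX"
    STORAGE_ACCOUNT=[AZURE_STORAGE_ACCOUNT_NAME]

    az vm deallocate --resource-group $AZ_RESOURCE_GROUP --name $VERDI_VM_NAME
    az vm generalize --resource-group $AZ_RESOURCE_GROUP --name $VERDI_VM_NAME
    az image create --resource-group $AZ_RESOURCE_GROUP --name "$IMAGE_NAME" --source $VERDI_VM_NAME
    echo "BUNDLE_URL=azure://$STORAGE_ACCOUNT.blob.core.windows.net/code/aria-ops.tbz2" > bundleurl.txt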

Post deployment - Manual data download and orbital data scraping

This section is adapted from hysds/ariamh.

  1. Install all necessary HySDS packages (in sdspkg.tar format). Keep in mind that these packages are not yet publicly distributed, but they are proven to work with Azure. Some of them are modified, such as ariamh:
    • container-hysds-org_lightweight-jobs:release-20180419 - Basic meta-jobs such as retry, revoke, etc.
    • container-aria-hysds_qquery:release-20180612 - Performs qquery to check which SLCs to download and automatically runs the sling jobs
    • container-hysds-org_create_aoi:release-20180306 - Creates an AOI of the region
    • container-aria-hysds_scihub_acquisition_scraper:release-20180604 - Scrapes data from SciHub or ASF
    • container-aria-hysds_s1_qc_ingest:release-20180627 - Ingests Sentinel 1 data
    • container-hysds-org_spyddder-man:release-20180129 - Data discovery, download and extraction. Contains the sling jobs (basically jobs for downloading SLCs)
    • container-aria-hysds_ariamh:release-20180327 - The ARIA master package
    • If you wish to build your own packages (see the example after this list):
      • sds ci add_job <Github HTTPS link> azure $UID $(id -g)
      • -b <branch> - To build from a specific branch
      • -k - To clone with Git token specified in ~/.sds/config
  2. Modify job queues to accelerate certain jobs on Factotum with ~/.sds/files/supervisord.conf.factotum. Recommended parameters are:
    • factotum-job_worker-large: 4
    • factotum-job_worker-asf_throttled: 8
  3. Run the acquisition scraper manually by running the helpers/1_scrape_scihub.sh script. Make sure that the version number constant is correct
  4. If the acquisition scraper runs correctly, define an AOI using the Tosca web interface and submit a qquery job with helpers/2_qquery.sh
  5. Wait for at least one sling job to complete, then run cd ~/.sds/rules; sds rules import user_rules.json to import rules that automatically extract acquisition data. Keep in mind that the extraction occurs on Verdi workers. If you don't want this, go to Tosca, click "User Rules" in the top right corner and change the worker type to something like factotum-large
  6. After all the sling and extract jobs have completed, you can move on to scraping orbit and calibration files with helpers/3_scrape_s1_orbit.sh
  7. You are now ready to create interferograms!
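
As referenced in step 1, a hedged example of registering your own package build with the CI server (the branch and repository link are placeholders):

    # Build the "azure" flavor of a package from a specific branch,
    # cloning with the Git token specified in ~/.sds/config (-k).
    sds ci add_job -b <branch> -k <Github HTTPS link> azure $UID $(id -g)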

Tips / Caveats

  • Make sure that Azure is properly configured with Microsoft.Network, Microsoft.Compute and Microsoft.Storage resource providers all enabled
  • Triple check the environment variables defined in the envvars.sh file before deploying, such as the name of the resource group, the names of the resources to be created etc.
  • The scripts are not meant to be run multiple times. It is best to run them only once
  • Always perform all tasks on a strong and reliable connection to avoid any potential failures requiring the scripts to be restarted
  • The user is required to manually copy and paste certain values from their system or display into the script when necessary. This is to reduce the complexity of the script. Certain more advanced features like JSON parsing are done by piping data into Python
  • You may need to switch account subscriptions within your Azure CLI tool using the command az account set -s [AZURE_SUBSCRIPTION_ID] (see the short example after this list)
  • These scripts use Python's built-in json library to process JSON emitted by the az tool. The script automatically detects the Python version present on the system
  • All SSH/SCP commands are run with -o StrictHostKeyChecking=no to skip having to answer yes when connecting to the VMs for the first time
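
As referenced above, a short example of checking and switching the active subscription (the subscription ID is a placeholder):

    az account list --output table              # list the subscriptions available to your account
    az account set -s [AZURE_SUBSCRIPTION_ID]   # make this subscription the active one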

Known issues

Phase 2 fails with No module named 'azure.mgmt.compute' during creation of VM on macOS

This is caused by a faulty version of the az tool (namely 2.0.47) installed by Homebrew (if you used Homebrew to install the package). You have to downgrade the tool to 2.0.46 manually with the following commands. This assumes you already have the az tool installed.

$ brew unlink azure-cli

$ brew install https://raw.githubusercontent.com/Homebrew/homebrew-core/3894a0d2095f48136ce1af5ebd5ba42dd38f1dac/Formula/azure-cli.rb

yum -y update hangs during Phase 2

Symptoms: During the provisioning of the base VM in phase 2, the yum -y update command causes the base VM to become unresponsive. An ssh_exchange_identification: read: Connection reset by peer message may also be returned if one attempts to SSH into the VM.

Reason: The yum -y update command is CPU intensive, and for some reason, the deprovisioning commands are run in parallel with the package updater, causing the user to be unable to SSH back into the machine.

Solution: The yum -y update command has been moved from phase 2 to phase 4. The deprovisioning commands are explicitly run on another SSH session right after the installation of necessary yum packages.

Contributing

If you encounter any issues with the scripts, please do not hesitate to open an issue on this repo.
