Scripts to make your life easier when deploying HySDS and ARIA on Microsoft Azure cloud
- ⭕️ HySDS Core
- 📥 Installers (e.g. Puppet, Terraform)
- 🧩 Cluster Node-specific Packages (e.g. `mozart`, `figaro`)
- 🧰 Operational packages (e.g. `hysds-dockerfiles`)
- 🛰 PGE
- 🛠 Helper Scripts
- 📖 Documentation
- ❓ Miscellaneous / Uncategorized
This repository complements the official HySDS repos developed by JPL.
This directory contains `.tf` files written in the HashiCorp Configuration Language (HCL) that describe a HySDS cluster. For more information on how to use them, refer to the README file in that directory.
This version of the Terraform deployment scripts is deprecated.
NOTE: the shell version is currently deprecated!
The shell script version is written in standard `sh` and provides a semi-automated way of deploying a HySDS cluster. The rationale for using shell scripts instead of a more powerful interpreted language like Python is twofold: to reduce the amount of code needed and to improve cross-platform compatibility. Occasionally, however, Python helpers are used for more advanced functionality such as JSON parsing.
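A minimal illustration of that pattern (not taken verbatim from the helpers; the command and field names here are only examples) is piping `az` JSON output into an inline Python snippet:

```sh
# Hypothetical example: print the name of the first VM returned by `az vm list`.
az vm list --output json | python -c '
import json, sys
vms = json.load(sys.stdin)
print(vms[0]["name"] if vms else "no VMs found")
'
```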
These are shell scripts meant to make life easier for the user. Further documentation is provided in another README.md file in that directory.
Miscellaneous documentation written in Markdown for resolving common problems you may encounter with the system, as well as operational guides.
Currently available documentation:
Please refer to the README files inside the directories for detailed usage during the installation.
Further configuration is still required after you run either the Terraform or the shell version of the deployment scripts. Some of the tasks that you need to do include:
- Set up CI by navigating to `http://[CI_FQDN]:8080`, proceeding as admin, and retrieving the Jenkins API key and the current administrative user's username. Optionally, you might want to update Jenkins before setting it up by downloading the latest `Jenkins.war` file and replacing the old version in `/usr/local/bin/jenkins.war`.
- Run `sds configure` to set up the environment constants used by HySDS on Mozart, one of which is `JENKINS_API_KEY`, the API key retrieved previously.
- Verify that ElasticSearch is running on the Mozart, Metrics and GRQ instances by running `$ systemctl status elasticsearch` on those instances. If it's not up, run `# systemctl start elasticsearch`. Optionally, you might want to enable ElasticSearch on startup with `# systemctl enable elasticsearch`. A sketch that performs these checks from Mozart is shown after this list.
- Run `sds update all -f` to update all components of HySDS.
- Run `sds start all -f` to start all components of HySDS, and use `sds status all` to verify that all components are up and running.
- (Recommended) Improve Docker performance on Factotum by using this guide.
- (Optional) Set up real HTTPS certificates instead of using self-signed ones by running `sh /home/ops/https_autoconfig.sh` on the servers that need them (mainly Mozart, GRQ and Metrics). The current script only supports DNS verification through Cloudflare, meaning that your DNS nameservers must be on Cloudflare before attempting to use this script.
- (Optional) Set up the ARIA adaptation using the instructions from here.
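The per-instance ElasticSearch checks and the cluster-wide update/start steps above can be combined into a short script run from Mozart. This is only a sketch: it assumes passwordless SSH as the `ops` user with sudo rights, and the `*_PVT_IP` variables below are placeholders for the private IPs from your deployment.

```sh
#!/bin/bash
# Replace these hypothetical values with the private IPs of your instances.
for host in $MOZART_PVT_IP $METRICS_PVT_IP $GRQ_PVT_IP; do
    echo "--- ElasticSearch on $host ---"
    ssh -o StrictHostKeyChecking=no ops@$host \
        "sudo systemctl enable elasticsearch; sudo systemctl start elasticsearch; systemctl status elasticsearch --no-pager"
done

# Update, start and verify the whole cluster from Mozart.
sds update all -f
sds start all -f
sds status all
```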
This section is adapted from hysds/ariamh.
- Stop the cluster with `$ sds stop all -f` (the whole sequence below is condensed into a sketch after this list)
- Back up your SDS configuration directory `~/.sds` with `$ mv ~/.sds ~/.sds.bak`
- Download the custom SDS configuration on Mozart by running `$ cd ~; wget https://bitbucket.org/nvgdevteam/sds-config-aria/downloads/sds-config-aria-azure.tbz2`
- Unpack the new template for the ARIA adaptation with `$ tar xvf sds-config-aria-azure.tbz2; mv sds-config-aria ~/.sds`
- Restore your original configuration file with `$ cp ~/.sds.bak/config ~/.sds/`
- Copy the ARIA adaptation repositories to `~/mozart/ops` by running `$ cp -rf ~/.sds/repos/* ~/mozart/ops/`
- Update the cluster with `$ sds update all -f`
- (Optional) Increase the number of workers on Factotum for faster concurrent downloads by modifying `~/.sds/files/supervisord.conf.factotum`
- Push the ARIA adaptation configuration files with `$ fab -f ~/.sds/cluster.py -R factotum,verdi update_aria_packages`, or alternatively run the `helpers/fab_push_aria_pkgs.sh` script
- Ship the latest code configuration bundle for autoscaling workers with `$ sds ship`
- Either reset or start the cluster with `$ sds reset all -f` or `$ sds start all -f`
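Assuming none of the steps above need manual intervention, the sequence condenses into roughly the following. This is a sketch rather than a drop-in script: the commands, paths and download URL are exactly those listed above, but error handling is omitted.

```sh
#!/bin/bash
set -e
sds stop all -f
mv ~/.sds ~/.sds.bak                      # back up the existing SDS configuration
cd ~
wget https://bitbucket.org/nvgdevteam/sds-config-aria/downloads/sds-config-aria-azure.tbz2
tar xvf sds-config-aria-azure.tbz2
mv sds-config-aria ~/.sds                 # install the ARIA configuration template
cp ~/.sds.bak/config ~/.sds/              # restore the original config file
cp -rf ~/.sds/repos/* ~/mozart/ops/       # copy the ARIA adaptation repositories
sds update all -f
fab -f ~/.sds/cluster.py -R factotum,verdi update_aria_packages
sds ship
sds start all -f
```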
Part 1 configures the Verdi image creator VM so that it is ready for imaging.
- A Verdi image creator VM has already been set up for you through Terraform with its Puppet module installed. However, that VM still requires additional configuration before it can be turned into an image (the commands below are also regrouped by machine in a sketch after this list).
- Push the Azure configuration files from Mozart to Verdi with the following script, replacing `[SSH_KEY_NAME]` and `[VERDI_IP]` with the correct strings:

  ```sh
  #!/bin/bash
  export PRIVATE_KEY_NAME=[SSH_KEY_NAME]
  export VERDI_IP=[VERDI_IP]
  scp -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i "~/.ssh/$PRIVATE_KEY_NAME" ~/.azure/azure_credentials.json ops@$VERDI_IP:~/.azure
  scp -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i "~/.ssh/$PRIVATE_KEY_NAME" ~/.azure/config ops@$VERDI_IP:~/.azure
  ```

- Change `VERDI_PVT_IP` in Mozart's `~/.sds/config` to the correct private IP of the Verdi image creator VM.
- Update Verdi by running `sds update verdi -f` on Mozart.
- Push the ARIA packages to Verdi by running `fab -f ~/.sds/cluster.py -R verdi update_aria_packages` on Mozart.
- Install the required Python packages on Verdi by running `source /home/ops/verdi/bin/activate; pip install azure msrest msrestazure ConfigParser` on Verdi.
- Ship the Verdi configuration with `sds ship` on Mozart.
- Deprovision the Verdi instance with `sudo waagent -deprovision -force`. This command can only be run ONCE and there is no going back!
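Because the steps above alternate between Mozart and Verdi, the same commands are shown below grouped by machine, in the same order. This is only a sketch: the `ssh ops@$VERDI_IP` wrapper is an assumption for illustration; run the Verdi-side commands however you normally log in to that VM.

```sh
# On Mozart: update Verdi and push the ARIA packages.
sds update verdi -f
fab -f ~/.sds/cluster.py -R verdi update_aria_packages

# On Verdi (shown here over SSH for illustration): install the Python packages.
ssh ops@$VERDI_IP "source /home/ops/verdi/bin/activate && pip install azure msrest msrestazure ConfigParser"

# Back on Mozart: ship the Verdi configuration.
sds ship

# Final step on Verdi -- WARNING: this can only be run once.
ssh ops@$VERDI_IP "sudo waagent -deprovision -force"
```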
Part 2 creates the image and the scale set. Run these commands on a machine that has the Azure CLI installed and is logged in to your account.
- Deallocate the VM with `az vm deallocate --resource-group [AZURE_RESOURCE_GROUP] --name [VERDI_VM_NAME]`
- Generalize the VM with `az vm generalize --resource-group [AZURE_RESOURCE_GROUP] --name [VERDI_VM_NAME]`
- Create the VM image with `az image create --resource-group [AZURE_RESOURCE_GROUP] --name "HySDS_Verdi_YYYY-MM-DD-rcX" --source [VERDI_VM_NAME]`, where the image name encodes the year, month, day and release candidate number (these first three commands are collected into a variable-based sketch after this list)
- Create a file called `bundleurl.txt` by running `echo BUNDLE_URL=azure://[AZURE_STORAGE_ACCOUNT_NAME].blob.core.windows.net/code/aria-ops.tbz2 > bundleurl.txt`, replacing `[AZURE_STORAGE_ACCOUNT_NAME]` with the name of your storage account
- Create the scale set with `az vmss create --custom-data bundleurl.txt --location southeastasia --name [VMSS_NAME] --vm-sku Standard_F32s_v2 --admin-username ops --instance-count 0 --single-placement-group true --lb-sku standard --priority low --authentication-type ssh --ssh-key-value "[YOUR_SSH_KEY_VALUE]" --vnet-name [AZURE_VNET] --subnet [SUBNET_NAME] --image [VERDI_IMAGE_NAME] --resource-group [AZURE_RESOURCE_GROUP] --public-ip-per-vm --nsg [NSG_NAME] --eviction-policy delete`
- Proceed to configure the scaling rules on the Azure Portal
- If you ever need to reconfigure an existing scale set with a different image, use the following script:

  ```sh
  AZ_RESOURCE_GROUP=[AZURE_RESOURCE_GROUP]
  IMAGE=[VERDI_IMAGE_NAME]
  VMSS=[VMSS_NAME]
  SUBSCRIPTION_ID=[AZURE_SUBSCRIPTION_ID]
  az vmss update --resource-group $AZ_RESOURCE_GROUP --name $VMSS --set virtualMachineProfile.storageProfile.imageReference.id=/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$AZ_RESOURCE_GROUP/providers/Microsoft.Compute/images/$IMAGE
  ```
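Since the same placeholders appear in several of the commands above, it can be convenient to set them once as shell variables. A sketch using the same `az` commands as the list (the variable names and the `date`-based image name are illustrative):

```sh
#!/bin/bash
# Illustrative values -- replace with your own resource names.
RG=[AZURE_RESOURCE_GROUP]
VM=[VERDI_VM_NAME]
IMAGE="HySDS_Verdi_$(date +%Y-%m-%d)-rc1"   # year, month, day plus release candidate number

az vm deallocate --resource-group "$RG" --name "$VM"
az vm generalize --resource-group "$RG" --name "$VM"
az image create  --resource-group "$RG" --name "$IMAGE" --source "$VM"
```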
This section is adapted from hysds/ariamh.
- Install all necessary HySDS packages (in `sdspkg.tar` format). Keep in mind that these packages are not YET publicly distributed, but they are proven to work with Azure. Some of them are modified, such as `ariamh`:
  - `container-hysds-org_lightweight-jobs:release-20180419` - Basic meta-jobs such as retry, revoke, etc.
  - `container-aria-hysds_qquery:release-20180612` - Performs qquery to check which SLCs to download and automatically runs the sling jobs
  - `container-hysds-org_create_aoi:release-20180306` - Creates an AOI of the region
  - `container-aria-hysds_scihub_acquisition_scraper:release-20180604` - Scrapes data from SciHub or ASF
  - `container-aria-hysds_s1_qc_ingest:release-20180627` - Ingests Sentinel-1 data
  - `container-hysds-org_spyddder-man:release-20180129` - Data discovery, download and extraction. Contains the sling jobs (basically jobs for downloading SLCs)
  - `container-aria-hysds_ariamh:release-20180327` - The ARIA master package
- If you wish to build your own packages: `sds ci add_job <GitHub HTTPS link> azure $UID $(id -g)`
  - `-b <branch>` - To build from a specific branch
  - `-k` - To clone with the Git token specified in `~/.sds/config`
- Modify the job queues to accelerate certain jobs on Factotum via `~/.sds/files/supervisord.conf.factotum`. Recommended parameters are:
  - `factotum-job_worker-large`: 4
  - `factotum-job_worker-asf_throttled`: 8
- Run the acquisition scraper manually with the `helpers/1_scrape_scihub.sh` script. Make sure that the version number constant is correct
- If the acquisition scraper works, define an AOI using the Tosca web interface and submit a `qquery` job with `helpers/2_qquery.sh`
- Wait for at least one sling job to complete, then run `cd ~/.sds/rules; sds rules import user_rules.json` to import rules that automatically extract acquisition data. Keep in mind that the extraction occurs on Verdi workers. If you don't want this, go to Tosca, click "User Rules" in the top right corner and change the worker type to something like `factotum-large`
- After all the sling and extract jobs are completed, you can move on to scraping orbit and calibration files with `helpers/3_scrape_s1_orbit.sh`
- You are now ready to create interferograms! (A condensed sketch of this pipeline kickoff follows the list.)
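Once the packages and queues are in place, the manual kickoff boils down to the helper scripts and the rules import listed above. A sketch run on Mozart, assuming it is run from the root of this repository; waiting for the sling and extract jobs to finish remains a manual step between commands:

```sh
#!/bin/bash
# 1. Scrape acquisitions from SciHub/ASF (check the version number constant in the script first).
sh helpers/1_scrape_scihub.sh

# 2. After defining an AOI in Tosca, submit the qquery job.
sh helpers/2_qquery.sh

# 3. Once at least one sling job has completed, import the automation rules.
(cd ~/.sds/rules && sds rules import user_rules.json)

# 4. After the sling and extract jobs finish, scrape orbit and calibration files.
sh helpers/3_scrape_s1_orbit.sh
```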
- Make sure that Azure is properly configured, with the `Microsoft.Network`, `Microsoft.Compute` and `Microsoft.Storage` resource providers all enabled (a sketch for checking this with the Azure CLI follows this list)
- Triple-check the environment variables defined in the `envvars.sh` file before deploying, such as the name of the resource group, the names of the resources to be created, etc.
- The scripts are not meant to be run multiple times; it is best to run them only once
- Always perform all tasks on a strong and reliable connection to avoid failures that would require the scripts to be restarted
- The user is required to manually copy and paste certain values from their system or display into the script when necessary. This keeps the scripts simple. More advanced features like JSON parsing are handled by piping data into Python
- You may need to switch account subscriptions within your Azure CLI tool using the command `az account set -s [subscription ID]`
- These scripts use Python's built-in `json` library to process JSON emitted by the `az` tool. The script automatically detects the Python version present on the system
- All SSH/SCP commands are run with `-o StrictHostKeyChecking=no` to skip having to answer `yes` when connecting to the VMs for the first time
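For the first note, a hedged sketch of how the subscription selection and resource provider registration can be checked with standard `az` subcommands (the subscription ID is a placeholder; this check is not part of the deployment scripts themselves):

```sh
# Select the subscription the cluster will live in.
az account set -s [subscription ID]

# Check the registration state of each required resource provider and register any that are missing.
for ns in Microsoft.Network Microsoft.Compute Microsoft.Storage; do
    state=$(az provider show --namespace "$ns" --query registrationState --output tsv)
    echo "$ns: $state"
    if [ "$state" != "Registered" ]; then
        az provider register --namespace "$ns"
    fi
done
```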
This is caused by a faulty version of the `az` tool (namely 2.0.47) installed by Homebrew (if you used Homebrew to install the package). You have to downgrade the tool to 2.0.46 manually with the following commands. This assumes you already have the `az` tool installed.
$ brew unlink azure-cli
$ brew install https://raw.githubusercontent.com/Homebrew/homebrew-core/3894a0d2095f48136ce1af5ebd5ba42dd38f1dac/Formula/azure-cli.rb
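After the downgrade, you may want to confirm the installed version and, optionally, pin the formula so Homebrew does not upgrade it back to the faulty release. The `brew pin` step is a suggestion, not something the original scripts require:
$ az --version | head -n 1   # should report azure-cli (2.0.46)
$ brew pin azure-cli         # optional: keep Homebrew from upgrading it again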
Symptoms: During the provisioning of the base VM in phase 2, a `yum -y update` command causes the base VM to become unresponsive. There may also be an `ssh_exchange_identification: read: Connection reset by peer` message returned if one attempts to SSH into the VM.

Reason: The `yum -y update` command is CPU-intensive, and for some reason the deprovisioning commands are run in parallel with the package updater, leaving the user unable to SSH back into the machine.

Solution: The `yum -y update` command has been moved from phase 2 to phase 4. The deprovisioning commands are explicitly run in another SSH session right after the installation of the necessary yum packages.
If you encounter any issues with the script, please do not hesitate to open an issue on this repo.