Installing Software
A computing cluster is of limited value without software. Because the cluster is a shared environment, multiple versions of various applications may be required. In addition, users may need access to older software or library releases to rerun an analysis. To accommodate these needs, we'll use Environment Modules and Conda Environments. A side benefit of these tools is that both admins and users can create and load software environments.
As the name implies, an Environment Module configures environment variables, such as PATH and JAVA_HOME, for a specific software package. Multiple modules can be loaded at the same time, giving a pipeline or script access to several applications at once.
Admins should install software in the cluster-wide shared location /share/apps/. For organizational purposes, software should be grouped by vendor or another logical grouping. For example, R and RStudio would be installed within /share/apps/R/.
As for the actual install process, all software should be installed via the head node. For precompiled applications, simply unpack the application into a version-specific directory, for example /share/apps/HISAT2/hisat2-2.1.0/. For applications that must be compiled, you will need to set a version-specific install prefix in the configure script or Makefile. If the compiled software requires additional libraries provided by another Environment Module, be sure to load that module while compiling, and make a note of it.
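As a rough sketch of the compiled case, the install prefix can be built from the grouping, name, and version before configuring. The names ExampleTool, exampletool, and 1.2.3 are hypothetical placeholders, and the build function assumes an Autoconf-style configure script that accepts --prefix; software with a plain Makefile will need its own equivalent.

```shell
#!/usr/bin/env bash
# Sketch of a version-specific source install, run on the head node.
# VENDOR, NAME and VERSION are hypothetical placeholders.
APP_ROOT=/share/apps
VENDOR=ExampleTool
NAME=exampletool
VERSION=1.2.3
PREFIX="$APP_ROOT/$VENDOR/$NAME-$VERSION"

build_and_install() {
    # Run from inside the unpacked source tree.
    # Assumes an Autoconf-style configure script that accepts --prefix.
    ./configure --prefix="$PREFIX" &&
        make &&
        make install
}

echo "Install prefix: $PREFIX"
```

Calling build_and_install from the source directory would then place everything under the version-specific prefix, leaving other installed versions untouched.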
Once the application or library has been installed, you need to prepare a modulefile for it. The modulefile specifies what environment modifications are necessary to use the application. It can also contain additional information such as a software description, dependencies, and help. Like applications, modulefiles should be organized and given version-specific names. For example, the modulefile for R-3.5.3 is located at /share/apps/modulefiles/R/R-3.5.3.
For many applications, modulefiles are fairly simple. In most cases, they only need to update the PATH variable. However, some applications include man pages, which can be added to MANPATH. An example of such a modulefile is below:
#%Module 1.0
#
# R-3.5.3
#
prepend-path PATH /share/apps/R/R-3.5.3/bin
prepend-path MANPATH /share/apps/R/R-3.5.3/share/man/
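To make the effect of prepend-path concrete, the sketch below reproduces roughly what loading this modulefile does to the shell. The prepend_path helper is a hypothetical stand-in written for illustration; it is not part of Environment Modules, which also tracks what it changed so that module unload can reverse it.

```shell
#!/usr/bin/env bash
# Approximation of "prepend-path VAR DIR" in plain shell.
# prepend_path is a hypothetical helper, not an Environment Modules command.
prepend_path() {
    # $1: variable name, $2: directory to place at the front
    eval "_current=\${$1:-}"
    if [ -n "$_current" ]; then
        export "$1=$2:$_current"
    else
        export "$1=$2"
    fi
}

prepend_path PATH    /share/apps/R/R-3.5.3/bin
prepend_path MANPATH /share/apps/R/R-3.5.3/share/man/

# The R directory is now searched before any other PATH entry.
echo "First PATH entry: ${PATH%%:*}"
```

Because the new directory is placed first, the R binaries from this version are found before any other copy that may already be on the PATH.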
Some software needs to add variables to the environment rather than simply modify paths. For example, the Java Development Kit needs the JAVA_HOME variable to be set. Example below:
#%Module 1.0
#
# JDK-8.0.202
#
prepend-path PATH /share/apps/Java/zulu8.36.0.1-ca-jdk8.0.202-linux_x64/bin
prepend-path MANPATH /share/apps/Java/zulu8.36.0.1-ca-jdk8.0.202-linux_x64/share/man/
setenv JAVA_HOME /share/apps/Java/zulu8.36.0.1-ca-jdk8.0.202-linux_x64/
When installing software for your group, the install process is the same as for an admin. The difference is where the software is installed. Each group has a shared directory located at /share/groups/GROUP/apps for storing software. Within that directory is a modulefiles subdirectory for storing your applications' modulefiles.
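The layout described above can be sketched as follows. So that it can be tried safely, the script writes into a temporary directory standing in for /share/groups/GROUP/apps; the application name mytool and version 1.0 are hypothetical. Whether the group modulefiles directory is already on the module search path is an assumption to check locally; if it is not, module use adds it.

```shell
#!/usr/bin/env bash
# Stand-in for /share/groups/GROUP/apps so the sketch can be run anywhere.
GROUP_APPS=${GROUP_APPS:-$(mktemp -d)}
APP=mytool      # hypothetical application name
VERSION=1.0     # hypothetical version

# Version-specific install directory plus a matching modulefiles directory.
mkdir -p "$GROUP_APPS/$APP/$APP-$VERSION/bin" "$GROUP_APPS/modulefiles/$APP"

# Write a minimal modulefile named after the application and version.
MODULEFILE="$GROUP_APPS/modulefiles/$APP/$APP-$VERSION"
cat > "$MODULEFILE" <<EOF
#%Module 1.0
#
# $APP-$VERSION
#
prepend-path PATH $GROUP_APPS/$APP/$APP-$VERSION/bin
EOF

echo "Wrote $MODULEFILE"
# On the cluster, the modulefile could then be used with:
#   module use "$GROUP_APPS/modulefiles"
#   module load "$APP/$APP-$VERSION"
```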
Conda Environments are created using the conda utility included with the Anaconda and Miniconda Python distributions. However, the environments aren't limited to Python; they can contain other languages (such as R) and other software. One major difference between Conda and Environment Modules is the software installation process. The conda utility is also a package manager, enabling you to install software from various Conda-compatible software repositories. As a package manager, conda will resolve any software dependencies for you, reducing the steps involved in installing complex software packages such as QIIME.
While a conda environment can support multiple applications, ideally the environment should be dedicated to one application or software bundle. This avoids dependency conflicts, where one application requires a different library or package version than another. As with Environment Modules, the environment should be named after the application and version installed within it. That way, you can switch versions with ease.
As with other software, cluster-wide conda environments should be installed in /share/apps/. They should be grouped by vendor or other logical grouping, and named after the specific version within them. For example, if you wanted to install Sumaclust 1.0.31, you could place it in /share/apps/Sumaclust/sumaclust-1.0.31. See below for an example:
- Use conda to create the sumaclust-1.0.31 environment:
/share/apps/Anaconda/Miniconda3-4.5.12/bin/conda create --prefix /share/apps/Sumaclust/sumaclust-1.0.31
- Activate the sumaclust-1.0.31 environment, enabling you to run commands within it:
source /share/apps/Anaconda/Miniconda3-4.5.12/bin/activate /share/apps/Sumaclust/sumaclust-1.0.31
- Install sumaclust from the Bioconda repository:
conda install -c bioconda sumaclust
- Finally, deactivate (exit) the conda environment:
source /share/apps/Anaconda/Miniconda3-4.5.12/bin/deactivate
Creating a conda environment as a user follows the same process as for an admin. The only difference is the path where the environment is stored. For users, the conda environment will be located within your home directory. We recommend storing environments together in a logical location, such as /home/youruser/conda_env/my-app-environment.
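A minimal sketch of the user case, assuming the same Miniconda installation used in the admin example. The environment name my-app-environment is a placeholder, and the function is only defined here, not run, so nothing is created until you call it.

```shell
#!/usr/bin/env bash
# Sketch: create a per-user conda environment under the home directory.
# CONDA_BIN is the Miniconda path from the admin example; the environment
# name "my-app-environment" is only a placeholder.
CONDA_BIN=/share/apps/Anaconda/Miniconda3-4.5.12/bin
ENV_PREFIX="$HOME/conda_env/my-app-environment"

create_user_env() {
    # Keep all personal environments together in one parent directory.
    mkdir -p "$(dirname "$ENV_PREFIX")"
    "$CONDA_BIN/conda" create --yes --prefix "$ENV_PREFIX"
}

echo "To create the environment, run: create_user_env"
echo "To activate it afterwards, run: source $CONDA_BIN/activate $ENV_PREFIX"
```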
When creating a conda environment for your group, the install process is the same as for an admin. The difference is where the conda environment is stored. Each group has a shared directory located at /share/groups/GROUP/apps for storing software and conda environments.
Environment Modules shouldn't be used to load a conda environment. The conda utility makes many changes to the environment and shell, which can't be easily reproduced using an Environment Module. As an alternative, a modulefile should be created to advertise the existence of the conda environment and to provide help in activating it. For example:
#%Module 1.0
#
# fishtaco-1.1.1
#
proc ModulesHelp { } {
    puts stderr "\tActivate Anaconda or Miniconda before using FishTaco"
    puts stderr ""
    puts stderr "\t\tmodule load Anaconda/Miniconda3-4.5.12"
    puts stderr ""
    puts stderr "\tActivate the FishTaco conda environment with:"
    puts stderr ""
    puts stderr "\t\tsource activate /share/apps/FishTaco/fishtaco-1.1.1"
    puts stderr ""
    puts stderr "\tDeactivate FishTaco before using other Python environments:"
    puts stderr "\t\tsource deactivate"
    puts stderr "\t\tmodule unload FishTaco/fishtaco-1.1.1"
}
# Calling ModulesHelp here prints the instructions whenever the
# modulefile is evaluated, for example on "module load".
ModulesHelp
Singularity enables container-based software distribution, including the ability to run Docker containers. With container technology, you can package scripts, applications, and their dependencies into a single read-only file. For more information, see the Singularity user guide at https://sylabs.io/guides/3.7/user-guide/.
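As an illustrative sketch only, the following shows how a Docker image might be converted into a single image file and then used to run a command. The ubuntu:20.04 image and the output file name are arbitrary examples, and the function is defined but not called, since it needs Singularity and network access.

```shell
#!/usr/bin/env bash
# Sketch: pull a Docker image into a single read-only image file and run
# a command inside it. The image and file name are arbitrary examples.
IMAGE=ubuntu_20.04.sif
SOURCE=docker://ubuntu:20.04

pull_and_run() {
    # Convert the Docker image into a local Singularity image file.
    singularity pull "$IMAGE" "$SOURCE" &&
        # Run a single command inside the container.
        singularity exec "$IMAGE" cat /etc/os-release
}

if command -v singularity >/dev/null 2>&1; then
    echo "singularity found; call pull_and_run to try the example"
else
    echo "singularity is not on the PATH in this shell"
fi
```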