Releases: aws/aws-parallelcluster-cookbook

AWS ParallelCluster v2.11.2

26 Aug 17:02

We're excited to announce the release of AWS ParallelCluster Cookbook 2.11.2

This is associated with AWS ParallelCluster v2.11.2

2.11.2

CHANGES

  • When using a custom AMI with a preinstalled EFA package, no action is taken at node bootstrap time if GPUDirect RDMA is enabled. The original EFA package deployment is preserved as it was during the createami process.
  • Upgrade EFA installer to version 1.13.0
    • Update rdma-core to v35.0.
    • Update libfabric to v1.13.0amzn1.0.

BUG FIXES

  • Lock the version of the nvidia-fabricmanager package to the installed NVIDIA drivers to prevent updates and misalignments.

AWS ParallelCluster v2.11.1

23 Jul 23:52

We're excited to announce the release of AWS ParallelCluster Cookbook 2.11.1

This is associated with AWS ParallelCluster v2.11.1

ENHANCEMENTS

  • Retry failed installations of the aws-parallelcluster package on the head node of clusters using AWS Batch as the scheduler.

CHANGES

  • Restore the noatime option, which has a positive impact on NFS filesystem performance.

BUG FIXES

  • Pin to version 1.247347 of the CloudWatch agent due to performance impact of latest CW agent version 1.247348.
  • Avoid failures when building SGE on an instance type with 32 or more vCPUs.

AWS ParallelCluster v2.11.0

01 Jul 04:01

We're excited to announce the release of AWS ParallelCluster Cookbook 2.11.0

This is associated with AWS ParallelCluster v2.11.0

ENHANCEMENTS

  • Add support for Ubuntu 20.04.
  • Add support for using FSx Lustre in a subnet with no internet access.
  • Add support for building custom CentOS 7 AMIs on ARM.
  • Make sure the slurmd service is only enabled after the post-install process, which prevents users from unintentionally making a compute node available during the post-install process.
  • Change ssh_target_checker.sh syntax to make the script compatible with pdsh.
  • Add possibility to use a post installation script when building a CentOS 8 AMI.
  • Install SSM agent on CentOS 7 and 8.
  • Transition from IMDSv1 to IMDSv2.
  • Add support for security_group_id in Packer custom builders. Customers can export the AWS_SECURITY_GROUP_ID environment variable to specify the security group for custom builders when building custom AMIs.
  • Configure the following default gc_thresh values for performance at scale.
    • net.ipv4.neigh.default.gc_thresh1 = 0
    • net.ipv4.neigh.default.gc_thresh2 = 15360
    • net.ipv4.neigh.default.gc_thresh3 = 16384
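
The three values above are kernel neighbor-table (ARP cache) thresholds: gc_thresh1 is the entry count below which garbage collection does not run, gc_thresh2 is the soft limit, and gc_thresh3 is the hard limit. As a minimal sketch, assuming the settings are written out in sysctl.conf syntax (the helper names below are hypothetical and are not taken from the cookbook), they could be rendered and sanity-checked like this:

```python
# Values copied from the release note; everything else is illustrative.
GC_THRESH = {
    "net.ipv4.neigh.default.gc_thresh1": 0,
    "net.ipv4.neigh.default.gc_thresh2": 15360,
    "net.ipv4.neigh.default.gc_thresh3": 16384,
}


def render_sysctl(settings):
    """Return sysctl.conf-style text, one 'key = value' line per setting."""
    return "".join(f"{key} = {value}\n" for key, value in settings.items())


def thresholds_are_ordered(settings):
    """Check that gc_thresh1 <= gc_thresh2 <= gc_thresh3."""
    prefix = "net.ipv4.neigh.default.gc_thresh"
    t1, t2, t3 = (settings[prefix + str(i)] for i in (1, 2, 3))
    return t1 <= t2 <= t3


if __name__ == "__main__":
    assert thresholds_are_ordered(GC_THRESH)
    print(render_sysctl(GC_THRESH), end="")
```

How the cookbook actually applies these values on a node is not described in the notes, so the rendering step is only one plausible approach.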

CHANGES

  • Ubuntu 16.04 is no longer supported.
  • Amazon Linux is no longer supported.
  • Upgrade EFA installer to version 1.12.2
    • EFA configuration: efa-config-1.8-1 (from efa-config-1.7)
    • EFA profile: efa-profile-1.5-1 (from efa-profile-1.4)
    • EFA kernel module: efa-1.12.3 (from efa-1.10.2)
    • RDMA core: rdma-core-32.1amzn (from rdma-core-31.2amzn)
    • Libfabric: libfabric-1.11.2amzn1.1-1 (from libfabric-1.11.1amzn1.0)
    • Open MPI: openmpi40-aws-4.1.1-2 (from openmpi40-aws-4.1.0)
  • Increase timeout when attaching EBS volumes from 3 to 5 minutes.
  • Retry berkshelf installation up to 3 times.
  • Root volume size increased from 25GB to 35GB on all AMIs. Minimum root volume size is now 35GB.
  • Upgrade Slurm to version 20.11.7.
    • Update slurmctld and slurmd systemd unit files according to the latest ones provided by Slurm
    • Add new SlurmctldParameters, power_save_min_interval=30, so power actions will be processed every 30 seconds
    • Specify instance GPU model as GRES GPU Type in gres.conf, instead of the previous hardcoded value Type=tesla for all GPUs
  • Upgrade Arm Performance Libraries (APL) to version 21.0.0
  • Upgrade NICE DCV to version 2021.1-10557.
  • Upgrade NVIDIA driver to version 460.73.01.
  • Upgrade CUDA library to version 11.3.0.
  • Upgrade NVIDIA Fabric manager to nvidia-fabricmanager-460.
  • Install ParallelCluster AWSBatch CLI in dedicated python3 virtual env.
  • Upgrade Python version used in ParallelCluster virtualenvs from version 3.6.13 to version 3.7.10.
  • Upgrade Cinc Client to version 16.13.16.
  • Upgrade third-party cookbook dependencies:
    • apt-7.4.0 (from apt-7.3.0)
    • iptables-8.0.0 (from iptables-7.1.0)
    • line-4.0.1 (from line-2.9.0)
    • openssh-2.9.1 (from openssh-2.8.1)
    • pyenv-3.4.2 (from pyenv-3.1.1)
    • selinux-3.1.1 (from selinux-2.1.1)
    • ulimit-1.1.1 (from ulimit-1.0.0)
    • yum-6.1.1 (from yum-5.1.0)
    • yum-epel-4.1.2 (from yum-epel-3.3.0)
  • Drop lightdm package install from Ubuntu 18.04 DCV installation process.
  • Update default NFS options used by compute nodes to mount shared filesystems from the head node.
    • Drop intr option, which has been deprecated since kernel 2.6.25
    • Drop noatime option, which is not relevant for NFS mount
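
To illustrate the NFS option change in the last item, here is a small sketch that removes named options from a comma-separated mount option string. The starting string and the function name are assumptions made for the example; the notes do not list the cookbook's full default options.

```python
def drop_mount_options(options, to_drop):
    """Remove the named options from a comma-separated mount option string.

    Options of the form key=value are matched on the key part only.
    """
    dropped = set(to_drop)
    kept = [
        opt
        for opt in options.split(",")
        if opt and opt.split("=", 1)[0] not in dropped
    ]
    return ",".join(kept)


# Hypothetical starting options; the real defaults may differ.
before = "hard,intr,noatime,_netdev"
after = drop_mount_options(before, ["intr", "noatime"])
print(after)  # hard,_netdev
```

Note that the v2.11.1 notes above restore noatime, so only the removal of intr carried forward.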

AWS ParallelCluster v2.10.4

15 May 17:05

We're excited to announce the release of AWS ParallelCluster Cookbook 2.10.4

This is associated with AWS ParallelCluster v2.10.4

CHANGES

  • Upgrade Slurm to version 20.02.7

AWS ParallelCluster v2.10.3

18 Mar 22:05

We're excited to announce the release of AWS ParallelCluster Cookbook 2.10.3

This is associated with AWS ParallelCluster v2.10.3

CHANGES

  • Upgrade EFA installer to version 1.11.2
    • EFA configuration: efa-config-1.7 (no change)
    • EFA profile: efa-profile-1.4 (from efa-profile-1.3)
    • EFA kernel module: efa-1.10.2 (no change)
    • RDMA core: rdma-core-31.2amzn (no change)
    • Libfabric: libfabric-1.11.1amzn1.0 (no change)
    • Open MPI: openmpi40-aws-4.1.0 (no change)

AWS ParallelCluster v2.10.2

02 Mar 16:32

We're excited to announce the release of AWS ParallelCluster Cookbook 2.10.2

This is associated with AWS ParallelCluster v2.10.2

ENHANCEMENTS

  • Improve configuration procedure for the Munge service.

CHANGES

  • Update Python version used in ParallelCluster virtualenvs from version 3.6.9 to version 3.6.13.

BUG FIXES

  • Use non-interactive apt update command when building custom Ubuntu AMIs.
  • Fix encrypted_ephemeral = true when using Alinux2 or CentOS8

AWS ParallelCluster v2.10.1

22 Dec 23:15

We're excited to announce the release of AWS ParallelCluster Cookbook 2.10.1

This is associated with AWS ParallelCluster v2.10.1

ENHANCEMENTS

  • Install Arm Performance Libraries (APL) 20.2.1 on ARM AMIs (CentOS8, Alinux2, Ubuntu1804).
  • Install EFA kernel module on ARM instances with alinux2 and ubuntu1804.
  • Configure NFS threads to be max(8, num_cores) for performance. This enhancement will not take effect on Ubuntu 16.04.
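
The NFS thread rule in the last item is a simple floor: never fewer than 8 threads, otherwise one per core. A minimal sketch of that calculation follows (the function name is hypothetical, and how the value is then handed to the NFS server is not covered by the notes):

```python
import os


def nfs_thread_count(num_cores=None):
    """Return max(8, num_cores), using the local core count by default."""
    if num_cores is None:
        # os.cpu_count() can return None on some platforms; fall back to 1.
        num_cores = os.cpu_count() or 1
    return max(8, num_cores)


print(nfs_thread_count(2))   # 8
print(nfs_thread_count(36))  # 36
```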

CHANGES

  • Upgrade EFA installer to version 1.11.1.
    • EFA configuration: efa-config-1.7 (from efa-config-1.5)
    • EFA profile: efa-profile-1.3 (from efa-profile-1.1)
    • EFA kernel module: efa-1.10.2 (no change)
    • RDMA core: rdma-core-31.2amzn (from rdma-core-31.amzn0)
    • Libfabric: libfabric-1.11.1amzn1.0 (from libfabric-1.11.1amzn1.1)
    • Open MPI: openmpi40-aws-4.1.0 (from openmpi40-aws-4.0.5)
  • Upgrade Intel MPI to version U8.
  • Upgrade NICE DCV to version 2020.2-9662.
  • Set default systemd runlevel to multi-user.target on all OSes during ParallelCluster official AMI creation.
    The runlevel is set to graphical.target on head node only when DCV is enabled. This prevents the execution of
    graphical services, such as x/gdm, when they are not required.
  • Download Intel MPI and HPC packages from S3 rather than Intel yum repos.

BUG FIXES

  • Fix installation of Intel PSXE package on CentOS 7 by using yum4.
  • Fix routing issues with multiple Network Interfaces on Ubuntu 18.04.
  • Fix compilation of SGE by downloading sources from Debian repository and not from the EOL Ubuntu 19.10.

AWS ParallelCluster v2.10.0

18 Nov 16:21

We're excited to announce the release of AWS ParallelCluster Cookbook 2.10.0

This is associated with AWS ParallelCluster v2.10.0.

ENHANCEMENTS

  • Add support for CentOS 8.
  • Add support for instance types with multiple network cards (e.g. p4d.24xlarge).
  • Enable FSx Lustre in China regions.
  • Add validation step for AMI creation process to fail when using a base AMI created by a different version of
    ParallelCluster.
  • Add validation step for AMI creation process to fail if the selected OS and the base AMI OS are not consistent.
  • Add possibility to use a post installation script when building an AMI.
  • Install NVIDIA Fabric manager to enable NVIDIA NVSwitch on supported platforms.

CHANGES

  • Upgrade EFA installer to version 1.10.1
    • EFA configuration: efa-config-1.5 (from efa-config-1.4)
    • EFA profile: efa-profile-1.1 (from efa-profile-1.0.0)
    • EFA kernel module: efa-1.10.2 (from efa-1.6.0)
    • RDMA core: rdma-core-31.amzn0 (from rdma-core-28.amzn0)
    • Libfabric: libfabric-1.11.1amzn1.1 (from libfabric-1.10.1amzn1.1)
    • Open MPI: openmpi40-aws-4.0.5 (from openmpi40-aws-4.0.3)
    • Unifies installer runtime options across x86 and aarch64
    • Introduces -g/--enable-gdr switch to install packages with GPUDirect RDMA support
    • Updates to OMPI collectives decision file packaging, migrated from efa-config to efa-profile
    • Introduces CentOS 8 support
  • CentOS 6 is no longer supported.
  • Upgrade NVIDIA driver to version 450.80.02.
  • Upgrade Intel Parallel Studio XE Runtime to version 2020.2.
  • Upgrade Munge to version 0.5.14.
  • Retrieve FSx Lustre DNS name dynamically.
  • Slurm: change SlurmctldPort to 6820-6829 to not overlap with default slurmdbd port (6819).
  • Slurm: add compute_resource name and efa as node features.
  • Improve Slurm and Munge installation process by cleaning up existing installations from OS repositories.
  • Install Python 3 version of aws-cfn-bootstrap scripts.
  • Do not force the compute fleet into STOPPED state when performing a cluster update. This allows updating the queue
    size without forcing a termination of the existing instances.
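
The SlurmctldPort change above is about keeping the controller's port range clear of the default slurmdbd port. A small sketch of that check, assuming ports are written as either a single number or an inclusive LOW-HIGH range (the helper names are hypothetical, and the second range below is invented purely to show an overlap):

```python
def parse_port_range(spec):
    """Parse 'N' or 'LOW-HIGH' into an inclusive (low, high) tuple."""
    low, _, high = spec.partition("-")
    low_port = int(low)
    high_port = int(high) if high else low_port
    if low_port > high_port:
        raise ValueError(f"invalid port range: {spec!r}")
    return low_port, high_port


def ranges_overlap(a, b):
    """Return True if two inclusive (low, high) ranges share any port."""
    return a[0] <= b[1] and b[0] <= a[1]


slurmdbd = parse_port_range("6819")        # default slurmdbd port, per the note
slurmctld = parse_port_range("6820-6829")  # range set in this release

print(ranges_overlap(slurmctld, slurmdbd))  # False

# Hypothetical range, for illustration only: it would collide with 6819.
print(ranges_overlap(parse_port_range("6817-6826"), slurmdbd))  # True
```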

BUG FIXES

  • Fix ephemeral drives setup to avoid failures when partition changes require a reboot.
  • Fix Chrony service management.
  • Retrieve the right number of compute instance slots when instance type is updated.
  • Fix compute fleet status initialization to be configured before daemons are started by supervisord.

AWS ParallelCluster v2.9.1

15 Sep 14:38

We're excited to announce the release of AWS ParallelCluster Cookbook 2.9.1.

This is associated with AWS ParallelCluster v2.9.1.

CHANGES

  • There were no notable changes for this version.

AWS ParallelCluster v2.9.0

11 Sep 18:53

We're excited to announce the release of AWS ParallelCluster Cookbook 2.9.0.

This is associated with AWS ParallelCluster v2.9.0

ENHANCEMENTS

  • Add support for multiple queues and multiple instance types feature with the Slurm scheduler.
  • Extend NICE DCV support to ARM instances.
  • Extend support to disable hyperthreading on instances (like *.metal) that don't support CpuOptions in
    LaunchTemplate.
  • Enable support for NFS 4 for the filesystems shared from the head node.
  • Add script wrapper to support Torque-like commands with the Slurm scheduler.

CHANGES

  • A Route53 private hosted zone is now created together with the cluster and used in DNS resolution inside cluster nodes
    when using Slurm scheduler.
  • Upgrade EFA installer to version 1.9.5:
    • EFA configuration: efa-config-1.4 (from efa-config-1.3)
    • EFA profile: efa-profile-1.0.0
    • EFA kernel module: efa-1.6.0 (no change)
    • RDMA core: rdma-core-28.amzn0 (no change)
    • Libfabric: libfabric-1.10.1amzn1.1 (no change)
    • Open MPI: openmpi40-aws-4.0.3 (no change)
  • Upgrade Slurm to version 20.02.4.
  • Apply the following changes to Slurm configuration:
    • Assign a range of 10 ports to Slurmctld to perform better with large cluster settings
    • Configure cloud scheduling logic
    • Set ReconfigFlags=KeepPartState
    • Set MessageTimeout=60
    • Set TaskPlugin=task/affinity,task/cgroup together with TaskAffinity=no and ConstrainCores=yes in cgroup.conf
  • Upgrade NICE DCV to version 2020.1-9012.
  • Use private IP instead of master node hostname when mounting shared NFS drives.
  • Add new log streams to CloudWatch: chef-client, clustermgtd, computemgtd, slurm_resume, slurm_suspend.
  • Remove dependency on cfn-init in compute nodes bootstrap.
  • Add support for queue names in pre/post install scripts.
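
The Slurm settings listed above span two files: slurm.conf and cgroup.conf. As a rough sketch, assuming plain Key=Value lines (the dictionaries hold only the values named in these notes, and the helper names are hypothetical rather than part of the cookbook), the fragments could be rendered and read back like this:

```python
# Settings named in the release note, grouped by target file.
SLURM_CONF = {
    "ReconfigFlags": "KeepPartState",
    "MessageTimeout": "60",
    "TaskPlugin": "task/affinity,task/cgroup",
}
CGROUP_CONF = {
    "TaskAffinity": "no",
    "ConstrainCores": "yes",
}


def render_conf(settings):
    """Render settings as Key=Value lines, one per setting."""
    return "".join(f"{key}={value}\n" for key, value in settings.items())


def parse_conf(text):
    """Parse Key=Value lines back into a dict, skipping blanks and comments."""
    parsed = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        parsed[key.strip()] = value.strip()
    return parsed


if __name__ == "__main__":
    print("# slurm.conf fragment")
    print(render_conf(SLURM_CONF), end="")
    print("# cgroup.conf fragment")
    print(render_conf(CGROUP_CONF), end="")
```

A real slurm.conf contains many more settings (nodes, partitions, ports), so this covers only the lines the release note mentions.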

BUG FIXES

  • Solve dpkg lock issue with Ubuntu that prevented custom AMI creation in some cases.