Deployment
CentOS 7 doesn't ship with the latest version of the Intel i40e kernel module. To ensure compatibility with the latest chipset revisions, the module should be upgraded. The latest version can be found on Intel's SourceForge project site. The following example uses module version 2.7.29; the steps should be the same for newer versions.
wget https://cfhcable.dl.sourceforge.net/project/e1000/i40e%20stable/2.7.29/i40e-2.7.29.tar.gz
tar zxf i40e-2.7.29.tar.gz
cd i40e-2.7.29/
cd src
make install
reboot
NOTE: The above steps must be repeated after any kernel upgrade. First reboot into the new kernel, rebuild and install the module, then reboot again.
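To verify the upgraded module is in use after rebooting, check the reported driver version (the interface name below is only an example; substitute one of the host's i40e ports):
modinfo i40e | grep -i ^version
ethtool -i enp175s0f0 | grep -i version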
The cluster private network is used for NFS, MPI, and other intracluster communications. For improved performance, jumbo frames should be enabled on the private network.
rocks set network mtu private mtu=9000
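To confirm the new MTU, check the network definition and a live private interface on the head node (the interface name is only an example):
rocks list network
ip link show enp175s0f0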
During the Rocks 7 installation, create a network bond for the public interface. The installer will give you several options for the bond; change the defaults to match the following.
- miimon: 100
- mode: 802.3ad
- lacp_rate: fast
- xmit_hash_policy: layer3+4
The settings can also be changed after installation by editing the config file directly: /etc/sysconfig/network-scripts/ifcfg-Bond_connection_1.
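For reference, the bonding parameters end up in a single BONDING_OPTS line in that file; a minimal sketch of the relevant settings (the device and connection names may differ on your install):
NAME="Bond connection 1"
TYPE=Bond
DEVICE=bond0
ONBOOT=yes
BONDING_OPTS="miimon=100 mode=802.3ad lacp_rate=fast xmit_hash_policy=layer3+4"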
For the private interface, bonding must be configured after installation.
rocks list host interface deohs-brain
rocks add host bonded deohs-brain channel=bond1 interfaces=enp175s0f0,enp175s0f1 ip=10.64.200.1 network=private
rocks set host interface options deohs-brain bond1 options="miimon=100 mode=802.3ad lacp_rate=fast xmit_hash_policy=layer3+4"
rocks sync host network deohs-brain
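Once the sync completes, the kernel's view of the bond can be inspected to confirm both member links are up and LACP negotiated correctly:
cat /proc/net/bonding/bond1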
Enable MSS clamping to ensure packets sent to remote external servers have a "normal" MTU. This would normally be handled by Path MTU Discovery; however, some firewall configurations can break MTU detection.
Edit /etc/sysconfig/iptables, and add the following to the end:
*mangle
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
-A POSTROUTING -o bond0 -p tcp -m tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1460
COMMIT
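For the rule to take effect without a reboot, restart the iptables service and confirm the TCPMSS rule is loaded (this assumes the stock iptables-services unit is in use):
systemctl restart iptables
iptables -t mangle -L POSTROUTING -n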
Before deploying the first compute nodes, it's important to build all update rolls and apply them to the head node. This avoids having to redeploy compute nodes after the rolls are built. The update process is covered in detail on the Updating page.
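Once the update rolls are built and added, they should show up as enabled; a quick sanity check:
rocks list roll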
Brain is meant to accommodate a wide assortment of use cases and user requirements. Among those requirements is the ability to use graphical applications, such as RStudio. To meet those graphical needs, X2Go is installed on the head node. The install process is covered in detail on the X2Go Install page.
By default, Rocks installs a rather extensive assortment of libraries and packages. However, there are a few additional development packages we need when building software.
yum install readline-devel gmp-devel mpfr-devel libmpc-devel
While Rocks uses a locally maintained database of users and groups, we don't maintain a local password database. We'll use Kerberos to authenticate users with UW NetID.
yum install pam_krb5
authconfig --enablekrb5 --krb5kdc="k5-primary.u.washington.edu,k5-backup.u.washington.edu" --krb5adminserver="k5-primary.u.washington.edu" --krb5realm="u.washington.edu" --update
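Authentication can be spot-checked from the head node before opening it up to users; a minimal test, where the username is a placeholder for a valid NetID:
kinit netid
klist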
Compute nodes should be configured to boot using UEFI. The necessary settings may be named or located differently, depending on the specific system board. Below is an example for the X11DDW-L.
- Advanced > PCIe/PCI/PnP Configuration:
  - CPU1-AOM.... EFI
  - RSC-R1UW.... EFI
- Boot:
  - Boot mode select: UEFI
  - Boot Option #1: UEFI Network
  - Boot Option #2: UEFI Hard Disk
Before deploying any compute nodes, we need to customize our node configuration. For maintenance purposes, we'll track our extend-compute.xml file within this Git repository. For additional guidance, please review the Node Customization page.
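For orientation, the file follows the standard Rocks node-XML layout; the sketch below is only a skeleton, with placeholder package and post-install content:
<?xml version="1.0" standalone="no"?>
<kickstart>
  <!-- additional packages to install on compute nodes (placeholder) -->
  <package>readline-devel</package>
  <!-- shell commands run at the end of kickstart (placeholder) -->
  <post>
    echo "node customized" >> /root/extend-compute.log
  </post>
</kickstart>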
Each compute node has two interfaces on the cluster communications network. During initial provisioning, the NICs aren't bonded (due to limitations in PXE). As a result, during startup, the second NIC may issue its own DHCP request, causing Rocks to detect an additional compute node. These false nodes can be identified by the fact that they never completed kickstart and report only 1 CPU. If any of these appear, they should be deleted:
rocks remove host compute-0-1
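Before removing anything, the CPU count and kickstart status for each host can be reviewed with:
rocks list host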