Add possibility to load SBD watchdog kernel modules (#82)
SBD needs a watchdog device to work. The watchdog device is created by
loading one or more watchdog kernel modules. The modules can be
configured on a per-node basis. Blocking certain modules from being
loaded is also a common use case (e.g., due to hardware specifics).

Signed-off-by: Eike Waldt <[email protected]>
yeoldegrove authored Apr 3, 2023
1 parent e48da80 commit a5cbd7c
Showing 6 changed files with 200 additions and 12 deletions.
76 changes: 69 additions & 7 deletions README.md
@@ -1110,28 +1110,39 @@ all:
#### SBD watchdog and devices
When using SBD, you may optionally configure watchdog and SBD devices for each
node in inventory. Even though all SBD devices must be shared to and accessible
-from all nodes, each node may use different names for the devices. Watchdog may
-be different for each node as well. See also [SBD
-variables](#ha_cluster_sbd_enabled).
+from all nodes, each node may use different names for the devices. The loaded
+watchdog modules and used devices may also be different for each node. See also
+[SBD variables](#ha_cluster_sbd_enabled).

Example inventory with targets `node1` and `node2`:
```yaml
all:
  hosts:
    node1:
      ha_cluster:
        sbd_watchdog_modules:
          - module1
          - module2
        sbd_watchdog: /dev/watchdog2
        sbd_devices:
          - /dev/vdx
          - /dev/vdy
    node2:
      ha_cluster:
        sbd_watchdog_modules:
          - module1
        sbd_watchdog_modules_blocklist:
          - module2
        sbd_watchdog: /dev/watchdog1
        sbd_devices:
          - /dev/vdw
          - /dev/vdz
```

* `sbd_watchdog_modules` (optional) - Watchdog kernel modules to be loaded
(creates `/dev/watchdog*` devices). Defaults to empty list if not set.
* `sbd_watchdog_modules_blocklist` (optional) - Watchdog kernel modules to be
unloaded and blocked. Defaults to empty list if not set.
* `sbd_watchdog` (optional) - Watchdog device to be used by SBD. Defaults to
`/dev/watchdog` if not set.
* `sbd_devices` (optional) - Devices to use for exchanging SBD messages and for
@@ -1238,6 +1249,45 @@ in /var/lib/pcsd with the file name FILENAME.crt and FILENAME.key, respectively.
```

### Configuring cluster to use SBD

#### inventory

These variables need to be set in the inventory or via `host_vars`. The
watchdog kernel modules and device paths may differ depending on your setup.

```yaml
all:
  hosts:
    node1:
      ha_cluster:
        sbd_watchdog_modules:
          - iTCO_wdt
        sbd_watchdog_modules_blocklist:
          - ipmi_watchdog
        sbd_watchdog: /dev/watchdog1
        sbd_devices:
          - /dev/vdx
          - /dev/vdy
          - /dev/vdz
    node2:
      ha_cluster:
        sbd_watchdog_modules:
          - iTCO_wdt
        sbd_watchdog_modules_blocklist:
          - ipmi_watchdog
        sbd_watchdog: /dev/watchdog1
        sbd_devices:
          - /dev/vdx
          - /dev/vdy
          - /dev/vdz
```
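The same per-node values can live in `host_vars` instead of the inventory file. A minimal sketch, assuming a conventional `host_vars/` directory next to the inventory; the file name `host_vars/node1.yml` is illustrative, and `node2` would get an equivalent file:

```yaml
# host_vars/node1.yml (illustrative path; Ansible loads host_vars/<hostname>.yml
# for the host with that inventory name)
ha_cluster:
  sbd_watchdog_modules:
    - iTCO_wdt
  sbd_watchdog_modules_blocklist:
    - ipmi_watchdog
  sbd_watchdog: /dev/watchdog1
  sbd_devices:
    - /dev/vdx
    - /dev/vdy
    - /dev/vdz
```

Ansible replaces dictionaries rather than merging them by default, so an `ha_cluster` dictionary in `host_vars` fully replaces one defined for the same host in `group_vars`; each file should therefore carry every key that host needs.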

#### playbook

After setting the inventory correctly, use this playbook to configure a
complete SBD setup including loading watchdog modules and creating the
SBD stonith resource.

```yaml
- hosts: node1 node2
vars:
@@ -1252,12 +1302,24 @@ in /var/lib/pcsd with the file name FILENAME.crt and FILENAME.key, respectively.
- name: timeout-action
value: 'flush,reboot'
- name: watchdog-timeout
value: 5
# if you need to set stonith-watchdog-timeout property as well:
value: 30
# Best practice for setting SBD timeouts:
# watchdog-timeout * 2 = msgwait-timeout (set automatically)
# msgwait-timeout * 1.2 = stonith-timeout
ha_cluster_cluster_properties:
- attrs:
- name: stonith-watchdog-timeout
value: 10
- name: stonith-timeout
value: 72
ha_cluster_resource_primitives:
- id: fence_sbd
agent: 'stonith:fence_sbd'
instance_attrs:
- attrs:
# taken from host_vars
- name: devices
value: "{{ ha_cluster.sbd_devices | join(',') }}"
- name: pcmk_delay_base
value: 30
roles:
- linux-system-roles.ha_cluster
21 changes: 17 additions & 4 deletions examples/sbd.yml
@@ -2,6 +2,7 @@
---
- name: Example ha_cluster role invocation - cluster with SBD
hosts: node1 node2
# do not forget to also set variables via inventory or host_vars.
vars:
ha_cluster_manage_firewall: true
ha_cluster_manage_selinux: true
@@ -16,12 +17,24 @@
- name: timeout-action
value: 'flush,reboot'
- name: watchdog-timeout
value: 5
# if you need to set stonith-watchdog-timeout property as well:
value: 30
# Best practice for setting SBD timeouts:
# watchdog-timeout * 2 = msgwait-timeout (set automatically)
# msgwait-timeout * 1.2 = stonith-timeout
ha_cluster_cluster_properties:
- attrs:
- name: stonith-watchdog-timeout
value: 10
- name: stonith-timeout
value: 72
ha_cluster_resource_primitives:
- id: fence_sbd
agent: 'stonith:fence_sbd'
instance_attrs:
- attrs:
# taken from host_vars
- name: devices
value: "{{ ha_cluster.sbd_devices | join(',') }}"
- name: pcmk_delay_base
value: 30

roles:
- linux-system-roles.ha_cluster
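The best-practice comments in the example tie the timeouts together: with `watchdog-timeout` set to 30, msgwait-timeout becomes 30 * 2 = 60 and `stonith-timeout` becomes 60 * 1.2 = 72, the value hard-coded above. A minimal sketch of deriving both from one number so they stay consistent when the watchdog timeout changes; `__sbd_watchdog_timeout` is a hypothetical helper variable, not an input of the ha_cluster role:

```yaml
# Fragment of a play's vars: section (other variables as in examples/sbd.yml).
# __sbd_watchdog_timeout is a hypothetical helper, not an ha_cluster role input.
__sbd_watchdog_timeout: 30
ha_cluster_sbd_options:
  # keep the other SBD options from the example in this list
  - name: watchdog-timeout
    value: "{{ __sbd_watchdog_timeout }}"
ha_cluster_cluster_properties:
  - attrs:
      # msgwait-timeout = watchdog-timeout * 2 (set automatically)
      # stonith-timeout = msgwait-timeout * 1.2, rounded up to whole seconds
      - name: stonith-timeout
        value: "{{ (__sbd_watchdog_timeout * 2 * 1.2) | round(0, 'ceil') | int }}"
```

With the value 30 this renders `stonith-timeout` as 72, matching the example; rounding up keeps the STONITH timeout from ever falling below 1.2 times msgwait.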
1 change: 1 addition & 0 deletions meta/collection-requirements.yml
@@ -1,4 +1,5 @@
# SPDX-License-Identifier: MIT
---
collections:
- community.general
- fedora.linux_system_roles
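`community.general` is presumably added here because the watchdog tasks in `tasks/sbd.yml` call `modprobe`, a module shipped in that collection. Spelled with its fully qualified collection name, the load task from this commit would read as follows (a restatement for illustration; the commit itself uses the short `modprobe` name):

```yaml
- name: Load watchdog kernel modules
  community.general.modprobe:
    name: "{{ item }}"
    state: present
  loop: "{{ ha_cluster.sbd_watchdog_modules | d([]) }}"
```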
40 changes: 39 additions & 1 deletion tasks/sbd.yml
@@ -3,7 +3,45 @@
- name: Manage SBD
when: ha_cluster_sbd_enabled
block:
- name: Manage SBD
- name: Configure SBD watchdog
block:
- name: Configure and unload watchdog kernel modules from blocklist
block:
- name: Configure watchdog kernel module blocklist
lineinfile:
path: "/etc/modprobe.d/{{ item }}.conf"
create: true
mode: 0644
regexp: "^(options|blacklist) {{ item }}"
line: "blacklist {{ item }}"
state: present
loop: "{{ ha_cluster.sbd_watchdog_modules_blocklist | d([]) }}"

- name: Unload watchdog kernel modules from blocklist
modprobe:
name: "{{ item }}"
state: absent
loop: "{{ ha_cluster.sbd_watchdog_modules_blocklist | d([]) }}"

- name: Configure and load watchdog kernel module
block:
- name: Configure watchdog kernel modules
lineinfile:
path: "/etc/modules-load.d/{{ item }}.conf"
create: true
mode: 0644
regexp: "^{{ item }}"
line: "{{ item }}"
state: present
loop: "{{ ha_cluster.sbd_watchdog_modules | d([]) }}"

- name: Load watchdog kernel modules
modprobe:
name: "{{ item }}"
state: present
loop: "{{ ha_cluster.sbd_watchdog_modules | d([]) }}"

- name: Manage SBD devices
# Ideally, the block as a whole should run one node at a time. This does
# not seem to be possible with Ansible yet. Instead, we at least make the
# block's tasks run one by one. This way, we avoid possible issues caused
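The watchdog tasks in `tasks/sbd.yml` only load and block modules; they do not check that the device SBD is told to use actually appeared. A minimal sketch of such a check, not part of this commit, assuming it runs on each node after the role and falling back to the `/dev/watchdog` default documented for `sbd_watchdog`; `__sbd_watchdog_stat` is a hypothetical variable name:

```yaml
- name: Check that the configured SBD watchdog device exists
  ansible.builtin.stat:
    path: "{{ ha_cluster.sbd_watchdog | d('/dev/watchdog') }}"
  register: __sbd_watchdog_stat  # hypothetical variable name

- name: Fail if the watchdog device is missing or not a character device
  ansible.builtin.assert:
    that:
      # conditions are evaluated in order, so ischr is only read if the path exists
      - __sbd_watchdog_stat.stat.exists
      - __sbd_watchdog_stat.stat.ischr
    fail_msg: >-
      {{ ha_cluster.sbd_watchdog | d('/dev/watchdog') }} is not a character
      device; check sbd_watchdog_modules and sbd_watchdog_modules_blocklist
```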
3 changes: 3 additions & 0 deletions tasks/test_setup_sbd.yml
@@ -9,6 +9,9 @@
- name: Load softdog module for SBD to have at least one watchdog
command: modprobe softdog
changed_when: true
# do not load if sbd tests are run (loads module instead)
when:
- not (__test_disable_modprobe | d(false))

- name: Create backing files for SBD devices
tempfile:
71 changes: 71 additions & 0 deletions tests/tests_sbd_all_options.yml
@@ -29,10 +29,16 @@
include_role:
name: linux-system-roles.ha_cluster
tasks_from: test_setup_sbd.yml
vars:
__test_disable_modprobe: true

- name: Set SBD devices and watchdogs
set_fact:
ha_cluster:
sbd_watchdog_modules:
- softdog
sbd_watchdog_modules_blocklist:
- iTCO_wdt
sbd_watchdog: /dev/null
sbd_devices:
- "{{ __test_sbd_mount.stdout }}"
@@ -42,6 +48,71 @@
name: linux-system-roles.ha_cluster
public: true

- name: Slurp generated SBD watchdog blocklist file
slurp:
src: /etc/modprobe.d/iTCO_wdt.conf
register: __test_sbd_watchdog_blocklist_file

- name: Decode SBD watchdog blocklist file
set_fact:
__test_sbd_watchdog_blocklist_file_lines: "{{
(__test_sbd_watchdog_blocklist_file.content |
b64decode).splitlines()
}}"

- name: Print SBD watchdog blocklist file lines
debug:
var: __test_sbd_watchdog_blocklist_file_lines

- name: Check SBD watchdog blocklist file
assert:
that:
- __blockstr in __test_sbd_watchdog_blocklist_file_lines
vars:
__blockstr: blacklist iTCO_wdt

- name: Slurp generated SBD watchdog modprobe file
slurp:
src: /etc/modules-load.d/softdog.conf
register: __test_sbd_watchdog_modprobe_file

- name: Decode SBD watchdog modprobe file
set_fact:
__test_sbd_watchdog_modprobe_file_lines: "{{
(__test_sbd_watchdog_modprobe_file.content |
b64decode).splitlines()
}}"

- name: Print SBD watchdog modprobe file lines
debug:
var: __test_sbd_watchdog_modprobe_file_lines

- name: Check SBD watchdog modprobe file
assert:
that:
- __modulestr in __test_sbd_watchdog_modprobe_file_lines
vars:
__modulestr: softdog

- name: Run lsmod for SBD watchdog module
command: lsmod
changed_when: false
register: __test_sbd_watchdog_sbd_lsmod

- name: Print lsmod
debug:
var: __test_sbd_watchdog_sbd_lsmod

- name: Check lsmod output for absence of SBD watchdog module blocklist
assert:
that:
- "'iTCO_wdt' not in __test_sbd_watchdog_sbd_lsmod.stdout"

- name: Check lsmod output for SBD watchdog module
assert:
that:
- "'softdog' in __test_sbd_watchdog_sbd_lsmod.stdout"

- name: Slurp SBD config file
slurp:
src: /etc/sysconfig/sbd
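The lsmod assertions in this test look for the module name anywhere in the output. A slightly stricter variant, sketched here and not part of the test, matches the name only at the start of an lsmod line (where lsmod prints each module's name), using Ansible's `search` test with `multiline=True`:

```yaml
- name: Check lsmod output for SBD watchdog modules (line-anchored)
  assert:
    that:
      # '^name ' with multiline=True matches only a line that begins with the module name
      - __test_sbd_watchdog_sbd_lsmod.stdout is search('^softdog ', multiline=True)
      - __test_sbd_watchdog_sbd_lsmod.stdout is not search('^iTCO_wdt ', multiline=True)
```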
