Docs: add mtu size configuration and RDMA QoS #4646
0ce904d to 9f20778
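| Configuration complexity | Relatively simple | More complex; requires hardware support and configuration |
| Compatibility | Good; works in most environments | Depends on hardware support; poorer compatibility |
| Applicable scenarios | Most scenarios, including bare metal and virtual machines | Bare metal only; not suitable for virtual machines |
| Cost | Lower, since no extra hardware support is needed | Higher; requires hardware that supports SR-IOV |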
Applicability to the RoCE and IB cases.
    @@ -366,6 +382,33 @@ Spiderpool uses [sriov-network-operator](https://github.com/k8snetworkplumb
    EOF
    ```

    4. (Optional) Customize the MTU of SR-IOV VFs
This should match real deployments directly: in practice the environment is set to jumbo frames with an MTU of 8000.
1. In the NIC preparation step before installing Spiderpool:
   (1) Parent interface: MTU 8000, persisted across reboots
   (2) QoS
2. Change the value here to 8000.
done
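A minimal sketch of the parent-interface step from the comment above, assuming the interface is managed with iproute2. The function name `set_parent_mtu`, the `DRY_RUN` switch, and the interface name `eth0` are hypothetical. Persisting the value across reboots depends on the distribution's network configuration tool and is not shown.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set the MTU of an SR-IOV parent interface.
# With DRY_RUN=true the command is printed instead of executed.
set_parent_mtu() {
    local nic="$1" mtu="${2:-8000}"
    if [ -z "$nic" ]; then
        echo "error: interface name is required" >&2
        return 1
    fi
    case "$mtu" in
        ''|*[!0-9]*)
            echo "error: MTU must be numeric, got '$mtu'" >&2
            return 1
            ;;
    esac
    if [ "${DRY_RUN:-false}" = "true" ]; then
        echo "ip link set dev $nic mtu $mtu"
    else
        ip link set dev "$nic" mtu "$mtu"
    fi
}
```

Calling `DRY_RUN=true set_parent_mtu eth0 8000` prints the command that would run, which is a cheap way to review the change before touching a live interface.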
ef347e5 to f6b8f14
docs/usage/roce-qos.md
Outdated
    1. Configure the script

    ```shell
    # List of network cards, multiple targets separated by ",", such as eth0,eth1
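This script is not production-ready yet. Use an AI coding assistant to quickly fill in the validation:
1. All variables should be settable through environment variables, so parameters are passed on the command line instead of editing the script each time, for example:
    ECN_NICS=${ECN_NICS:-""}
2. If a required variable is empty, print a warning and exit.
3. Check up front that each NIC named in the variables exists, that it is an RDMA NIC, and that the file paths it will later write configuration to exist.
4. Validate that ECN_PRIORITY is in the range 0-7.
5. On storage servers GPU_NIC_LIST may be empty, and on GPU servers STORAGE_NIC_LIST is optional. At minimum, GPU_NIC_LIST and STORAGE_NIC_LIST must not both be empty.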
all done.
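The checks requested above could look roughly like the following sketch. It is not the script from this PR: the helper names `is_valid_priority`, `nics_exist`, and `has_any_nic_list` are hypothetical, the sysfs root is a parameter so the helpers can be tried against a scratch directory, and the RDMA-capability check is left out.

```shell
#!/usr/bin/env bash
# Values come from the environment; empty by default.
GPU_NIC_LIST=${GPU_NIC_LIST:-""}
STORAGE_NIC_LIST=${STORAGE_NIC_LIST:-""}

# Succeeds only when the argument is a single digit in 0-7.
is_valid_priority() {
    case "$1" in
        [0-7]) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeeds when every NIC in a comma-separated list exists under the
# given sysfs root (default /sys/class/net).
nics_exist() {
    local list="$1" root="${2:-/sys/class/net}" nic
    local IFS=','
    for nic in $list; do
        if [ ! -e "$root/$nic" ]; then
            echo "error: NIC '$nic' not found under $root" >&2
            return 1
        fi
    done
    return 0
}

# Succeeds when at least one of the two NIC lists is non-empty.
has_any_nic_list() {
    [ -n "$1" ] || [ -n "$2" ]
}
```

A caller would typically run `has_any_nic_list "$GPU_NIC_LIST" "$STORAGE_NIC_LIST"` first and exit with an error message when it fails, then validate each non-empty list and its priority.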
docs/usage/roce-qos.md
Outdated
    ```shell
    # List of network cards, multiple targets separated by ",", such as eth0,eth1
    export ECN_NICS=""
Strictly speaking, this is GPU_NIC_LIST.
Both the storage network and the GPU network need ECN configured, so the name ECN_NICS does not fit.
Then are the lossless queues the same for the storage and GPU networks?
docs/usage/roce-qos.md
Outdated
    # List of network cards, multiple targets separated by ",", such as eth0,eth1
    export ECN_NICS=""
    # Priority of ECN packets, range 0-7
    export ECN_PRIORITY=5
GPU_NIC__PRIORITY
docs/usage/roce-qos.md
Outdated
    # Priority of ECN packets, range 0-7
    export ECN_PRIORITY=5
    # DSCP of ECN packets, range 0-63
    export ECN_QOS=48
GPU_NIC_QOS
When it is passed in empty, the script should compute a default automatically from GPU_NIC__PRIORITY.
That is supported: if it is empty, GPU_NIC_QOS = GPU_NIC__PRIORITY * 8.
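The defaulting rule in this reply can be sketched as a small function. The name `default_dscp` is hypothetical; the only rule taken from the thread is "priority times 8 when no DSCP is given", which places the priority in the top three bits of the 6-bit DSCP field.

```shell
#!/usr/bin/env bash
# Print the DSCP to use: the explicit value if one is given,
# otherwise priority * 8.
default_dscp() {
    local priority="$1" dscp="${2:-}"
    if [ -n "$dscp" ]; then
        echo "$dscp"
    else
        echo $(( priority * 8 ))
    fi
}
```

For example, `default_dscp 5` yields 40, while `default_dscp 5 48` keeps the explicit 48.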
docs/usage/roce-qos.md
Outdated
    mkdir -p /etc/systemd/system/rdma-qos.d
    cat <<F_EOF >/etc/systemd/system/rdma-qos.d/10-qos.conf
    GROUP_COMPUTING_CONFIG="group=computing;nic=$ECN_NICS;ecn_priority=$ECN_PRIORITY;ecn_qos=$ECN_QOS"
GROUP_COMPUTING_CONFIG=""
[ -n "$GPU_NIC_LIST" ] && GROUP_COMPUTING_CONFIG=....
docs/usage/roce-qos.md
Outdated
    export STORAGE_NICS=""
    export STORAGE_ECN_PRIORITY=""
    export STORAGE_ECN_QOS=""
    cat <<"EOF" >rdma_qos.sh
Isn't writing to a file with `>rdma_qos.sh` unnecessary here? Why not just run the code that immediately follows directly?
The user runs the export and cat commands in a terminal to generate the script and then executes it. That script in turn generates the systemd script and unit file, and only then does everything start running.
Otherwise the content would have to be copied into a file separately, or several commands would have to be pasted one by one, which makes copy errors likely.
docs/usage/roce-qos.md
Outdated
    pfc_queue=$(echo "${qos_queues[*]}" | sed 's? ?,?g' | tr -d ' ')
    mlnx_qos -i "$nic_item" --trust=dscp --pfc ${pfc_queue}

    echo "echo 1 >/sys/class/net/$nic_item/ecn/roce_np/enable/$ecn_priority"
There is no check or error report for whether these paths exist.
done
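One way to add the missing check is to route every sysfs write through a guard. This is a sketch under assumptions, not the code merged in the PR; the helper name `write_checked` is hypothetical.

```shell
#!/usr/bin/env bash
# Write a value to a file only if the file already exists and is writable;
# otherwise report the problem and fail instead of silently continuing.
write_checked() {
    local value="$1" path="$2"
    if [ ! -e "$path" ]; then
        echo "error: $path does not exist" >&2
        return 1
    fi
    if [ ! -w "$path" ]; then
        echo "error: $path is not writable" >&2
        return 1
    fi
    echo "$value" >"$path"
}
```

With this in place, the write from the excerpt would become `write_checked 1 "/sys/class/net/$nic_item/ecn/roce_np/enable/$ecn_priority"`, and a missing path produces an error message rather than an unexplained exit.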
docs/usage/roce-qos.md
Outdated
    2. Add script permissions and execute

    ```shell
    chmod +x rdma-qos.sh && bash rdma-qos.sh
In the simplest case, the following command should be enough to complete the configuration:
    GPU_NIC_LIST="" GPU_NIC_PRIORITY="" \
    STORAGE_NIC_LIST="" STORAGE_NIC_PRIORITY="" \
    rdma-qos.sh
The simplest case:
    GPU_NIC_LIST=xxx rdma-qos.sh
9cac1d7 to d521ef5
a166f07 to 90d0486
    # set -x
    # set -o xtrace
    set -o errexit
Why not put those 8 variables directly into the script? Then neither a config file nor format parsing is needed; after all, this script is not especially complex.
This also avoids problems with special characters when parsing the configuration.
    GPU_NIC_LIST="${GPU_NIC_LIST}"
    .......
    ....
They can't be put in directly: the child script cannot see the variable values, because it is written with cat <<"EOF" (note the quotes around the delimiter).
It is possible; that is not a blocker for the implementation:
    a=10
    cat <<EOF > /tmp/a
    a=${a}
    echo \${a}
    EOF
It is a fairly large change, though, so it's your call.
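The reviewer's point about heredoc quoting can be checked with a small runnable demo. With an unquoted delimiter the shell expands `${...}` while generating the file, and a backslash keeps an expansion literal so it is evaluated only when the generated script runs. The value `eth0,eth1` and the variable `generated_output` are illustrative assumptions.

```shell
#!/usr/bin/env bash
GPU_NIC_LIST="eth0,eth1"    # example value, known when the generator runs
out=$(mktemp)

# Unquoted delimiter: ${GPU_NIC_LIST} is expanded now, while \${GPU_NIC_LIST}
# is written literally and evaluated later by the generated script.
cat <<EOF >"$out"
GPU_NIC_LIST="${GPU_NIC_LIST}"
echo "nics=\${GPU_NIC_LIST}"
EOF

# Run the generated script in a fresh shell that does not inherit the variable
# through any export; the value was baked in at generation time.
generated_output=$(bash "$out")
echo "$generated_output"
rm -f "$out"
```

Values that themselves contain double quotes or `$` would still need escaping before being embedded this way, so this approach suits simple values such as NIC names and numbers.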
    [ -n "$GPU_NIC_LIST" ] && validate_nic "$GPU_NIC_LIST" "$GPU_RDMA_PRIORITY"
    [ -n "$STORAGE_NIC_LIST" ] && validate_nic "$STORAGE_NIC_LIST" "$STORAGE_RDMA_PRIORITY"

    echo "debug, GPU_NIC_LIST=$GPU_NIC_LIST, GPU_RDMA_PRIORITY=$GPU_RDMA_PRIORITY, GPU_CNP_PRIORITY=$GPU_CNP_PRIORITY, GPU_RDMA_QOS=$GPU_RDMA_QOS, GPU_CNP_QOS=$GPU_CNP_QOS"
This log line should be moved later, to after GPU_RDMA_QOS has been computed.
That way it shows the final, effective configuration.
ada1b6b to bc51eb5
docs/example/qos/rdma-qos.sh
Outdated
    [Install]
    WantedBy=timers.target
    T_EOF
I suggest running the script once directly before the systemctl calls. First, the settings take effect immediately; second, it shows whether the systemd service will succeed later.
If the script is meant to loop internally once a minute, but here we only want a single run, should one variable select between the two behaviors of the same script?
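A sketch of the "one script, two behaviors" idea raised here, assuming an environment variable selects the mode. `RUN_ONCE` appears later in this PR; `INTERVAL`, `apply_qos`, and `main` are hypothetical names, and `apply_qos` is only a placeholder for the real configuration steps.

```shell
#!/usr/bin/env bash
RUN_ONCE=${RUN_ONCE:-"false"}   # true: apply once and return
INTERVAL=${INTERVAL:-60}        # seconds between runs in loop mode

# Placeholder for the real work (mlnx_qos calls, sysfs writes, ...).
apply_qos() {
    echo "applying RDMA QoS settings"
}

main() {
    # The first application must succeed in both modes.
    apply_qos || return 1
    if [ "$RUN_ONCE" = "true" ]; then
        return 0
    fi
    # Loop mode: re-apply periodically and keep going after a failed round.
    while true; do
        sleep "$INTERVAL"
        apply_qos || echo "warning: failed to apply QoS, retrying in ${INTERVAL}s" >&2
    done
}
```

A real script would end with a call to `main`. The installer can then do a single checked run with `RUN_ONCE=true`, and the service can start the same file without the variable to get the loop.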
docs/example/qos/rdma-qos.sh
Outdated
    systemctl daemon-reload
    systemctl enable rdma-qos.service
    systemctl enable rdma-qos.timer
    systemctl start rdma-qos.service
start -> restart, to support the reinstall scenario.
docs/example/qos/rdma-qos.sh
Outdated
    systemctl enable rdma-qos.service
    systemctl enable rdma-qos.timer
    systemctl start rdma-qos.service
    systemctl start rdma-qos.timer
This needs two units.
It looks rather convoluted. Why not start a single long-running service that applies the settings once every minute?
docs/example/qos/rdma-qos.sh
Outdated
    systemctl restart rdma-qos.service
    echo -e "\e[31m Done \e[0m"

    systemctl status rdma-qos.service
This command makes the terminal hang, so it is not suitable for automation.
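A non-interactive alternative could look like the sketch below. It relies on `systemctl is-active --quiet` for the result and on `--no-pager` so that `status` never waits in a pager. The function name `report_unit` and the 20-line limit are assumptions.

```shell
#!/usr/bin/env bash
# Report a unit's state without opening a pager, so the caller never blocks.
report_unit() {
    local unit="$1"
    if systemctl is-active --quiet "$unit"; then
        echo "$unit is active"
    else
        echo "$unit is not active" >&2
        # Print recent status lines for diagnosis; --no-pager keeps this
        # non-interactive, and a non-zero status exit code is tolerated.
        systemctl --no-pager --lines=20 status "$unit" >&2 || true
        return 1
    fi
}
```

An installer could finish with `report_unit rdma-qos.service` and use the return code to decide whether the installation succeeded.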
docs/example/qos/rdma-qos.sh
Outdated
    qos_queues=(0 0 0 0 0 0 0 0)
    qos_queues[$rdma_priority]=1
    qos_queues[$cnp_priority]=1
PFC does not need to be enabled for CNP.
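Following this comment, the PFC list would enable only the RDMA priority. A minimal sketch, with the hypothetical helper name `build_pfc_list`, that also joins the array with commas directly instead of piping through sed and tr:

```shell
#!/usr/bin/env bash
# Build the 8-entry, comma-separated PFC list with only the RDMA
# priority enabled; the CNP priority is deliberately left at 0.
build_pfc_list() {
    local rdma_priority="$1"
    local qos_queues=(0 0 0 0 0 0 0 0)
    qos_queues[$rdma_priority]=1
    # "${array[*]}" joins elements with the first character of IFS.
    local IFS=','
    echo "${qos_queues[*]}"
}
```

The result can be passed to the command from the excerpt, for example `mlnx_qos -i "$nic_item" --trust=dscp --pfc "$(build_pfc_list "$rdma_priority")"`.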
    [Install]
    WantedBy=multi-user.target
    SYS_EOF
Before starting the service, run the script once directly; if it fails, exit and do not install the service.
The script should also support a DEBUG=true variable: enable debug for the run before the service starts, and once it is really running, use debug=false to reduce logging.
6b995d3 to 0b954fb
docs/example/qos/rdma-qos.sh
Outdated
    echo -e "\e[31m Pre-run rdma_qos.sh once \e[0m"
    /usr/local/bin/rdma_qos.sh

    sed -i 's?RUN_ONCE=true?RUN_ONCE=false?' /usr/local/bin/rdma_qos.sh
In the script:
    DEBUG=${DEBUG:-""}
When invoking it:
    DEBUG=true /usr/local/bin/rdma_qos.sh
There is no need to modify the script as well.
done
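The environment-driven debug switch suggested above might be wired up as in this sketch. The helper name `debug_log` is hypothetical; only the `DEBUG=${DEBUG:-""}` default comes from the review.

```shell
#!/usr/bin/env bash
DEBUG=${DEBUG:-""}

# Print a message to stderr only when DEBUG=true.
debug_log() {
    if [ "$DEBUG" = "true" ]; then
        echo "debug: $*" >&2
    fi
}
```

The `if` form is used instead of `[ ... ] && echo ...` so that the function returns 0 when debugging is off, which matters for a script running under `set -o errexit`.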
docs/example/qos/rdma-qos.sh
Outdated
    chmod +x /usr/local/bin/rdma_qos.sh
    echo -e "\e[31m Pre-run rdma_qos.sh once \e[0m"
    /usr/local/bin/rdma_qos.sh
It also needs a log message:
    /usr/local/bin/rdma_qos.sh || { echo "error, failed to set qos" ; exit 1 ; }
done
    After=network.target

    [Service]
    Type=simple
The script uses set -o errexit, so on any problem it exits immediately without logging an error.
This needs to be enabled here:
    Restart=always
done
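For illustration, a single long-running unit with the restart policy requested above could be generated as follows. This is a sketch, not the unit from the PR: the `Description` text, the `RestartSec=10` delay, and the `write_unit` helper with its directory parameter are assumptions, while the other directives and the script path appear in the excerpts under review.

```shell
#!/usr/bin/env bash
# Write the unit file into the given directory; the directory is a
# parameter so the sketch can be tried outside /etc/systemd/system.
write_unit() {
    local unit_dir="$1"
    cat <<'SYS_EOF' >"$unit_dir/rdma-qos.service"
[Unit]
Description=RDMA QoS configuration
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/rdma_qos.sh
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
SYS_EOF
}
```

With `Restart=always`, systemd starts the script again whenever it exits, including exits triggered by `set -o errexit`, and the delay avoids a tight restart loop.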
Signed-off-by: Cyclinder Kuo <[email protected]>