Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vcpu hotplug arm 5.15.12 #13

Open
wants to merge 5 commits into
base: ch-5.15.12
Choose a base branch
from

Conversation

jongwu
Copy link

@jongwu jongwu commented Jun 9, 2022

This patch set enable Vcpu hotplug for arm64, including 5 commit, 4 from Salil Mehta and 1 from Jianyong Wu.

quote from Salil's words:

To support virtual cpu hotplug guest kernel must:

  1. Identify disabled/present vcpus and set/unset the present mask of the vcpu
    during initialization and hotplug event. It must also set the possible mask
    (which includes disabled vcpus) during init of guest kernel.
  2. Provide architecture specific ACPI hooks, for example to map/unmap the
    logical cpuid to hwids/MPIDR. Linux kernel already has generic ACPI cpu
    hotplug framework support.

The last commit fix failure when hot-remove cpu

@jongwu
Copy link
Author

jongwu commented Jun 9, 2022

@justin-he

Salil Mehta and others added 5 commits June 30, 2022 10:19
With ACPI enabled, cpus get identified by the presence of the GICC
entry in the MADT Table. Each GICC entry part of MADT presents cpu as
enabled or disabled. As of now, the disabled cpus are skipped as
physical cpu hotplug is not supported. These remain disabled even after
the kernel has booted.

To support virtual cpu hotplug(in which case disabled vcpus could be
hotplugged even after kernel has booted), QEMU will populate MADT Table
with appropriate details of GICC entry for each possible(present+disabled)
vcpu. Now, during the init time vcpus will be identified as present or
disabled. To achieve this, below changes have been made with respect to
the present/possible vcpu handling along with the mentioned reasoning:

1. Identify all possible(present+disabled) vcpus at boot/init time
   and set their present mask and possible mask. In the existing code,
   cpus are being marked present quite late within smp_prepare_cpus()
   function, which gets called in context to the kernel thread. Since
   the cpu hotplug is not supported, present cpus are always equal to
   the possible cpus. But with cpu hotplug enabled, this assumption is
   not true. Hence, present cpus should be marked while MADT GICC entries
   are bring parsed for each vcpu.
2. Set possible cpus to include disabled. This needs to be done now
   while parsing MADT GICC entries corresponding to each vcpu as the
   disabled vcpu info is available only at this point as for hotplug
   case possible vcpus is not equal to present vcpus.
3. We will store the parsed madt/gicc entry even for the disabled vcpus
   during init time. This is needed as some modules like PMU registers
   IRQs for each possible vcpus during init time. Therefore, a valid
   entry of the MADT GICC should be present for all possible vcpus.
4. Refactoring related to DT/OF is also done to align it with the init
   changes to support vcpu hotplug.

Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: Xiongfeng Wang <[email protected]>
Bound the total number of identified cpus(including disabled cpus) by
maximum allowed limit by the kernel. Max value is either specified as
part of the kernel parameters 'nr_cpus' or specified during compile
time using CONFIG_NR_CPUS.

Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: Xiongfeng Wang <[email protected]>
Currently, cpu-operations are only initialized for the cpus which
already have logical cpuid to hwid assoication established. And this
only happens for the cpus which are present during boot time.

To support virtual cpu hotplug, we shall initialize the cpu-operations
for all possible(present+disabled) vcpus. This means logical cpuid to
hwid/mpidr association might not exists(i.e. might be INVALID_HWID)
during init. Later, when the vcpu is actually hotplugged logical cpuid
is allocated and associated with the hwid/mpidr.

This patch does some refactoring to support above change.

Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: Xiongfeng Wang <[email protected]>
To support virtual cpu hotplug, some arch specific hooks must be
facilitated. These hooks are called by the generic ACPI cpu hotplug
framework during a vcpu hot-(un)plug event handling. The changes
required involve:

1. Allocation of the logical cpuid corresponding to the hwid/mpidr
2. Mapping of logical cpuid to hwid/mpidr and marking present
3. Removing vcpu from present mask during hot-unplug
4. For arm64, all possible cpus are registered within topology_init()
   Hence, we need to override the weak ACPI call of arch_register_cpu()
   (which returns -ENODEV) and return success.
5. NUMA node mapping set for this vcpu using SRAT Table info during init
   time will be discarded as the logical cpu-ids used at that time
   might not be correct. This mapping will be set again using the
   proximity/node info obtained by evaluating _PXM ACPI method.

Note, during hot unplug of vcpu, we do not unmap the association between
the logical cpuid and hwid/mpidr. This remains persistent.

Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: Xiongfeng Wang <[email protected]>
When hot-remove cpu, the map from cpu to numa will set to NUMA_NO_NODE
which will lead to failure as the map is used by others. Thus we need a
specific map to descrip the unpluged cpu.

Here we introduce a new map to descrip the unpluged cpu map.

Signed-off-by: Jianyong Wu <[email protected]>
@jongwu jongwu force-pushed the vcpu_hotplug_arm_5.15.12 branch from dc29f27 to 13c5067 Compare June 30, 2022 02:40
@jongwu
Copy link
Author

jongwu commented Jun 30, 2022

@rbradford

@rbradford
Copy link
Member

Looking better. Can you create a PR rebased against ch-5.18.8, check that it works and then send upstream?

rbradford pushed a commit that referenced this pull request Jan 14, 2025
[ Upstream commit 63de35a ]

An issue was identified in the dcn21_link_encoder_create function where
an out-of-bounds access could occur when the hpd_source index was used
to reference the link_enc_hpd_regs array. This array has a fixed size
and the index was not being checked against the array's bounds before
accessing it.

This fix adds a conditional check to ensure that the hpd_source index is
within the valid range of the link_enc_hpd_regs array. If the index is
out of bounds, the function now returns NULL to prevent undefined
behavior.

References:

[   65.920507] ------------[ cut here ]------------
[   65.920510] UBSAN: array-index-out-of-bounds in drivers/gpu/drm/amd/amdgpu/../display/dc/resource/dcn21/dcn21_resource.c:1312:29
[   65.920519] index 7 is out of range for type 'dcn10_link_enc_hpd_registers [5]'
[   65.920523] CPU: 3 PID: 1178 Comm: modprobe Tainted: G           OE      6.8.0-cleanershaderfeatureresetasdntipmi200nv2132 #13
[   65.920525] Hardware name: AMD Majolica-RN/Majolica-RN, BIOS WMJ0429N_Weekly_20_04_2 04/29/2020
[   65.920527] Call Trace:
[   65.920529]  <TASK>
[   65.920532]  dump_stack_lvl+0x48/0x70
[   65.920541]  dump_stack+0x10/0x20
[   65.920543]  __ubsan_handle_out_of_bounds+0xa2/0xe0
[   65.920549]  dcn21_link_encoder_create+0xd9/0x140 [amdgpu]
[   65.921009]  link_create+0x6d3/0xed0 [amdgpu]
[   65.921355]  create_links+0x18a/0x4e0 [amdgpu]
[   65.921679]  dc_create+0x360/0x720 [amdgpu]
[   65.921999]  ? dmi_matches+0xa0/0x220
[   65.922004]  amdgpu_dm_init+0x2b6/0x2c90 [amdgpu]
[   65.922342]  ? console_unlock+0x77/0x120
[   65.922348]  ? dev_printk_emit+0x86/0xb0
[   65.922354]  dm_hw_init+0x15/0x40 [amdgpu]
[   65.922686]  amdgpu_device_init+0x26a8/0x33a0 [amdgpu]
[   65.922921]  amdgpu_driver_load_kms+0x1b/0xa0 [amdgpu]
[   65.923087]  amdgpu_pci_probe+0x1b7/0x630 [amdgpu]
[   65.923087]  local_pci_probe+0x4b/0xb0
[   65.923087]  pci_device_probe+0xc8/0x280
[   65.923087]  really_probe+0x187/0x300
[   65.923087]  __driver_probe_device+0x85/0x130
[   65.923087]  driver_probe_device+0x24/0x110
[   65.923087]  __driver_attach+0xac/0x1d0
[   65.923087]  ? __pfx___driver_attach+0x10/0x10
[   65.923087]  bus_for_each_dev+0x7d/0xd0
[   65.923087]  driver_attach+0x1e/0x30
[   65.923087]  bus_add_driver+0xf2/0x200
[   65.923087]  driver_register+0x64/0x130
[   65.923087]  ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
[   65.923087]  __pci_register_driver+0x61/0x70
[   65.923087]  amdgpu_init+0x7d/0xff0 [amdgpu]
[   65.923087]  do_one_initcall+0x49/0x310
[   65.923087]  ? kmalloc_trace+0x136/0x360
[   65.923087]  do_init_module+0x6a/0x270
[   65.923087]  load_module+0x1fce/0x23a0
[   65.923087]  init_module_from_file+0x9c/0xe0
[   65.923087]  ? init_module_from_file+0x9c/0xe0
[   65.923087]  idempotent_init_module+0x179/0x230
[   65.923087]  __x64_sys_finit_module+0x5d/0xa0
[   65.923087]  do_syscall_64+0x76/0x120
[   65.923087]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
[   65.923087] RIP: 0033:0x7f2d80f1e88d
[   65.923087] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 b5 0f 00 f7 d8 64 89 01 48
[   65.923087] RSP: 002b:00007ffc7bc1aa78 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[   65.923087] RAX: ffffffffffffffda RBX: 0000564c9c1db130 RCX: 00007f2d80f1e88d
[   65.923087] RDX: 0000000000000000 RSI: 0000564c9c1e5480 RDI: 000000000000000f
[   65.923087] RBP: 0000000000040000 R08: 0000000000000000 R09: 0000000000000002
[   65.923087] R10: 000000000000000f R11: 0000000000000246 R12: 0000564c9c1e5480
[   65.923087] R13: 0000564c9c1db260 R14: 0000000000000000 R15: 0000564c9c1e54b0
[   65.923087]  </TASK>
[   65.923927] ---[ end trace ]---

Cc: Tom Chung <[email protected]>
Cc: Rodrigo Siqueira <[email protected]>
Cc: Roman Li <[email protected]>
Cc: Alex Hung <[email protected]>
Cc: Aurabindo Pillai <[email protected]>
Cc: Harry Wentland <[email protected]>
Cc: Hamza Mahfooz <[email protected]>
Signed-off-by: Srinivasan Shanmugam <[email protected]>
Reviewed-by: Roman Li <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants