Running show techsupport on devices with large core files might crash device when /tmp is on tmpfs #20950

Open
assrinivasan opened this issue Nov 27, 2024 · 2 comments
Labels: Help Wanted 🆘, Triaged

assrinivasan commented Nov 27, 2024

Description

The show_techsupport/test_auto_techsupport.py::TestAutoTechSupport::test_max_limit[core] test creates very large core files. When /tmp is on tmpfs and available memory is low, the device crashes.

Steps to reproduce the issue:

  1. Mount /tmp as tmpfs
  2. Run tests/show_techsupport/test_auto_techsupport.py::TestAutoTechSupport::test_max_limit[core] on KVM

Describe the results you received:

Filesystem information and free memory while the test is running:

Filesystem      Size  Used Avail Use% Mounted on
udev            1.9G     0  1.9G   0% /dev
tmpfs           385M   17M  369M   5% /run
root-overlay     16G  9.0G  6.5G  59% /
/dev/vda3        16G  9.0G  6.5G  59% /host
tmpfs           1.9G  1.3G  638M  67% /tmp
/dev/loop1      3.9G  5.0M  3.7G   1% /var/log
tmpfs           1.9G   16K  1.9G   1% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           4.0M     0  4.0M   0% /sys/fs/cgroup
               total        used        free      shared  buff/cache   available
Mem:            3845        3750         107        1306        1531          95
Swap:              0           0           0

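Available memory is exhausted once the large core files are created: /tmp is tmpfs, so the 1.3G used there is held in RAM (compare the 1306 reported as shared memory), and there is no swap. The system crashes and the DUT becomes unreachable: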

27/11/2024 06:24:39 __init__._fixture_generator_decorator    L0099 ERROR  | 
Host unreachable in the inventory
Traceback (most recent call last):
  File "/var/src/sonic-mgmt/tests/common/plugins/log_section_start/__init__.py", line 95, in _fixture_generator_decorator
    next(it)
  File "/var/src/sonic-mgmt/tests/show_techsupport/test_auto_techsupport.py", line 125, in global_rate_limit_zero
    set_auto_techsupport_global(self.duthost, rate_limit=DEFAULT_RATE_LIMIT_GLOBAL)
  File "/var/src/sonic-mgmt/tests/show_techsupport/test_auto_techsupport.py", line 564, in set_auto_techsupport_global
    duthost.shell(cmd)
  File "/var/src/sonic-mgmt/tests/common/devices/multi_asic.py", line 135, in _run_on_asics
    return getattr(self.sonichost, self.multi_asic_attr)(*module_args, **complex_args)
  File "/var/src/sonic-mgmt/tests/common/devices/base.py", line 105, in _run
    res = self.module(*module_args, **complex_args)[self.hostname]
  File "/usr/local/lib/python3.8/dist-packages/pytest_ansible/module_dispatcher/v213.py", line 232, in _run
    raise AnsibleConnectionFailure(
pytest_ansible.errors.AnsibleConnectionFailure: Host unreachable in the inventory

Describe the results you expected:

show techsupport completes successfully and the DUT stays reachable.

Additional information you deem important (e.g. issue happens only occasionally):

Related issue: #15051

@prabhataravind

Need a way for "show tech" to be aware of the resources on the system.
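One possible shape for such a resource check, as a minimal sketch rather than anything taken from the show techsupport code: on tmpfs, the free space reported by the filesystem is not enough on its own, because file data is held in RAM. In the output above, /tmp still shows 638M available while only 95 is reported as available memory. The function names, the 64 MiB headroom, and the idea of passing the filesystem type in as a string are all assumptions made for illustration.

```python
import os

MIB = 1024 * 1024


def parse_mem_available(meminfo_text):
    """Return MemAvailable in bytes from /proc/meminfo-style text, or None."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            # Line format: "MemAvailable:     97280 kB"
            return int(line.split()[1]) * 1024
    return None


def usable_bytes(fs_type, fs_free_bytes, mem_available_bytes):
    """Bytes that can reasonably be written to a filesystem.

    On tmpfs the data lives in memory, so the usable space is capped by
    the available memory as well as by the filesystem's own free space.
    """
    if fs_type == "tmpfs" and mem_available_bytes is not None:
        return min(fs_free_bytes, mem_available_bytes)
    return fs_free_bytes


def can_write_dump(required_bytes, fs_type, fs_free_bytes,
                   mem_available_bytes, headroom_bytes=64 * MIB):
    """True if a dump of required_bytes fits while leaving some headroom.

    The default headroom is an arbitrary illustrative value.
    """
    usable = usable_bytes(fs_type, fs_free_bytes, mem_available_bytes)
    return usable - headroom_bytes >= required_bytes


def probe_usable_bytes(path, fs_type):
    """Query the live system (Linux only) for the usable space under path."""
    st = os.statvfs(path)
    with open("/proc/meminfo") as f:
        mem_available = parse_mem_available(f.read())
    return usable_bytes(fs_type, st.f_bavail * st.f_frsize, mem_available)
```

With the numbers from this report, a 100 MiB dump would be rejected for /tmp on tmpfs (capped at 95 MiB of available memory) but accepted for a disk-backed location with several GiB free.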

@prabhataravind prabhataravind added Help Wanted 🆘 Triaged this issue has been triaged labels Dec 4, 2024
@assrinivasan assrinivasan self-assigned this Dec 4, 2024
@assrinivasan

show techsupport could generate the sonic dumps in /var/tmp, which is on disk, instead of /tmp, which may be tmpfs. This would resolve the issue. @prabhataravind @prgeor @saiarcot895
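A rough sketch of how the dump location could be chosen, assuming a variation on the proposal above: keep /tmp when it is disk-backed and fall back to /var/tmp only when /tmp is tmpfs. It reads a mount table in /proc/mounts format; `fs_type_for` and `choose_dump_dir` are hypothetical helpers, not existing SONiC functions.

```python
def fs_type_for(path, mounts_text):
    """Return the filesystem type of the mount that holds path.

    mounts_text uses the /proc/mounts layout:
    "<device> <mount point> <fs type> <options> <dump> <pass>".
    The longest matching mount point wins; among equal mount points,
    the later entry wins because it shadows the earlier one.
    """
    best_mount, best_type = "", None
    target = path.rstrip("/") + "/"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if target.startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


def choose_dump_dir(mounts_text, candidates=("/tmp", "/var/tmp")):
    """Return the first candidate directory that is not on tmpfs.

    Falls back to the last candidate if every one of them is on tmpfs.
    """
    for candidate in candidates:
        if fs_type_for(candidate, mounts_text) != "tmpfs":
            return candidate
    return candidates[-1]


if __name__ == "__main__":
    # On a Linux system, inspect the live mount table.
    with open("/proc/mounts") as f:
        print(choose_dump_dir(f.read()))
```

Always writing to /var/tmp, as proposed, is simpler; the check only matters if /tmp should still be used on systems where it is disk-backed.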
