Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

--top-io does not behave as expected #75

Open
emandret opened this issue Aug 27, 2024 · 7 comments
Open

--top-io does not behave as expected #75

emandret opened this issue Aug 27, 2024 · 7 comments

Comments

@emandret
Copy link

emandret commented Aug 27, 2024

SUMMARY

The --top-io flag should report the process causing the largest number of IO operations, but it seems like this does not work if the process is spawning child processes which are causing the IO operations.

However, dstat --top-io correctly reports the parent process. I looked at the code for the --top-io plugin and the code is similar, so I wonder what could be causing such a difference.

ISSUE TYPE
  • Bug Report
DOOL VERSION

Dool 1.3.2

OS / ENVIRONMENT

Platform posix/linux
Kernel 5.15.0-119-generic
Python 3.10.12 (main, Jul 29 2024, 16:56:48) [GCC 11.4.0]

STEPS TO REPRODUCE

Save this python snippet at /tmp/high_tps.py,

import os
import time
import subprocess

def high_tps_reader(file_path, num_reads_per_second, block_size):
    while True:
        for _ in range(num_reads_per_second):
            subprocess.run([ "dd", f"if={file_path}", "of=/dev/null", f"bs={block_size}", "count=1" ], stderr=subprocess.DEVNULL)

if __name__ == __name__:
    file_path = "/tmp/testfile.txt"
    num_reads_per_second = 10000
    block_size = "4K"  # Read 4KB at a time

    if not os.path.exists(file_path):
        with open(file_path, "wb") as f:
            f.write(b"a" * 1024 * 1024)

    # Start the high TPS reader
    high_tps_reader(file_path, num_reads_per_second, block_size)

And run,

python3 high_tps.py &
dstat --top-io
dool --top-io
EXPECTED RESULTS

The output of both dstat --top-io and dool --top-io should match.

ACTUAL RESULTS

See screenshot in the comments.

@emandret
Copy link
Author

high_tps

@emandret
Copy link
Author

emandret commented Aug 27, 2024

Weirdly, it seems that just renaming two variables in the --top-io plugin file solves the issue. I have no idea, but this does work on my machine:

sudo cp /usr/share/dool/dool_top_io.py /usr/share/dool/dool_top_io.py.orig
sudo sed -i -e 's/read_bytes/rchar/g' -e 's/write_bytes/wchar/g' /usr/share/dool/dool_top_io.py
dool --top-io

It seems like this is related to the values defined in /proc/*/io as follows:

cat /proc/$(pgrep python3)/io

Which on my machine, returns,

rchar: 60131822108
wchar: 22285226750
syscr: 74499098
syscw: 21285429
read_bytes: 0
write_bytes: 0
cancelled_write_bytes: 0

@scottchiefbaker
Copy link
Owner

Good research! Looks like that change landed in 64c2cc4

The documentation on that file in proc is here. Honestly, after reading that I'm not sure which field we should be using.

@scottchiefbaker
Copy link
Owner

There are four plugins that all seem to access read_bytes:

plugins/dool_top_io.py
plugins/dool_top_bio.py
plugins/dool__pid_detail.py
plugins/dool_top_bio_adv.py

Whatever we decide, we should make sure all the plugins are set the same.

@scottchiefbaker
Copy link
Owner

I'm going to close this issue unless we can find a good reason to use rchar instead of read_bytes

@emandret
Copy link
Author

emandret commented Jan 5, 2025

It depends. Consider the following:

  • rchar corresponds to the number of bytes read from the application POV, e.g., opened files which may be read from page cache, and reading from an anonymous pipe.
  • read_bytes corresponds to the number of bytes read from the actual storage layer, e.g., when the block device driver is used.

I think dool_top_io should use rchar (any IO, whether it's from a block device, page cache, or IPC), but dool_top_bio should use read_bytes (only the actual block IO).

@scottchiefbaker
Copy link
Owner

scottchiefbaker commented Jan 5, 2025

I think this is solid reasoning. Would you like to submit a PR with these changes? Probably worth documenting in the plugin file explicitly what each one reads and why. I'm sure this will come up again in the future.

This was referenced Jan 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants