Fix incorrect file offset calculation in memory mapping #220

Open · wants to merge 8 commits into main from elf_fix_core
Conversation

@pablogsal (Member) commented Jan 21, 2025

The current implementation incorrectly assumes that a file offset can be
computed from a process memory address by simply subtracting the
library's load address. This assumption doesn't hold for binaries with
non-standard ELF layouts, where PT_LOAD segments may have different
virtual-address-to-file-offset mappings.

Fix the issue by:

1. First converting the absolute process address to a library-relative
   offset by subtracting the library's load point in the process
2. Finding the PT_LOAD segment in the ELF file that contains this offset
3. Using the segment's p_vaddr and p_offset to calculate the correct
   file offset

To avoid performance penalties from repeatedly parsing ELF files, add
caching of PT_LOAD segments per library.

Example of what was wrong:
  old: file_offset = addr - lib_start
  new: file_offset = ((addr - lib_start) - segment->p_vaddr) + segment->p_offset

This fixes an issue where pystack would read from incorrect file offsets
when analyzing binaries compiled with non-standard layout options (e.g.,
when using the gold linker with custom flags).
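
The corrected translation described above can be sketched as follows. This is a minimal illustration; `LoadSegment` and `fileOffsetFor` are hypothetical names, not the actual pystack API:

```cpp
#include <cstdint>
#include <vector>

// Each PT_LOAD segment maps a library-relative virtual address range
// [vaddr, vaddr + size) to file offset `offset` (p_vaddr/p_offset/p_memsz
// in ELF terms).
struct LoadSegment {
    uint64_t vaddr;   // p_vaddr
    uint64_t offset;  // p_offset
    uint64_t size;    // segment size
};

// Old (incorrect for non-standard layouts): addr - lib_start.
// New: translate through the containing PT_LOAD segment.
// Returns -1 if no segment contains the address.
int64_t fileOffsetFor(uint64_t addr, uint64_t lib_start,
                      const std::vector<LoadSegment>& segments)
{
    uint64_t rel = addr - lib_start;  // library-relative offset
    for (const auto& seg : segments) {
        if (rel >= seg.vaddr && rel < seg.vaddr + seg.size) {
            return static_cast<int64_t>((rel - seg.vaddr) + seg.offset);
        }
    }
    return -1;  // address not covered by any PT_LOAD segment
}
```

For a segment where `p_vaddr == p_offset` the two formulas agree, which is why the old code appeared to work for conventionally linked binaries.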

@pablogsal pablogsal changed the title elf fix core Fix incorrect file offset calculation in memory mapping Jan 21, 2025
@pablogsal pablogsal force-pushed the elf_fix_core branch 6 times, most recently from 1ec33d2 to a74a66e Compare January 21, 2025 18:54
@pablogsal pablogsal force-pushed the elf_fix_core branch 3 times, most recently from 1b17fec to 3b04fd4 Compare January 21, 2025 21:34
@codecov-commenter commented Jan 21, 2025

Codecov Report

Attention: Patch coverage is 67.34694% with 16 lines in your changes missing coverage. Please review.

Project coverage is 83.41%. Comparing base (19b9759) to head (a2bd7ae).
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
src/pystack/_pystack/mem.cpp 76.74% 10 Missing ⚠️
src/pystack/_pystack/process.cpp 0.00% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #220      +/-   ##
==========================================
- Coverage   83.51%   83.41%   -0.11%     
==========================================
  Files          46       46              
  Lines        6201     6246      +45     
  Branches      134      459     +325     
==========================================
+ Hits         5179     5210      +31     
- Misses       1020     1036      +16     
+ Partials        2        0       -2     
Flag Coverage Δ
cpp 83.41% <67.34%> (+18.97%) ⬆️
python_and_cython 83.41% <67.34%> (-15.66%) ⬇️


Signed-off-by: Pablo Galindo <[email protected]>
Signed-off-by: Pablo Galindo Salgado <[email protected]>
@godlygeek (Contributor) left a comment:

The commit messages for the fighting-the-CI commits need rewording to explain the rationale for the changes. Other than that...

src/pystack/_pystack/mem.cpp (outdated; resolved)
src/pystack/_pystack/mem.cpp (outdated; resolved)
// within the chunk of the segment in the core file. map.End() corresponds
// to the end of the segment in memory when the process was alive, but when
// the core was created not all of that data was written to the core, so we
// need to use map.FileSize() to get the end of the segment in the core file.
uintptr_t fileEnd = map.Start() + map.FileSize();
Contributor:

Does this imply that fileEnd <= map.End()? If so, should we be checking that condition as well?

Member Author:

Yep, do you want an assert for this?

Contributor:

I was mostly checking my understanding, but I wouldn't be opposed to an assert
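
The invariant under discussion can be stated as a tiny sketch (an illustrative `Map` struct, not the real class): the data present in the core can never extend past the segment's end in the live process, so `fileEnd <= map.End()` must hold.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative stand-in for the real map type.
struct Map {
    uint64_t start;     // Start(): segment start in memory
    uint64_t end;       // End(): segment end in memory when alive
    uint64_t fileSize;  // FileSize(): bytes actually present in the core
};

uint64_t fileEndFor(const Map& map)
{
    uint64_t fileEnd = map.start + map.fileSize;
    assert(fileEnd <= map.end);  // core holds at most what was mapped
    return fileEnd;
}
```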

Comment on lines +759 to +762
if (major != 2 && major != 3) {
LOG(DEBUG) << "Failed to determine Python version from symbols: invalid major version";
return {-1, -1};
}
Contributor:

Should we also check whether PY_RELEASE_LEVEL is in {0xA, 0xB, 0xC, 0xF} while we're at it? The valid values for that are almost as constrained as for the major version.
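
The suggested check could look like this. This is an illustrative sketch: `isPlausiblePythonVersion` is a hypothetical helper, and the nibble values follow CPython's `PY_RELEASE_LEVEL` encoding in `Include/patchlevel.h`:

```cpp
// Illustrative helper, not the actual pystack function. CPython encodes
// the release level in the low nibble of PY_VERSION_HEX:
// 0xA = alpha, 0xB = beta, 0xC = release candidate, 0xF = final.
bool isPlausiblePythonVersion(int major, int releaseLevel)
{
    if (major != 2 && major != 3) {
        return false;  // invalid major version
    }
    switch (releaseLevel) {
        case 0xA:  // alpha
        case 0xB:  // beta
        case 0xC:  // release candidate
        case 0xF:  // final
            return true;
        default:
            return false;  // not a valid PY_RELEASE_LEVEL value
    }
}
```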

GElf_Xword size;
};
// Cache for PT_LOAD segments
mutable std::unordered_map<std::string, std::vector<ElfLoadSegment>> d_elf_load_segments_cache;
Contributor:

I don't love this being mutable, but, fine... I guess you're delaying populating this cache instead of doing it when the memory manager is constructed to avoid paying the cost until the first time memory is actually read from a given shared library?
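
The lazy-population pattern described here might be sketched like so (hypothetical names, not the actual pystack members; `parseSegments` stands in for the real ELF parsing):

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

struct Segment {
    unsigned long vaddr;
    unsigned long offset;
    unsigned long size;
};

class SegmentCache {
  public:
    // The cache fills the first time a library's segments are requested,
    // which is why the map must be mutable when the accessor is const.
    const std::vector<Segment>& segmentsFor(const std::string& lib) const
    {
        auto it = d_cache.find(lib);
        if (it == d_cache.end()) {
            // Parse ELF program headers only on first access.
            it = d_cache.emplace(lib, parseSegments(lib)).first;
        }
        return it->second;
    }

    std::size_t parsedLibraries() const { return d_cache.size(); }

  private:
    static std::vector<Segment> parseSegments(const std::string&)
    {
        // Stand-in for the real ELF parsing.
        return {Segment{0x1000, 0x0, 0x2000}};
    }

    mutable std::unordered_map<std::string, std::vector<Segment>> d_cache;
};
```

The alternative would be eager population in the constructor, which pays the parsing cost up front even for libraries that are never read.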

}

elf_end(elf);
close(fd);
Contributor:

Doesn't elf_end close the file descriptor? I thought a successful elf_begin takes over ownership of it, so this seems like a double close to me. Should be easy to confirm that by checking the return code...

Contributor:

Ah, you're right. I misread some of our other code, which uses elf_unique_ptr to guarantee that elf_end gets called, and then doesn't call close after that point - but that's because it also uses file_unique_ptr, which always calls close, and I missed it because it was hidden in a destructor.

That said: maybe we should be reusing those two helper types, since we use them elsewhere.
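
The helper-type idea can be illustrated with a self-contained sketch. Here `FdTracker` is a stand-in for a real file descriptor or `Elf*` handle so the example is runnable; the actual `file_unique_ptr`/`elf_unique_ptr` helpers wrap `close()` and `elf_end()`:

```cpp
#include <memory>

// Stand-in for a real resource handle; counts how many times it is closed.
struct FdTracker {
    int closes = 0;
};

// Custom deleter: runs exactly once when the unique_ptr is destroyed.
struct FdCloser {
    void operator()(FdTracker* fd) const { ++fd->closes; }
};

using file_unique_ptr = std::unique_ptr<FdTracker, FdCloser>;

int closesAfterScope()
{
    FdTracker fd;
    {
        file_unique_ptr guard(&fd);
        // ... elf_begin / parsing would happen here; an early return or
        // exception still triggers exactly one "close" via the destructor.
    }
    return fd.closes;  // exactly one close, never a double close
}
```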

return StatusCode::ERROR;
}

Elf* elf = elf_begin(fd, ELF_C_READ, nullptr);
Contributor:

Why ELF_C_READ? We use ELF_C_READ_MMAP everywhere else.

for (size_t i = 0; i < phnum; i++) {
GElf_Phdr phdr_mem;
GElf_Phdr* phdr = gelf_getphdr(elf, i, &phdr_mem);
if (phdr == nullptr) continue;
Contributor:

Isn't this just silently ignoring a legitimate error?
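
One way to propagate the failure instead of skipping the header is sketched below. This is illustrative: `getPhdr` is a stub standing in for `gelf_getphdr`, which returns a null result on error:

```cpp
#include <cstddef>
#include <optional>
#include <vector>

struct Phdr {
    int type;
};

// Stub for gelf_getphdr: nullopt models a header that failed to parse.
std::optional<Phdr> getPhdr(const std::vector<std::optional<Phdr>>& table,
                            std::size_t i)
{
    return table[i];
}

bool collectHeaders(const std::vector<std::optional<Phdr>>& table,
                    std::vector<Phdr>& out)
{
    for (std::size_t i = 0; i < table.size(); i++) {
        auto phdr = getPhdr(table, i);
        if (!phdr) {
            return false;  // report the error instead of `continue`
        }
        out.push_back(*phdr);
    }
    return true;
}
```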

Comment on lines +482 to +484
if (cache_it->second.empty()) {
return StatusCode::ERROR;
}
Contributor:

This case can never happen. Only initLoadSegments can add a key/value pair to the cache, and it returns an error if the value it would add is empty.

Comment on lines +492 to +498
// Find the correct segment
for (const auto& segment : cache_it->second) {
if (symbol_vaddr >= segment.vaddr && symbol_vaddr < segment.vaddr + segment.size) {
*offset_in_file = (symbol_vaddr - segment.vaddr) + segment.offset;
return StatusCode::SUCCESS;
}
}
@godlygeek (Contributor) commented Jan 25, 2025:

So we've got d_vmaps which is a vector of files and their offsets in virtual memory, and we iterate over it to find the file that contains a given virtual address, then we look up the vector of segments contained in that file, and iterate over it to find the segment that contains the given virtual address.

Wouldn't everything be simpler if, instead of introducing a second level of vectors, we flattened things so that we keep a vector of segments as a member variable instead of a vector of ELF files?

@godlygeek (Contributor) commented Jan 25, 2025:

The only advantage I can think of for how you've done it is that it's lazier and allows delaying reading the segments for an ELF file until the first time they're needed. But, isn't reading the program headers quite fast? At least there's not much seeking involved, nor copying of strings, just reading a bunch of integers.

Member Author:

I don't think I follow. These are different things:

  • d_vmaps is not a vector of files and their offsets: it is a vector of memory segments (some of which are backed by files) as seen by the core: this is what the core has mapped inside of it. Some data may be missing.
  • What we are storing in the cache by initLoadSegments, and what we are iterating here, are the segments in the ELF files that were mapped into the process when it was alive. Some of the data in these segments may be in the core and some may not. The addresses in these segments are not relocated to the process.

First, we look at d_vmaps for the file that contains a given virtual address that's absolute (an address in the process memory), and then we look up the ELF segments for that file, but what we search for is an address that's no longer absolute, because we have subtracted the load point of the library.

So I'm not sure I understand the proposal to flatten. There is nothing obvious that I can think of that we can flatten.

Contributor:

We just talked offline, but for the sake of anyone who finds the PR later:

The thing that we could represent as a flat structure is a list of virtual address ranges, and, for each range, the file that should be read to find its contents and the offset to begin reading at to find the start of the range. This would let us stop treating from-core and from-executable as two separate cases, and use exactly the same code to handle them, at the cost of extra time in the constructor to build this mapping.

We argued back and forth a bit about the pros and cons of the two approaches, and agreed to try to pair program the other approach on Monday to see if it's as clean as I think it'll be or as slow and ugly as Pablo thinks it will be 😆
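
The flat structure under discussion might look roughly like this (a hypothetical sketch, not the outcome of the pair-programming session): one sorted list of virtual address ranges, each carrying the backing file and the file offset of the range's start, so from-core and from-executable reads share a single lookup path.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

struct Range {
    uint64_t start;       // first mapped virtual address
    uint64_t end;         // one past the last mapped address
    std::string file;     // file to read from (empty = core only)
    uint64_t fileOffset;  // offset of `start` within `file`
};

// Binary search over ranges sorted by `start`; nullptr if addr is unmapped.
const Range* findRange(const std::vector<Range>& ranges, uint64_t addr)
{
    auto it = std::upper_bound(
            ranges.begin(), ranges.end(), addr,
            [](uint64_t a, const Range& r) { return a < r.start; });
    if (it == ranges.begin()) {
        return nullptr;  // addr precedes every range
    }
    --it;  // last range starting at or before addr
    return (addr < it->end) ? &*it : nullptr;
}
```

Building this table costs extra time in the constructor, which is the trade-off debated above.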

pablogsal and others added 2 commits January 25, 2025 00:48
Co-authored-by: Matt Wozniski <[email protected]>
Signed-off-by: Pablo Galindo Salgado <[email protected]>
Co-authored-by: Matt Wozniski <[email protected]>
Signed-off-by: Pablo Galindo Salgado <[email protected]>