Fix incorrect file offset calculation in memory mapping #220
Conversation
Force-pushed from 1ec33d2 to a74a66e
The current implementation incorrectly assumes that calculating a file offset from a process memory address can be done using a simple subtraction from the library's load address. This assumption doesn't hold for binaries with non-standard ELF layouts, where PT_LOAD segments may have different virtual address to file offset mappings.

Fix the issue by:
1. First converting the absolute process address to a library-relative offset by subtracting the library's load point in the process
2. Finding the PT_LOAD segment in the ELF file that contains this offset
3. Using the segment's p_vaddr and p_offset to calculate the correct file offset

To avoid performance penalties from repeatedly parsing ELF files, add caching of PT_LOAD segments per library.

Example of what was wrong:

    old: file_offset = addr - lib_start
    new: file_offset = ((addr - lib_start) - segment->p_vaddr) + segment->p_offset

This fixes an issue where pystack would read from incorrect file offsets when analyzing binaries compiled with non-standard layout options (e.g., when using the gold linker with custom flags).

Signed-off-by: Pablo Galindo <[email protected]>
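For readers skimming the PR, here is roughly what the described calculation boils down to. This is a minimal sketch only: the names `LoadSegment` and `fileOffsetForAddress` are made up, and the real implementation caches segments per library and reports results through a `StatusCode` and an out-parameter rather than an optional.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

struct LoadSegment
{
    uintptr_t p_vaddr;   // virtual address of the segment in the ELF file
    uintptr_t p_offset;  // file offset of the segment in the ELF file
    uintptr_t p_memsz;   // size of the segment in memory
};

std::optional<uintptr_t>
fileOffsetForAddress(uintptr_t addr, uintptr_t lib_start, const std::vector<LoadSegment>& segments)
{
    // 1. Convert the absolute process address to a library-relative offset.
    uintptr_t relative = addr - lib_start;

    // 2. Find the PT_LOAD segment that contains this offset.
    for (const auto& seg : segments) {
        if (relative >= seg.p_vaddr && relative < seg.p_vaddr + seg.p_memsz) {
            // 3. Translate via the segment's p_vaddr/p_offset pair.
            return (relative - seg.p_vaddr) + seg.p_offset;
        }
    }
    return std::nullopt;  // address not covered by any PT_LOAD segment
}
```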
Force-pushed from 1b17fec to 3b04fd4
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #220      +/-   ##
==========================================
- Coverage   83.51%   83.41%    -0.11%
==========================================
  Files          46       46
  Lines        6201     6246      +45
  Branches      134      459     +325
==========================================
+ Hits         5179     5210      +31
- Misses       1020     1036      +16
+ Partials        2        0       -2
Force-pushed from 3b04fd4 to 5644a23
Signed-off-by: Pablo Galindo <[email protected]>
Force-pushed from 5644a23 to 813c89f
Signed-off-by: Pablo Galindo Salgado <[email protected]>
The commit messages for the fighting-the-CI commits need rewording to explain the rationale for the changes. Other than that...
// within the chunk of the segment in the core file. map.End() corresponds
// to the end of the segment in memory when the process was alive, but when
// the core was created not all of that data will be in the core, so we need
// to use map.FileSize() to get the end of the segment in the core file.
uintptr_t fileEnd = map.Start() + map.FileSize();
Does this imply that `fileEnd <= map.End()`? If so, should we be checking that condition as well?
Yep, do you want an assert for this?
I was mostly checking my understanding, but I wouldn't be opposed to an assert
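(For reference, the check being discussed would be something like the following. Just a sketch, assuming `map` exposes `Start()`, `End()` and `FileSize()` as in the quoted snippet.)

```cpp
uintptr_t fileEnd = map.Start() + map.FileSize();
// The data captured in the core can never extend past the segment's
// in-memory extent, so this should always hold (assert needs <cassert>):
assert(fileEnd <= map.End());
```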
if (major != 2 && major != 3) {
    LOG(DEBUG) << "Failed to determine Python version from symbols: invalid major version";
    return {-1, -1};
}
Should we also check whether PY_RELEASE_LEVEL is in {0xA, 0xB, 0xC, 0xF} while we're at it? The valid values for that are almost as constrained as for the major version.
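Presumably something along these lines, mirroring the style of the quoted major-version check (the `release_level` variable name is assumed, not taken from the PR):

```cpp
// 0xA = alpha, 0xB = beta, 0xC = release candidate, 0xF = final
if (release_level != 0xA && release_level != 0xB && release_level != 0xC && release_level != 0xF) {
    LOG(DEBUG) << "Failed to determine Python version from symbols: invalid release level";
    return {-1, -1};
}
```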
    GElf_Xword size;
};
// Cache for PT_LOAD segments
mutable std::unordered_map<std::string, std::vector<ElfLoadSegment>> d_elf_load_segments_cache;
I don't love this being `mutable`, but, fine... I guess you're delaying populating this cache instead of doing it when the memory manager is constructed, to avoid paying the cost until the first time memory is actually read from a given shared library?
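(That is, a lookup pattern roughly like the one below. A sketch only: `lib_path` is an assumed name, while `initLoadSegments`, `StatusCode`, and the cache member come from the diff.)

```cpp
// Parse the library's program headers only the first time it is touched.
auto cache_it = d_elf_load_segments_cache.find(lib_path);
if (cache_it == d_elf_load_segments_cache.end()) {
    if (initLoadSegments(lib_path) != StatusCode::SUCCESS) {
        return StatusCode::ERROR;
    }
    cache_it = d_elf_load_segments_cache.find(lib_path);
}
```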
}

elf_end(elf);
close(fd);
Doesn't `elf_end` close the file descriptor? I thought a successful `elf_begin` takes over ownership of it, so this seems like a double close to me. Should be easy to confirm that by checking the return code...
Ah, you're right. I misread some of our other code, which uses `elf_unique_ptr` to guarantee that `elf_end` gets called, and then doesn't call `close` after that point - but that's because it also uses `file_unique_ptr`, which always calls close, and I missed it because it was hidden in a destructor.

That said: maybe we should be reusing those two helper types, since we use them elsewhere.
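(For anyone unfamiliar with those helpers: the pattern being suggested is RAII wrappers with custom cleanup, roughly like the sketch below. The actual `elf_unique_ptr` and `file_unique_ptr` in the codebase may be defined differently; this is just one common way to express the idea.)

```cpp
#include <libelf.h>
#include <unistd.h>

#include <memory>

// Guarantees elf_end() runs when the handle goes out of scope.
struct ElfDeleter
{
    void operator()(Elf* elf) const
    {
        if (elf) {
            elf_end(elf);
        }
    }
};
using elf_unique_ptr = std::unique_ptr<Elf, ElfDeleter>;

// Guarantees close() runs on the file descriptor when it goes out of scope.
class ScopedFd
{
  public:
    explicit ScopedFd(int fd) : d_fd(fd) {}
    ~ScopedFd()
    {
        if (d_fd >= 0) {
            ::close(d_fd);
        }
    }
    ScopedFd(const ScopedFd&) = delete;
    ScopedFd& operator=(const ScopedFd&) = delete;
    int get() const { return d_fd; }

  private:
    int d_fd;
};
```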
    return StatusCode::ERROR;
}

Elf* elf = elf_begin(fd, ELF_C_READ, nullptr);
Why `ELF_C_READ`? We use `ELF_C_READ_MMAP` everywhere else.
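(i.e., presumably just changing the open mode to the elfutils variant that maps the file instead of reading it:)

```cpp
Elf* elf = elf_begin(fd, ELF_C_READ_MMAP, nullptr);
```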
for (size_t i = 0; i < phnum; i++) {
    GElf_Phdr phdr_mem;
    GElf_Phdr* phdr = gelf_getphdr(elf, i, &phdr_mem);
    if (phdr == nullptr) continue;
Isn't this just silently ignoring a legitimate error?
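(Something like the following would surface the failure instead. A sketch only, with the log message made up.)

```cpp
GElf_Phdr phdr_mem;
GElf_Phdr* phdr = gelf_getphdr(elf, i, &phdr_mem);
if (phdr == nullptr) {
    // A failure for an index below phnum means the ELF file is corrupt or
    // truncated, so report it rather than skipping the header silently.
    LOG(DEBUG) << "Failed to read program header " << i << ": " << elf_errmsg(-1);
    return StatusCode::ERROR;
}
```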
if (cache_it->second.empty()) {
    return StatusCode::ERROR;
}
This case can never happen. Only `initLoadSegments` can add a key/value pair to the cache, and it returns an error if the value it would add is empty.
// Find the correct segment
for (const auto& segment : cache_it->second) {
    if (symbol_vaddr >= segment.vaddr && symbol_vaddr < segment.vaddr + segment.size) {
        *offset_in_file = (symbol_vaddr - segment.vaddr) + segment.offset;
        return StatusCode::SUCCESS;
    }
}
So we've got `d_vmaps`, which is a vector of files and their offsets in virtual memory, and we iterate over it to find the file that contains a given virtual address, then we look up the vector of segments contained in that file, and iterate over it to find the segment that contains the given virtual address.

Wouldn't everything be simpler if, instead of introducing a second level of vectors, we flattened things so that we keep a vector of segments as a member variable instead of a vector of ELF files?
The only advantage I can think of for how you've done it is that it's lazier and allows delaying reading the segments for an ELF file until the first time they're needed. But, isn't reading the program headers quite fast? At least there's not much seeking involved, nor copying of strings, just reading a bunch of integers.
I don't think I follow. These are different things:

- `d_vmaps` is not a vector of files and their offsets: it is a vector of memory segments (some of which are backed by files) as seen by the core: this is what the core has mapped inside of it. Some data may be missing.
- What we are storing in the cache by `initLoadSegments` and iterating here are the segments in the ELF files that were mapped into the process when it was alive. Some of the data in these segments may be in the core and some may not. The addresses in these segments are not relocated to the process.

First, we look at `d_vmaps` for the file that contains a given virtual address that's absolute (an address in the process memory), and then we look up the ELF segments for that file, but what we search for is an address that's not absolute anymore because we have subtracted the load point of the library.

So I'm not sure I understand the proposal to flatten. There is nothing obvious that I can think of that we can flatten.
We just talked offline, but for the sake of anyone who finds the PR later:
The thing that we could represent as a flat structure is a list of virtual address ranges, and, for each range, the file that should be read to find its contents and the offset to begin reading at to find the start of the range. This would let us stop treating from-core and from-executable as two separate cases, and use exactly the same code to handle them, at the cost of extra time in the constructor to build this mapping.
We argued back and forth a bit about the pros and cons of the two approaches, and agreed to try to pair program the other approach on Monday to see if it's as clean as I think it'll be or as slow and ugly as Pablo thinks it will be 😆
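(For later readers, a rough sketch of the flat structure described above. Names are illustrative; nothing here is the actual implementation.)

```cpp
#include <cstdint>
#include <string>
#include <vector>

// One entry per contiguous virtual address range, regardless of whether the
// bytes live in the core file itself or in an on-disk executable/library.
struct AddressRange
{
    uintptr_t start;        // first virtual address covered by this range
    uintptr_t end;          // one past the last covered virtual address
    std::string file_path;  // file to read the bytes from
    uintptr_t file_offset;  // offset in that file where the range begins
};

// Built once up front; resolving an address is then a single search over the
// ranges followed by one read at file_offset + (addr - start).
std::vector<AddressRange> d_address_ranges;
```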
Co-authored-by: Matt Wozniski <[email protected]>
Signed-off-by: Pablo Galindo Salgado <[email protected]>