Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix incorrect file offset calculation in memory mapping #220
base: main
Are you sure you want to change the base?
Fix incorrect file offset calculation in memory mapping #220
Changes from 6 commits
ac092af
09efbcd
813c89f
d1e586c
a682000
f3f8a62
eb2d045
a2bd7ae
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this imply that
fileEnd <= map.End()
? If so, should we be checking that condition as well?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, do you want an assert for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was mostly checking my understanding, but I wouldn't be opposed to an assert
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why
ELF_C_READ
? We useELF_C_READ_MMAP
everywhere else.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this just silently ignoring a legitimate error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't
elf_end
close the file descriptor? I thought a successfulelf_begin
takes over ownership of it, so this seems like a double close to me. Should be easy to confirm that by checking the return code...There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn;t: https://github.com/roolebo/elfutils/blob/cff53f1784c9a4344604bedf41b7d499b3eb30d5/libelf/elf_end.c#L42-L238
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, you're right. I misread some of our other code, which uses
elf_unique_ptr
to guarantee thatelf_end
gets called, and then doesn't callclose
after that point - but that's because it also usesfile_unique_ptr
, which always calls close, and I missed it because it was hidden in a destructor.That said: maybe we should be reusing those two helper types, since we use them elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case can never happen. Only
initLoadSegments
can add a key/value pair to the cache, and it returns an error if the value it would add is empty.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we've got
d_vmaps
which is a vector of files and their offsets in virtual memory, and we iterate over it to find the file that contains a given virtual address, then we look up the vector of segments contained in that file, and iterate over it to find the segment that contains the given virtual address.Wouldn't everything be simpler if, instead of introducing a second level of vectors, we flattened things so that we keep a vector of segments as a member variable instead of a vector of ELF files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only advantage I can think of for how you've done it is that it's lazier and allows delaying reading the segments for an ELF file until the first time they're needed. But, isn't reading the program headers quite fast? At least there's not much seeking involved, nor copying of strings, just reading a bunch of integers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I follow. These are different things:
d_vmaps
is not a vector of files and their offsets: is a vector of memory segments (that some are backed by files) as seen by the core: this is what the core has mapped inside of it. Some data may be missing.initLoadSegments
and we are iterating here are the segments in the ELF files that have been mapped to the process when it was alive. Some of the data in these segments may be in the core and some may not. The addresses in these segments are not relocated to the process.First, we look at
d_vmaps
and we look for the file that contains a given virtual address that'a absolute (an address in the process memory), and then we look up the ELF segments for that file but what we search for is an address that's not absolute anymore because we have subtracted the load point of the library.So not sure I understand the proposal to flatten. There is nothing obvious that I can think of that we can flatten.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We just talked offline, but for the sake of anyone who finds the PR later:
The thing that we could represent as a flat structure is a list of virtual address ranges, and, for each range, the file that should be read to find its contents and the offset to begin reading at to find the start of the range. This would let us stop treating from-core and from-executable as two separate cases, and use exactly the same code to handle them, at the cost of extra time in the constructor to build this mapping.
We argued back and forth a bit about the pros and cons of the two approaches, and agreed to try to pair program the other approach on Monday to see if it's as clean as I think it'll be or as slow and ugly as Pablo thinks it will be 😆
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't love this being
mutable
, but, fine... I guess you're delaying populating this cache instead of doing it when the memory manager is constructed to avoid paying the cost until the first time memory is actually read from a given shared library?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also check whether PY_RELEASE_LEVEL is in
{0xA, 0xB, 0xC, 0xF}
while we're at it? The valid values for that are almost as constrained as for the major version.