-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve X64 jump table handling #20
Comments
Hi, thank you for opening this issue. I appreciate how you included many detailed information and already spotted yourself the source of the problem on case 1. I'm extremely sorry for taking such a long time to answer, but I was taking some time off and no one else was around to answer your question.
What do you mean by "traditional challenge"? Are you talking about distinguishing between code and data? If yes, then I believe the point here it's that it's not solved in the general case, but only on binaries outputted by well-behaved compilers (gcc, clang).
Thank you for linking this paper! It was an interesting read. So while we don't plan on supporting the ICC compiler in the future (as it's not very commonly used), the point they bring up about not being able to correctly symbolize jump tables is interesting to discuss. While it's true that there is no relocation data to specify which parts of the binary's memory are jump tables, we believe that using techniques such as data flow analysis to recognize patters used by compilers only when generating jump tables. The approach to detecting jump tables would be to, for every indirect jump on a register, check if that register comes from an indexed load from memory with an upper bounds check on the index. Please keep in mind that I'm not the original author of Retrowrite, I'm just the guy who is adding support for aarch64, so take everything I've said with a pinch of salt as I may be wrong. The data flow approach I mentioned above was implemented in the aarch64 version of Retrowrite (because jump table detection with heuristics there is impossible), and it works on the hundreds of binaries we tried (including big binaries such as gcc); I believe that using the same approach on x86 would solve this issue. I can see how some people could think that data flow analysis is just another heuristic; but I believe that data flow analysis is exact, and does not need relying on scores/probability to be able to work - it needs to rely on a few assumptions though (such as, we're working with a well-behaved compiler). Thank you again for raising those points above, and for your interest in the project. |
Hi @cyanpencil , Thanks for your reply!
I mean the symbolization of numerical values, that is, to distinguish between scalars and reference offsets statically. For example, in a PIC binary, given a number 8 (see case 2), we want to decide whether it is a scalar or an offset between references (i.e., reference - reference). It is still a undecidable/unsolved problem even for PIC binaries, because such offsets (unlike reference itself) do not have symbols in most cases. Jump table is just a common case of using reference offsets. Like you said, to solve this challenge, we still need some data-flow analysis. But in general cases, data-flow analysis cannot be accurate due to many factors (both false positives and false negatives will influence reassembly). Note that the general data-flow analysis relies on point-to/alias analysis, which is another undecidable problem. For general cases, to be honest, the only solution I can think of is to introduce heuristics (like current jump table identification for x64). Please kindly correct me if I am wrong.
In the original paper, I see "For PIC, we adopt a principled symbolization strategy without heuristics". So I originally thought it works in general cases. But yes, if we are shooting on well-behaved compilers, all the problems should have gone, lol. |
I've thought for a while. I somehow feel it is still heuristical. That is, our heuristics are perfect fit for commonly-used compilers (e.g., gcc, icc, and clang). Based on my understanding, using commonly-used compilers means there are some fixed patterns or other pre-known constraints.
I think relying on scores/probability is just one kind of heuristic approach. Another is, like what we might do with well-behaved compilers, relying on some restricted code patterns (e.g., memory accesses for jump table are simple enough). But anyway, I believe we can definitely do better within such a scope (for commonly-used compilers). |
Thank you for clarifying some of the things I did not understand before. I agree with you; I believe Retrowrite is not a breakthrough in static analysis; it is a simple, easy-to-hack rewriter that shows how we can rewrite COTS binaries in around 3k lines of python, without needing to use complex static analysis (because we restrict ourselves to COTS).
You are definitely not wrong, and I think both the paper and the README should be more specific and make this point more clear to avoid any kind of confusion about this. I don't think the claim "we symbolize PIC without heuristics" in the paper is inherently wrong, as the paper only talks about COTS binaries (so, well-behaved compilers); but I do believe that we should have explicitly said that all the techniques used would not work in the general case, and why. I like the way you put it: the approach is heuristical in nature, but it's a perfect fit for COTS binaries.
That's true; while I personally won't have time right now to fix this issue as I'm super busy with the aarch64 version of Retrowrite, but hopefully I will get to fix jump tables on x86 sooner or later |
Hi, thanks for your contribution and hard work! Retrowrite is amazing.
Actually, I find a small unsoundness issue in reassembly and want to set up some discussion here. It would be very appreciated if anyone can comment on this.
In short, my key insight is that: although we do not need to distinguish numerical numbers and references/labels in PIE binaries, we still need to distinguish numerical numbers and the label offsets.
I use the latest commit
9e2e633e9ab165681733f3255e648a62b22e6368
for reference.Case 1
The story begins when I got a program which behaves differently after reassembly-and-recompilation (the attached code is reduced for easy demonstration).
My basic setup is:
After that, we can execute
a.out
andb.out
to get the execution results.We can see, with the same input 1, a.out prints -184 but b.out prints -202.
Originally, I think the fault is caused by the plt warnings. But after some exploration, I found it is an unsoundness issue in theory. Specifically, let's check the reassembly file a.s
We can see, retrowrite misclassified the numerical elements in
const int t[2]
as the label offsets (e.g., .LC75d-.LC818). After compilation, these values are changed for sure.Then I go check the code.
retrowrite/librw/rw.py
Line 219 in 9e2e633
It seems that retrowrite uses heuristics to symbolize jump tables (which contains many label offsets). And in this case, the global
const int t[2]
satisfies the heuristics by chance, which confuses retrowrite. Unsoundness is still here and there, somehow.With more study, I think the problem can be summarized as:
Although we do not need to distinguish numerical numbers and references in PIE binaries, we still need to distinguish numerical numbers and the offsets between labels. I feel these label offsets are not in the symbol/relocation table.
The aforementioned case provides an example, where the element in jump table is the offset between labels (the target and jump base). That is why we got confused here.
Case 2
To further study the root cause, I hand-crafted a program with inline-assembly. Hope it can help.
Let's go through the code.
The first test-je pattern is to let retrowrite know there is a basic block starting at
A
.The mov instruction is the key, which loads the offset between labels B and A into rbx.
The following push instruction stores the offset into memory, and the lea instruction loads the address of label A into r8.
Later, we pop the offset into r9. Note that, here we use a simple push-pop pattern to simulate the complex behaviors (including the aliasing problems) in real-world binary.
The add instruction adds r8 and r9, which denotes A + B - A = B.
Then, an indirect jump leads the control flow to B.
B is a simple exit(0).
My basic setup is:
Let's first check the reassembly file
We can see retrowrite left the label offset (B-A) as a constant number 8.
After instrumentation, the offset has changed, but the constant is still left here, which breaks the recompiled binary (the indirect jump target becomes invalid).
The solution is to infer the label offset (B - A). However, the traditional challenge, which is caused by sophisticated memory behaviors, is still there.
More
I have attached the above files. The directory structure is:
It seems that an Usenix paper also mentions the small unsoundness issue in Section 7.1. It would be very appreciated if anyone can share some thinkings here. And also, I hope our discussion can help the development of retrowrite, which is, again, a great work for us to follow.
Thanks!
The text was updated successfully, but these errors were encountered: