Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Warn or fix whitespace in variable names #58

Open
ZetaTwo opened this issue Apr 6, 2020 · 2 comments
Open

Warn or fix whitespace in variable names #58

ZetaTwo opened this issue Apr 6, 2020 · 2 comments

Comments

@ZetaTwo
Copy link

ZetaTwo commented Apr 6, 2020

Continuing the discussion from here: rocky/python-uncompyle6#312

@rocky directed me to post here instead.

It's possible to easily fool a lot of decompilers by renaming variables to names containing whitespace. For example, running this little script:

$ git clone https://github.com/ZetaTwo/python-obfuscator
$ cd python-obfuscator
$ ./test.sh

Will demonstrate how uncompyle6 can be tricked into decompiling the bytecode into valid code that does not reflect the actual bytecode.

My proposal is to add a small check that will at least warn about the presence of variables with invalid names and possinly even replace the invalid characters with for example "_".

I'm willing to make a PR with these changes but I would first just like to hear if I'm in the right place and if you have opinions on where/how this should be implemented?

@rocky
Copy link
Owner

rocky commented Apr 6, 2020

Thanks for undertaking this. You are in the right place. We need more people helping out.

Some thoughts. First, there is something that needs to be done on the API side which is what the disassemblers would use. This relevant code is in xdis/main.py. And then adding a CLI option on for the standanalone pydisasm command. That is in xdis/bin/pydisasm.py.

There might be an option to warn but not fix and another setting to fix.

However blindlly replacing invalid characters with underscore probably won't be satisfying, because there could be another variable with that name. Since this kind of mapping is many to one (two distinct invalid characters map to the same character, underscore) you are even likely to get duplicates that way. So checking for existence of the variable before creating a name is desirable. Adding increasing number suffixes (and checking after the suffix is added) is one way to ensure uniqueness.

For completeness, I'll note that as far as assembly goes, invalid names is okay, just as it is in bytecode, because the interpreter doesn't use those names, just the index into the variable name array. But when you go say back to the source code, then the fact that two names are the same when they were different will cause a problem either in understanding the problem or running it from source again.

If you run into trouble or have questions, just ask.

@ZetaTwo
Copy link
Author

ZetaTwo commented Apr 11, 2020

Ok, I submitted a pull request and some questions regarding the fix. Please have a look.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants