-
Notifications
You must be signed in to change notification settings - Fork 26
Fix parsing of integer literals with base prefix #106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Fix parsing of integer literals with base prefix #106
Conversation
9452423
to
23f8ab4
Compare
After merging #107 the tests now pass. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifying base 0 was not a solution, as this resulted in parsing behaviour different from GNU as.
I guess the simplest fix here would be to just replace int(x)
with int(x, 0)
. That should restore the existing behaviour. But it looks like you want to improve things further, which is great!
esp32_ulp/opcodes.py
Outdated
parts = "".join(parts) | ||
if not validate_expression(parts): | ||
raise ValueError('Unsupported expression: %s' % parts) | ||
return eval(parts) | ||
|
||
|
||
def parse_int(literal): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not familiar with this code base, but would it make sense to factor this function out into a separate file, so it can be reused in opcodes_s2.py
?
Similarly, could have a single unit test for this function in a separate testing file.
(Just a suggestion 😄 )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, also from a code (de-)duplication aspect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes indeed. I tend to like keeping structural changes and logic changes in different commits/PRs, to make each change more clear. So in this case I kept the duplication we already had and therefore added parse_int
twice, with a planned followup PR to reduce the duplication.
The existing duplication was there to make the various opcodes*
modules have the same interface to the outside world.
But I notice now, that parse_int
is not used anywhere else than inside the opcodes*
modules, so the function doesn't have to form part of the interface. I can simply have it once in the util
module like for example the existing split_tokens
function.
I will make that change, to have only one parse_int
implementation and therefore only 1 set of unit tests for the function. Thanks for challenging this.
(I might consider a future PR to see if other code can be de-duplicated.)
Hey @wnienhaus , tested with the ulp programs on my s2, and had no issues 👍 |
Yes, That said, since def parse_int(literal):
if len(literal) > 2:
prefix_start = 1 if literal[0] == '-' else 0 # skip over negative sign if present
if literal[prefix_start] == '0' and literal[prefix_start+1] in '123456789':
return int(literal, 8)
return int(literal, 0) and all tests still pass. So it's really just the octal case that's different (and theoretically we should disallow python style octal ( Now I am starting the overthink this:
I think I'll keep the current approach, as it's very explicit about what we support (including explicitly supporting python style octal). I'll just remove the comment about legacy octal format, because from the GNU as perspective, it's the currently valid and only possible octal format. (Happy to get feedback on my chosen approach) |
Ok. Fixes pushed. Will squash-merge this once approved. |
e4a7e33
to
f7dfddc
Compare
remove duplication
f7dfddc
to
5c84d08
Compare
MicroPython 1.25.0 introduced a breaking change, aligning the behaviour of the
int()
function closer to the behaviour of CPython (something along the lines of: strings are assumed to represent a decimal number, unless a base is specified. if a base of 0 is specified, is the base is inferred from the string)This broke our parsing logic, which relied on the previous behaviour of the
int()
function to automatically determine the base of the string literal, based on a base prefix present in the string. Specifying base 0 was not a solution, as this resulted in parsing behaviour different from GNU as.Additionally, we never actually parsed octal in the format
0100
correctly - even before this PR; that number would have been interpreted as 100 rather than 64.So, to fix this, and to ensure our parsing matches the GNU assembler, this PR implements a custom
parse_int()
function, using the base prefix in a string to determine the correct base to pass toint()
. The following are supported:The
parse_int
method also supports the negative prefix operator for all of the above cases.This change also ensures
.int
,.long
,.word
directives correctly handle the above mentioned formats. This fixes the issue described in #104.Note: GNU as does not actually accept the octal prefix
0o...
, but we accept it as a convenience, as this is accepted in Python code. This means however, that our assembler accepts code which GNU as does not accept. But the other way around, we still accept all code that GNU as accepts, which was one of our goals.