Incorrect translation of BEQ? #11

Open
ForceBru opened this issue May 19, 2022 · 1 comment

@ForceBru

Assembly code

addi s0 x0 10
addi s1 x0 10

loop:
	addi s1 x0 -1 ; I know it's the same as s1 = -1
	beq s1 x0 out
	beq x0 x0 loop

out:
	addi s1 s0 -32

Assembling it

from pathlib import Path
from riscv_assembler.convert import AssemblyConverter

BASE_PATH = Path("code")
conv = AssemblyConverter(output_type="bt")
conv.convert(str(BASE_PATH / "loop.s"))

Instructions in binary

This is the .txt file produced by convert:

00000000101000000000010000010011
00000000101000000000010010010011
11111111111100000000010010010011
00000000000001001000010001100011
01111100000000000000101011100011
11111110000001000000010010010011

Issue

The beq x0 x0 loop instruction seems to be encoded incorrectly. According to chapter 19 "RV32/64G Instruction Set Listings" of the spec, this is a B-type instruction whose encoding is as follows:

0 111110 00000 00000 000 1010 1 1100011
^ ^^^^^^ ^^^^^ ^^^^^     ^^^^ ^
| |      |     rs1       |    |
| |      rs2             |    imm[11]
| imm[10:5]              imm[4:1]
imm[12]

Thus, the immediate bits are:

  • imm[4:1] = 1010
  • imm[10:5] = 111110
  • imm[11] = 1
  • imm[12] = 0

According to section 2.3 "Immediate Encoding Variants", imm[12] is the sign bit of the immediate. Since imm[12] = 0, the offset is positive, so beq x0 x0 loop will jump forward even though it is supposed to jump backward to the loop label.

The immediate is:

0000 1 111110 1010 0 = 4052

So we'll jump 4052 bytes forward???
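The field extraction above can be double-checked with a short sketch. The helper name `decode_b_imm` is made up for illustration and is not part of riscv_assembler. It assumes the standard B-type layout from the spec (imm[12] in bit 31, imm[10:5] in bits 30:25, imm[4:1] in bits 11:8, imm[11] in bit 7).

```python
def decode_b_imm(word: int) -> int:
    """Return the signed byte offset stored in a 32-bit B-type instruction.

    Hypothetical helper for checking encodings by hand.
    """
    imm12 = (word >> 31) & 0x1      # instruction bit 31
    imm10_5 = (word >> 25) & 0x3F   # instruction bits 30:25
    imm4_1 = (word >> 8) & 0xF      # instruction bits 11:8
    imm11 = (word >> 7) & 0x1       # instruction bit 7

    # Reassemble the 13-bit immediate; bit 0 is always zero.
    imm = (imm12 << 12) | (imm11 << 11) | (imm10_5 << 5) | (imm4_1 << 1)

    # Sign-extend from bit 12.
    if imm12:
        imm -= 1 << 13
    return imm


# The fifth line of the .txt output, i.e. "beq x0 x0 loop" as assembled.
word = int("01111100000000000000101011100011", 2)
print(decode_b_imm(word))  # 4052
```

This agrees with the hand calculation: the word produced by the assembler decodes to a forward branch of 4052 bytes.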

Furthermore, RARS provides a different encoding for this instruction:

v--- different leading bit
1 111111 00000 00000 000 1100 1 1100011 <- RARS
0 111110 00000 00000 000 1010 1 1100011 <- this assembler

RARS's immediate is 1111 1 111111 1100 0 = -8. The B-type immediate is already a byte offset (bit 0 is implicitly zero), so that's a jump 8 bytes back, i.e. 8 / 4 = 2 instructions back, which leads to the loop label, which makes sense.
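The expected offsets can be cross-checked by laying the program out in memory. This is only a sketch: it assumes the program starts at address 0 and that every instruction is 4 bytes (no compressed instructions). The `program` list and the `branch_offset` helper are illustrative names, not anything from the assembler.

```python
# Instructions in program order, with the label (if any) attached to each.
program = [
    (None, "addi s0 x0 10"),
    (None, "addi s1 x0 10"),
    ("loop", "addi s1 x0 -1"),
    (None, "beq s1 x0 out"),
    (None, "beq x0 x0 loop"),
    ("out", "addi s1 s0 -32"),
]

INSTRUCTION_BYTES = 4  # assumption: plain RV32I, no compressed instructions

# Byte address of every label.
labels = {
    label: index * INSTRUCTION_BYTES
    for index, (label, _) in enumerate(program)
    if label is not None
}


def branch_offset(pc: int, target_label: str) -> int:
    """Byte offset from the branch instruction at pc to the target label."""
    return labels[target_label] - pc


offsets = {}
for index, (_, text) in enumerate(program):
    if text.startswith("beq"):
        pc = index * INSTRUCTION_BYTES
        target = text.split()[-1]
        offsets[text] = branch_offset(pc, target)
        print(f"{text!r} at pc={pc}: offset {offsets[text]} bytes")
```

Under these assumptions, beq s1 x0 out needs an offset of +8 bytes and beq x0 x0 loop needs -8 bytes, which matches the RARS encoding.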

@ForceBru
Author

If I feed the correct offset, -8, to SB_type here:

mod_imm = (int(imm) - ((int(imm) >> 12) << 12)) >> 6 # imm[12]
mod_imm += (int(imm) - ((int(imm) >> 11) >> 11)) >> 5 # imm[12|10:5]

...mod_imm comes out as 0b111110, although RARS's encoding implies that the imm[12|10:5] field should be all ones (1111111):

>>> imm = -8
>>> mod_imm = (int(imm) - ((int(imm) >> 12) << 12)) >> 6 # imm[12]
>>> mod_imm
63
>>> bin(mod_imm)
'0b111111'
>>> mod_imm += (int(imm) - ((int(imm) >> 11) >> 11)) >> 5 # imm[12|10:5]
>>> bin(mod_imm)
'0b111110'

But then this call to self.__binary:

self.__binary(mod_imm,7),

...returns '0111110', with a leading zero bit, which seems to be the source of the leading zero bit in the resulting encoding.
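For comparison, here is a minimal sketch of how the B-type fields could be produced by masking the offset to 13 bits of two's complement before slicing it. This is not the library's SB_type code and not a proposed patch. `encode_b_type` and its parameters are hypothetical names, and the register numbers are passed as plain integers.

```python
def encode_b_type(opcode: int, funct3: int, rs1: int, rs2: int, offset: int) -> str:
    """Return the 32-bit B-type encoding as a binary string (hypothetical helper)."""
    if offset % 2 != 0 or not -4096 <= offset <= 4094:
        raise ValueError(f"branch offset out of range or misaligned: {offset}")

    # Two's-complement view of the offset in 13 bits, so negative
    # offsets keep their high bits set.
    imm = offset & 0x1FFF

    imm12 = (imm >> 12) & 0x1
    imm11 = (imm >> 11) & 0x1
    imm10_5 = (imm >> 5) & 0x3F
    imm4_1 = (imm >> 1) & 0xF

    return (
        f"{imm12:01b}{imm10_5:06b}"   # imm[12|10:5], 7 bits
        f"{rs2:05b}{rs1:05b}{funct3:03b}"
        f"{imm4_1:04b}{imm11:01b}"    # imm[4:1|11], 5 bits
        f"{opcode:07b}"
    )


BEQ_OPCODE = 0b1100011
BEQ_FUNCT3 = 0b000

# beq x0 x0 loop, with the offset of -8 bytes
encoded = encode_b_type(BEQ_OPCODE, BEQ_FUNCT3, rs1=0, rs2=0, offset=-8)
print(encoded)  # 11111110000000000000110011100011
```

With offset = -8, the top seven bits come out as 1111111 and the whole word equals the RARS encoding shown earlier. With rs1=9 (s1) and offset=8, the same function reproduces the fourth line of the .txt output.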

@gephaistos gephaistos mentioned this issue Mar 8, 2023