Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding new type 'MV256W' and enabling AArch64 STP SIMD semantics. #636

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

fvrmatteo
Copy link
Contributor

Hi!

I wanted to take a stab at adding support for the AArch64 STP_Q_LDSTPAIR_OFF semantics.

I prepared a preliminary patch, although there are some things that are not clear to me at the moment:

  1. The old commented out code for the instruction was using the MV128W type, that I interpreted as "memory vector 128 bits write" type, although the actual semantics is writing a pair of 128 bits registers, hence why I added a MV256W type instead. Is this fine?
  2. The generated code will result in a sequence of four 64 bits load instructions to read the pair of SIMD registers and then a sequence of four 64 bits store instructions to write the values to memory. There may be reasons to keep the 64 bits load/store instructions or to have instead two 128 bits load instructions and two 128 bits store instructions. What is the general rule in these cases?

@CLAassistant
Copy link

CLAassistant commented Oct 23, 2022

CLA assistant check
All committers have signed the CLA.

@pgoodman
Copy link
Collaborator

I think the idea of the 64 bit loads/stores was that I didn't have a reliable uint18_t type years ago, and I didn't want to risk it turning into an intrinsic function that took the address of a 128-bit value, and thus took the address of a state structure register, indirectly leading to state escape that could hinder optimization.

You could prototype an actual int18_t (or beyond) memory access intrinsic and see if you get better code.

Similarly, a very early decision in remill was that I wanted to make the code as easy to run in klee or custom tools, so I didn't want to have to support the myriad vector types, and hence used arrays for everything. This might be worth revisiting, e.g. using __attribute__((vector_size(N))).

@fvrmatteo
Copy link
Contributor Author

I think the 64 bits load/store instructions are actually fine for now because they match the logic used by the rest of the semantics (e.g. LDP SIMD semantics) and are easier to work with KLEE/custom tools.

Although the AArch64 manual is specifically showing that this instruction would be executing two 128 bits load/store instructions, hence why I had a doubt. This would need to be a long process thats makes sure all the instructions (that work on SIMD registers and memory) use the properly sized memory accesses.

@pgoodman
Copy link
Collaborator

@fvrmatteo Can you investigate re-enabling this test (and possibly see if it's sane)?

https://github.com/lifting-bits/remill/blob/master/tests/AArch64/DATAXFER/STP_n_LDSTPAIR_OFF.S#L67

@fvrmatteo
Copy link
Contributor Author

The test itself looks fine and I re-enabled it, although I cannot really run it on M1.

@pgoodman
Copy link
Collaborator

@fvrmatteo can you investigate the hacks that @tetsuo-cpp tried, and see if it's possible to get M1 support by removing the call frame information dwarf things, i.e. .cfi_startproc et al., from the test macros?

…g 'brk #0x0' as AArch64 breakpoint. Using @page and @PAGEOFF.
@fvrmatteo
Copy link
Contributor Author

fvrmatteo commented Nov 6, 2022

@pgoodman I did some changes to the AArch64 tests, but I'm a bit wandering in the dark because I don't know how GTest works. The changes I did in the last commit enable the compilation of the AArch64 tests on MacOS ARM64, although when I run them manually I get the following error:

[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from GoogleTestVerification
[ RUN      ] GoogleTestVerification.UninstantiatedParameterizedTestSuite<InstrTest>
/Users/matteo/Documents/remill/tests/AArch64/Run.cpp:762: Failure
Parameterized test suite InstrTest is defined via TEST_P, but never instantiated. None of the test cases will run. Either no INSTANTIATE_TEST_SUITE_P is provided or the only ones provided expand to nothing.

Ideally, TEST_P definitions should only ever be included as part of binaries that intend to use them. (As opposed to, for example, being placed in a library that may be linked in to get other utilities.)

To suppress this error for this test suite, insert the following line (in a non-header) in the namespace it is defined in:

GTEST_ALLOW_UNINSTANTIATED_PARAMETERIZED_TEST(InstrTest);
[  FAILED  ] GoogleTestVerification.UninstantiatedParameterizedTestSuite<InstrTest> (0 ms)
[----------] 1 test from GoogleTestVerification (0 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (0 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] GoogleTestVerification.UninstantiatedParameterizedTestSuite<InstrTest>

It is unexpected because in the Run.cpp file I can see the call to INSTANTIATE_TEST_SUITE_P for InstrTest, so I'm not sure why it isn't detected.

I also turned -O1 into -O0 temporarily because compiling Remill with LLVM-14 is going to fail the compilation due to the LoopLoadElimination pass (and possibly others) not properly handling the opaque pointers; these issues are solved in LLVM-15, so we'll be able to use -O1 again.

@pgoodman
Copy link
Collaborator

pgoodman commented Nov 7, 2022

@fvrmatteo Does Clang's assembler support the adrl pseudo-op? Or maybe the @pageoff macro can always include the :lo12: in it?

Otherwise, I am not sure why that comes up with the tests. I will try to look into it.

@fvrmatteo
Copy link
Contributor Author

@pgoodman As far as I see and from my tests, the adrl pseudo operation is not supported by Clang and some migration guides are suggesting to define a macro for it when moving from GAS syntax for example. I included the :lo12: into the SYMBOL_PAGEOFF macro as requested.

Please let me know if you figure out the issue with GTest, in the meantime I'll try figuring out the same.

@tetsuo-cpp
Copy link
Contributor

Please let me know if you figure out the issue with GTest, in the meantime I'll try figuring out the same.

Hey @fvrmatteo, I'm taking a look. I'll let you know what I find.

@tetsuo-cpp
Copy link
Contributor

Ok, I have an idea of what's going on.

The reason this is happening is not because there is no INSTANTIATE_TEST_SUITE_P macro (there is), it's because gTests is empty. This data ultimately comes from __aarch64_test_table defined in Tests.S.

Curiously, compilation succeeds without an issue however, if you dump the contents of the Tests.S.o object file, it shows that the __arch64_test_table section is empty and that __aarch64_test_table_begin is pointing at the same spot as __aarch64_test_table_end.

I believe this is because the pre-processor macros that we use to populate the test table aren't portable across assemblers: more specifically, the GAS syntax uses ; as a statement delimiter whereas the Clang ARM assembler uses it as the beginning of a comment. So effectively, the assembler on my M1 was treating everything after the first statement of this block (just a .section directive) as part of a comment. That's why there's no error raised by the assembler but the test table is unpopulated. If I dump the pre-processed assembly file to disk and then manually replace the semicolons with newlines, I start to see the test table getting populated.

At the moment, I'm trying to figure out whether there is a character that isn't a newline that can be used as a statement delimiter in the Clang ARM assembler syntax or whether there is some .syntax directive I can use to get around this.

@pgoodman
Copy link
Collaborator

@tetsuo-cpp In the past, I've copied what DynamoRIO does and used @N@ as a delimiter, then ran a post-processing script to fix them up 🤮 https://github.com/Granary/granary/blob/master/scripts/post_process_asm.py#L12

@tetsuo-cpp
Copy link
Contributor

@tetsuo-cpp In the past, I've copied what DynamoRIO does and used @N@ as a delimiter, then ran a post-processing script to fix them up 🤮 https://github.com/Granary/granary/blob/master/scripts/post_process_asm.py#L12

Haha it's ugly... but it works. Done in ae8f8c9.

Now I'm seeing a few test failures but at least the tests are running. Weird thing is that the number of tests failing is different each time so there's some flakiness going on. I don't have time to look at tonight though.

@tetsuo-cpp
Copy link
Contributor

tetsuo-cpp commented Nov 20, 2022

Had a quick look at this today but still not 100% sure. From what I can see, there are a two different types of mismatches that may or may not be related:

The first is a mismatch on system registers. Flags like IXC, OFC and UFC are seeing mismatches. This happens across multiple test cases.

[ RUN      ] GeneralInstrTest/InstrTest.SemanticsMatchNative/fadd_s_floatdp2_2
E20221120 18:27:52.705013 8271463 Run.cpp:686] States did not match for fadd_s_floatdp2_2 with X0=3fffffff, X1=40000000 and N=0, Z=0, C=0, V=0
/Users/tetsuo/Code/remill/tests/AArch64/Run.cpp:687: Failure
Value of: !"Lifted and native states did not match."
  Actual: false
Expected: true
/Users/tetsuo/Code/remill/tests/AArch64/Run.cpp:723: Failure
Expected equality of these values:
  lifted_state->sr.ixc
    Which is: '\0'
  native_state->sr.ixc
    Which is: '\x1' (1)
/Users/tetsuo/Code/remill/tests/AArch64/Run.cpp:725: Failure
Expected equality of these values:
  lifted_state->sr.ufc
    Which is: '\x1' (1)
  native_state->sr.ufc
    Which is: '\0'
E20221120 18:27:52.705075 8271463 Run.cpp:738] Bytes at offset 1145 are different
E20221120 18:27:52.705098 8271463 Run.cpp:738] Bytes at offset 1149 are different
E20221120 18:27:52.705149 8271463 Run.cpp:686] States did not match for fadd_s_floatdp2_2 with X0=3fffffff, X1=40000000 and N=0, Z=0, C=0, V=1

The other is just an inequality when comparing states byte for byte. So it's not clear what part of the state is different unless I sit down and count the size of each struct member.

[ RUN      ] GeneralInstrTest/InstrTest.SemanticsMatchNative/add_w9_w0_fff_1
E20221120 18:27:51.766563 8271463 Run.cpp:686] States did not match for add_w9_w0_fff_1 with X0=0 and N=0, Z=0, C=0, V=0
/Users/tetsuo/Code/remill/tests/AArch64/Run.cpp:687: Failure
Value of: !"Lifted and native states did not match."
  Actual: false
Expected: true
E20221120 18:27:51.766611 8271463 Run.cpp:738] Bytes at offset 1112 are different
[  FAILED  ] GeneralInstrTest/InstrTest.SemanticsMatchNative/add_w9_w0_fff_1, where GetParam() = 0x104d54080 (3 ms)

The other part that's stumping me is the fact that the number of test failures isn't consistent. As far as I can tell, these tests should be deterministic but I haven't looked at every LOC so I don't know for sure. @pgoodman, do you know of any reason why the tests could be failing non-deterministically?

@tetsuo-cpp
Copy link
Contributor

@pgoodman I'll take a look at this again when I get the chance. Did you have any ideas on why we might have non-determinism in these tests?

@pgoodman
Copy link
Collaborator

pgoodman commented Feb 7, 2023

These look like floating point stuff? FPU stuff is typically a bit sketchy and might behave differently in different environments / rounding mode / etc. These one-byte differences look like fpu conditions, and so it's probably that the way we're computing the conditions is not quite right, or it appears right, but the actual non-determinism is due to the bitcode/machine code not being exactly like we need it to be. The way to track flags of fpu operations ends up being pretty brittle: it's something like:

1) reset flags
2) operation
3) read flags

And we rely on these being in-order, with no other stuff sneaking in. But maybe in the code there is:

1) reset flags (inst 1)
2) operation (inst 1)
3) read flags (inst 1)
4) reset flags (inst 2)
5) operation (inst 2)
6) read flags (inst 2)

And then the compiler (un)helpfully reorders as:

1) reset flags (inst 1)
2) operation (inst 1)
5) operation (inst 2)
3) read flags (inst 1)
4) reset flags (inst 2)
6) read flags (inst 2)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants