Adding new type 'MV256W' and enabling AArch64 STP SIMD semantics. #636

fvrmatteo · 2022-10-23T10:41:20Z

Hi!

I wanted to take a stab at adding support for the AArch64 STP_Q_LDSTPAIR_OFF semantics.

I prepared a preliminary patch, although there are some things that are not clear to me at the moment:

The old commented out code for the instruction was using the MV128W type, that I interpreted as "memory vector 128 bits write" type, although the actual semantics is writing a pair of 128 bits registers, hence why I added a MV256W type instead. Is this fine?
The generated code will result in a sequence of four 64 bits load instructions to read the pair of SIMD registers and then a sequence of four 64 bits store instructions to write the values to memory. There may be reasons to keep the 64 bits load/store instructions or to have instead two 128 bits load instructions and two 128 bits store instructions. What is the general rule in these cases?

… involving SIMD registers.

CLAassistant · 2022-10-23T10:41:26Z

All committers have signed the CLA.

pgoodman · 2022-10-23T17:49:32Z

I think the idea of the 64 bit loads/stores was that I didn't have a reliable uint18_t type years ago, and I didn't want to risk it turning into an intrinsic function that took the address of a 128-bit value, and thus took the address of a state structure register, indirectly leading to state escape that could hinder optimization.

You could prototype an actual int18_t (or beyond) memory access intrinsic and see if you get better code.

Similarly, a very early decision in remill was that I wanted to make the code as easy to run in klee or custom tools, so I didn't want to have to support the myriad vector types, and hence used arrays for everything. This might be worth revisiting, e.g. using __attribute__((vector_size(N))).

fvrmatteo · 2022-10-24T10:41:43Z

I think the 64 bits load/store instructions are actually fine for now because they match the logic used by the rest of the semantics (e.g. LDP SIMD semantics) and are easier to work with KLEE/custom tools.

Although the AArch64 manual is specifically showing that this instruction would be executing two 128 bits load/store instructions, hence why I had a doubt. This would need to be a long process thats makes sure all the instructions (that work on SIMD registers and memory) use the properly sized memory accesses.

pgoodman · 2022-10-24T15:22:34Z

@fvrmatteo Can you investigate re-enabling this test (and possibly see if it's sane)?

https://github.com/lifting-bits/remill/blob/master/tests/AArch64/DATAXFER/STP_n_LDSTPAIR_OFF.S#L67

fvrmatteo · 2022-10-25T18:08:29Z

The test itself looks fine and I re-enabled it, although I cannot really run it on M1.

pgoodman · 2022-10-26T17:17:03Z

@fvrmatteo can you investigate the hacks that @tetsuo-cpp tried, and see if it's possible to get M1 support by removing the call frame information dwarf things, i.e. .cfi_startproc et al., from the test macros?

@page

…g 'brk #0x0' as AArch64 breakpoint. Using @page and @PAGEOFF.

fvrmatteo · 2022-11-06T15:39:13Z

@pgoodman I did some changes to the AArch64 tests, but I'm a bit wandering in the dark because I don't know how GTest works. The changes I did in the last commit enable the compilation of the AArch64 tests on MacOS ARM64, although when I run them manually I get the following error:

[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from GoogleTestVerification
[ RUN      ] GoogleTestVerification.UninstantiatedParameterizedTestSuite<InstrTest>
/Users/matteo/Documents/remill/tests/AArch64/Run.cpp:762: Failure
Parameterized test suite InstrTest is defined via TEST_P, but never instantiated. None of the test cases will run. Either no INSTANTIATE_TEST_SUITE_P is provided or the only ones provided expand to nothing.

Ideally, TEST_P definitions should only ever be included as part of binaries that intend to use them. (As opposed to, for example, being placed in a library that may be linked in to get other utilities.)

To suppress this error for this test suite, insert the following line (in a non-header) in the namespace it is defined in:

GTEST_ALLOW_UNINSTANTIATED_PARAMETERIZED_TEST(InstrTest);
[  FAILED  ] GoogleTestVerification.UninstantiatedParameterizedTestSuite<InstrTest> (0 ms)
[----------] 1 test from GoogleTestVerification (0 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (0 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] GoogleTestVerification.UninstantiatedParameterizedTestSuite<InstrTest>

It is unexpected because in the Run.cpp file I can see the call to INSTANTIATE_TEST_SUITE_P for InstrTest, so I'm not sure why it isn't detected.

I also turned -O1 into -O0 temporarily because compiling Remill with LLVM-14 is going to fail the compilation due to the LoopLoadElimination pass (and possibly others) not properly handling the opaque pointers; these issues are solved in LLVM-15, so we'll be able to use -O1 again.

pgoodman · 2022-11-07T01:30:06Z

@fvrmatteo Does Clang's assembler support the adrl pseudo-op? Or maybe the @pageoff macro can always include the :lo12: in it?

Otherwise, I am not sure why that comes up with the tests. I will try to look into it.

fvrmatteo · 2022-11-07T12:29:58Z

@pgoodman As far as I see and from my tests, the adrl pseudo operation is not supported by Clang and some migration guides are suggesting to define a macro for it when moving from GAS syntax for example. I included the :lo12: into the SYMBOL_PAGEOFF macro as requested.

Please let me know if you figure out the issue with GTest, in the meantime I'll try figuring out the same.

tetsuo-cpp · 2022-11-15T12:17:00Z

Please let me know if you figure out the issue with GTest, in the meantime I'll try figuring out the same.

Hey @fvrmatteo, I'm taking a look. I'll let you know what I find.

tetsuo-cpp · 2022-11-16T08:21:08Z

Ok, I have an idea of what's going on.

The reason this is happening is not because there is no INSTANTIATE_TEST_SUITE_P macro (there is), it's because gTests is empty. This data ultimately comes from __aarch64_test_table defined in Tests.S.

Curiously, compilation succeeds without an issue however, if you dump the contents of the Tests.S.o object file, it shows that the __arch64_test_table section is empty and that __aarch64_test_table_begin is pointing at the same spot as __aarch64_test_table_end.

I believe this is because the pre-processor macros that we use to populate the test table aren't portable across assemblers: more specifically, the GAS syntax uses ; as a statement delimiter whereas the Clang ARM assembler uses it as the beginning of a comment. So effectively, the assembler on my M1 was treating everything after the first statement of this block (just a .section directive) as part of a comment. That's why there's no error raised by the assembler but the test table is unpopulated. If I dump the pre-processed assembly file to disk and then manually replace the semicolons with newlines, I start to see the test table getting populated.

At the moment, I'm trying to figure out whether there is a character that isn't a newline that can be used as a statement delimiter in the Clang ARM assembler syntax or whether there is some .syntax directive I can use to get around this.

pgoodman · 2022-11-16T17:09:04Z

@tetsuo-cpp In the past, I've copied what DynamoRIO does and used @N@ as a delimiter, then ran a post-processing script to fix them up 🤮 https://github.com/Granary/granary/blob/master/scripts/post_process_asm.py#L12

pgoodman · 2022-11-16T17:09:57Z

E.g. https://github.com/Granary/granary/blob/f9ac2b6af0864f420b93917dc21a38041905b3aa/clients/watchpoints/clients/bounds_checker/bounds_checkers.asm#L58

tetsuo-cpp · 2022-11-17T09:55:38Z

@tetsuo-cpp In the past, I've copied what DynamoRIO does and used @N@ as a delimiter, then ran a post-processing script to fix them up 🤮 https://github.com/Granary/granary/blob/master/scripts/post_process_asm.py#L12

Haha it's ugly... but it works. Done in ae8f8c9.

Now I'm seeing a few test failures but at least the tests are running. Weird thing is that the number of tests failing is different each time so there's some flakiness going on. I don't have time to look at tonight though.

tetsuo-cpp · 2022-11-20T08:14:31Z

Had a quick look at this today but still not 100% sure. From what I can see, there are a two different types of mismatches that may or may not be related:

The first is a mismatch on system registers. Flags like IXC, OFC and UFC are seeing mismatches. This happens across multiple test cases.

[ RUN      ] GeneralInstrTest/InstrTest.SemanticsMatchNative/fadd_s_floatdp2_2
E20221120 18:27:52.705013 8271463 Run.cpp:686] States did not match for fadd_s_floatdp2_2 with X0=3fffffff, X1=40000000 and N=0, Z=0, C=0, V=0
/Users/tetsuo/Code/remill/tests/AArch64/Run.cpp:687: Failure
Value of: !"Lifted and native states did not match."
  Actual: false
Expected: true
/Users/tetsuo/Code/remill/tests/AArch64/Run.cpp:723: Failure
Expected equality of these values:
  lifted_state->sr.ixc
    Which is: '\0'
  native_state->sr.ixc
    Which is: '\x1' (1)
/Users/tetsuo/Code/remill/tests/AArch64/Run.cpp:725: Failure
Expected equality of these values:
  lifted_state->sr.ufc
    Which is: '\x1' (1)
  native_state->sr.ufc
    Which is: '\0'
E20221120 18:27:52.705075 8271463 Run.cpp:738] Bytes at offset 1145 are different
E20221120 18:27:52.705098 8271463 Run.cpp:738] Bytes at offset 1149 are different
E20221120 18:27:52.705149 8271463 Run.cpp:686] States did not match for fadd_s_floatdp2_2 with X0=3fffffff, X1=40000000 and N=0, Z=0, C=0, V=1

The other is just an inequality when comparing states byte for byte. So it's not clear what part of the state is different unless I sit down and count the size of each struct member.

[ RUN      ] GeneralInstrTest/InstrTest.SemanticsMatchNative/add_w9_w0_fff_1
E20221120 18:27:51.766563 8271463 Run.cpp:686] States did not match for add_w9_w0_fff_1 with X0=0 and N=0, Z=0, C=0, V=0
/Users/tetsuo/Code/remill/tests/AArch64/Run.cpp:687: Failure
Value of: !"Lifted and native states did not match."
  Actual: false
Expected: true
E20221120 18:27:51.766611 8271463 Run.cpp:738] Bytes at offset 1112 are different
[  FAILED  ] GeneralInstrTest/InstrTest.SemanticsMatchNative/add_w9_w0_fff_1, where GetParam() = 0x104d54080 (3 ms)

The other part that's stumping me is the fact that the number of test failures isn't consistent. As far as I can tell, these tests should be deterministic but I haven't looked at every LOC so I don't know for sure. @pgoodman, do you know of any reason why the tests could be failing non-deterministically?

tetsuo-cpp · 2023-02-07T10:41:39Z

@pgoodman I'll take a look at this again when I get the chance. Did you have any ideas on why we might have non-determinism in these tests?

pgoodman · 2023-02-07T14:43:52Z

These look like floating point stuff? FPU stuff is typically a bit sketchy and might behave differently in different environments / rounding mode / etc. These one-byte differences look like fpu conditions, and so it's probably that the way we're computing the conditions is not quite right, or it appears right, but the actual non-determinism is due to the bitcode/machine code not being exactly like we need it to be. The way to track flags of fpu operations ends up being pretty brittle: it's something like:

1) reset flags
2) operation
3) read flags

And we rely on these being in-order, with no other stuff sneaking in. But maybe in the code there is:

1) reset flags (inst 1)
2) operation (inst 1)
3) read flags (inst 1)
4) reset flags (inst 2)
5) operation (inst 2)
6) read flags (inst 2)

And then the compiler (un)helpfully reorders as:

1) reset flags (inst 1)
2) operation (inst 1)
5) operation (inst 2)
3) read flags (inst 1)
4) reset flags (inst 2)
6) read flags (inst 2)

Adding new type 'MV256W'. Updating and enabling AArch64 STP semantics…

6a05abb

… involving SIMD registers.

Enabling SIMD STP test.

87c75b3

Enabling AArch64 tests when an 'arm64 macos' system is detected. Usin…

ada33e2

…g 'brk #0x0' as AArch64 breakpoint. Using @page and @PAGEOFF.

Merge branch 'lifting-bits:master' into aarch64-simd-stp

efb36d6

Including :lo12: into the SYMBOL_PAGEOFF macro.

4562eb4

tetsuo-cpp added 2 commits November 17, 2022 19:19

Fix missing include

c029caa

Add post-processing script to delimit statements with newlines

ae8f8c9

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adding new type 'MV256W' and enabling AArch64 STP SIMD semantics. #636

Adding new type 'MV256W' and enabling AArch64 STP SIMD semantics. #636

fvrmatteo commented Oct 23, 2022

CLAassistant commented Oct 23, 2022 •

edited

Loading

pgoodman commented Oct 23, 2022

fvrmatteo commented Oct 24, 2022

pgoodman commented Oct 24, 2022

fvrmatteo commented Oct 25, 2022

pgoodman commented Oct 26, 2022

fvrmatteo commented Nov 6, 2022 •

edited

Loading

pgoodman commented Nov 7, 2022

fvrmatteo commented Nov 7, 2022

tetsuo-cpp commented Nov 15, 2022

tetsuo-cpp commented Nov 16, 2022

pgoodman commented Nov 16, 2022

pgoodman commented Nov 16, 2022

tetsuo-cpp commented Nov 17, 2022

tetsuo-cpp commented Nov 20, 2022 •

edited

Loading

tetsuo-cpp commented Feb 7, 2023

pgoodman commented Feb 7, 2023 •

edited

Loading

Adding new type 'MV256W' and enabling AArch64 STP SIMD semantics. #636

Are you sure you want to change the base?

Adding new type 'MV256W' and enabling AArch64 STP SIMD semantics. #636

Conversation

fvrmatteo commented Oct 23, 2022

CLAassistant commented Oct 23, 2022 • edited Loading

pgoodman commented Oct 23, 2022

fvrmatteo commented Oct 24, 2022

pgoodman commented Oct 24, 2022

fvrmatteo commented Oct 25, 2022

pgoodman commented Oct 26, 2022

fvrmatteo commented Nov 6, 2022 • edited Loading

pgoodman commented Nov 7, 2022

fvrmatteo commented Nov 7, 2022

tetsuo-cpp commented Nov 15, 2022

tetsuo-cpp commented Nov 16, 2022

pgoodman commented Nov 16, 2022

pgoodman commented Nov 16, 2022

tetsuo-cpp commented Nov 17, 2022

tetsuo-cpp commented Nov 20, 2022 • edited Loading

tetsuo-cpp commented Feb 7, 2023

pgoodman commented Feb 7, 2023 • edited Loading

CLAassistant commented Oct 23, 2022 •

edited

Loading

fvrmatteo commented Nov 6, 2022 •

edited

Loading

tetsuo-cpp commented Nov 20, 2022 •

edited

Loading

pgoodman commented Feb 7, 2023 •

edited

Loading