Skip to content

Certain operations lack consideration for data types #166

Closed
@DrXiao

Description

@DrXiao

[ Edit: 2025/01/21 19:15 ]
Recently, I used the following tricky code to test shecc:

/* test.c */
int main() {
    char a = 0x11, b = 0x22, c = 0x33, d = 0x44;
    char *aa = &a, *bb = &b, *cc = &c, *dd = &d;
    int *aaa = &a;
    char arr[4];
    arr[0] = 0x11;
    arr[1] = 0x22;
    arr[2] = 0x33;
    arr[3] = 0x44;

    /* 1 */
    printf("== 1 ==\n");
    printf("%d %d %d %d\n", a, b, c, d);
    printf("%x %x %x %x\n", aa, bb, cc, dd);
    printf("%d %d %d\n", bb - aa, cc - bb, dd - cc);
    printf("%x %x\n\n", *aaa, aaa[0]);

    /* 2 */
    aaa = arr;

    printf("== 2 ==\n");
    for (int i = 0; i < 4; i++)
	    printf("arr[%d] = %x\n", i, arr[i]);

    printf("aaa = %x, arr = %x\n", aaa, arr);
    printf("*aaa = %x, aaa[0] = %x\n\n", *aaa aaa[0]);
    
    /* 3 */
    printf("== 3 ==\n");
    a += 6000;
    b += 400;
    c -= 400;
    d -= 6000;
    aaa = &a;
    printf("a = %d, b = %d, c = %d, d = %d\n", a, b, c, d);

    return 0;
}

When the code is compiled using GCC with no optimization, all outputs match the expected results:

$ gcc -o test test.c
$ ./test
== 1 ==
17 34 51 68
1759c720 1759c721 1759c722 1759c723
1 1 1
44332211 44332211

== 2 ==
arr[0] = 11
arr[1] = 22
arr[2] = 33
arr[3] = 44
aaa = 1759c754, arr = 1759c754
*aaa = 44332211, aaa[0] = 44332211

== 3 ==
a = -127, b = -78, c = -93, d = -44

However, if using shecc to compile the code, some outputs seem to be wrong:

$ qemu-arm out/shecc-stage2.elf -o test test.c
$ qemu-arm test
== 1 ==
17 34 51 68
407ffd8c 407ffd90 407ffd94 407ffd98
4 4 4
11 11

== 2 ==
arr[0] = 11
arr[1] = 22
arr[2] = 33
arr[3] = 44
aaa = 407ffda8, arr = 407ffda8
*aaa = 44332211, aaa[0] = 44332211

== 3 ==
a = 6017, b = 434, c = -349, d = -5932

By the code and the comment in test.c, it is obvious that the code is divided to 3 snippets. Next, I use these snippets to explain their behaviors and the bug what I found.


snippet 1

After introducing the pull request (#171), every stack allocation increment had been ensured to be properly aligned. Due to stack alignment requirements, each single character variable occupies 4 bytes, despite a character's actual size being 1 byte.

Thus, the following results are correct.

  1. the pointer arithmetic results like bb - aa, cc - bb and dd - cc are 4.
  2. Because aaa points to the address of a, the result of the dereference through aaa is equaivalent to the value of a.

snippet 2

- printf("*aaa = %x, aaa[0] = %x\n\n", *arr, aaa[0]);    /* 2024/12/15 22:56 */
+ printf("*aaa = %x, aaa[0] = %x\n\n", *aaa, aaa[0]);    /* 2025/01/21 19:15 */

(Previously, I made a typo so that I misunderstood pointer dereferencing also had an error. After fixing it, the result is correct.)


snippet 3 - (bug)

Because the variables (a ~ d) are signed characters, their results should be overflowed after executing additions and subtractions.

The correct results should be a = -127, b = -78, c = -93 and d = -44, but the results of the executable compiled by shecc produces error outputs.

Conclusion

According to the result of the snippet 3, the current backends may lack consideration of the data type when generating load/store instructions.

For example, the Arm backend may generates a ldr instruction for loading a character (1 byte), regardless of the data type. But, For character variables, it must generate ldrsb instructions instead.

Since these problems also occur in the RISC-V backend, both the Arm and RISC-V backends require improvements to generate data-type-specific instructions, such as ldrsb for signed characters.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions