Certain operations lack consideration for data types

[ Edit: 2025/01/21 19:15 ]
Recently, I used the following tricky code to test shecc:
```c
/* test.c */
int main() {
    char a = 0x11, b = 0x22, c = 0x33, d = 0x44;
    char *aa = &a, *bb = &b, *cc = &c, *dd = &d;
    int *aaa = &a;
    char arr[4];
    arr[0] = 0x11;
    arr[1] = 0x22;
    arr[2] = 0x33;
    arr[3] = 0x44;

    /* 1 */
    printf("== 1 ==\n");
    printf("%d %d %d %d\n", a, b, c, d);
    printf("%x %x %x %x\n", aa, bb, cc, dd);
    printf("%d %d %d\n", bb - aa, cc - bb, dd - cc);
    printf("%x %x\n\n", *aaa, aaa[0]);

    /* 2 */
    aaa = arr;

    printf("== 2 ==\n");
    for (int i = 0; i < 4; i++)
        printf("arr[%d] = %x\n", i, arr[i]);

    printf("aaa = %x, arr = %x\n", aaa, arr);
    printf("*aaa = %x, aaa[0] = %x\n\n", *aaa, aaa[0]);

    /* 3 */
    printf("== 3 ==\n");
    a += 6000;
    b += 400;
    c -= 400;
    d -= 6000;
    aaa = &a;
    printf("a = %d, b = %d, c = %d, d = %d\n", a, b, c, d);

    return 0;
}
```
When the code is compiled using GCC with no optimization, all outputs match the expected results:
```
$ gcc -o test test.c
$ ./test
== 1 ==
17 34 51 68
1759c720 1759c721 1759c722 1759c723
1 1 1
44332211 44332211

== 2 ==
arr[0] = 11
arr[1] = 22
arr[2] = 33
arr[3] = 44
aaa = 1759c754, arr = 1759c754
*aaa = 44332211, aaa[0] = 44332211

== 3 ==
a = -127, b = -78, c = -93, d = -44
```
However, when the same code is compiled with shecc, some of the outputs look wrong:
```
$ qemu-arm out/shecc-stage2.elf -o test test.c
$ qemu-arm test
== 1 ==
17 34 51 68
407ffd8c 407ffd90 407ffd94 407ffd98
4 4 4
11 11

== 2 ==
arr[0] = 11
arr[1] = 22
arr[2] = 33
arr[3] = 44
aaa = 407ffda8, arr = 407ffda8
*aaa = 44332211, aaa[0] = 44332211

== 3 ==
a = 6017, b = 434, c = -349, d = -5932
```
As the comments in `test.c` indicate, the code is divided into 3 snippets. Below, I go through each snippet to explain its behavior and the bug I found.

---
### snippet 1
Since pull request #171 was merged, **every stack allocation is adjusted so that it stays properly aligned**. Because of this stack alignment requirement, **each single character variable occupies 4 bytes**, even though a character's actual size is 1 byte.

Thus, the following results are expected under shecc's stack layout:
1. The pointer differences `bb - aa`, `cc - bb` and `dd - cc` are all `4`.
2. Because `aaa` points to the address of `a`, dereferencing `aaa` yields a result equivalent to the value of `a` (printed as `11`).
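
To make the difference between the two layouts concrete, here is a minimal host-side sketch. It is not shecc code: `load_word_le` and `word_at_first_char` are hypothetical helpers, the target is assumed to be little-endian, and the unused bytes of each 4-byte slot are assumed to be zero (which is consistent with the `11` printed above).

```c
#include <stdint.h>
#include <string.h>

/* Read 4 bytes starting at mem[off] as a little-endian 32-bit word,
 * mimicking what `*aaa` does on a little-endian target. */
static uint32_t load_word_le(const uint8_t *mem, size_t off)
{
    return (uint32_t) mem[off] | ((uint32_t) mem[off + 1] << 8) |
           ((uint32_t) mem[off + 2] << 16) | ((uint32_t) mem[off + 3] << 24);
}

/* Place the four test values `slot` bytes apart in a zeroed buffer and
 * return the word read at the address of the first value.
 * slot = 1 models packed chars (the GCC run above);
 * slot = 4 models one aligned 4-byte slot per char (the shecc run). */
static uint32_t word_at_first_char(size_t slot)
{
    const uint8_t vals[4] = {0x11, 0x22, 0x33, 0x44};
    uint8_t mem[16];

    memset(mem, 0, sizeof(mem));
    for (size_t i = 0; i < 4; i++)
        mem[i * slot] = vals[i];
    return load_word_le(mem, 0);
}
```

With `slot = 1` the word read covers all four characters and gives `0x44332211`; with `slot = 4` it covers only `a` and its padding and gives `0x11`.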

---
### snippet 2
```diff
- printf("*aaa = %x, aaa[0] = %x\n\n", *arr, aaa[0]);    /* 2024/12/15 22:56 */
+ printf("*aaa = %x, aaa[0] = %x\n\n", *aaa, aaa[0]);    /* 2025/01/21 19:15 */
```
(Previously, I made a typo in this line, which led me to believe that pointer dereferencing was also broken. After fixing the typo, the result is correct.)

---
### snippet 3 - **(bug)**
Because the variables (`a` to `d`) are signed characters, their values should wrap around to the 8-bit range after the additions and subtractions.

The correct results are `a = -127`, `b = -78`, `c = -93` and `d = -44`, as in the GCC run, but the executable compiled by shecc prints the unwrapped values `a = 6017`, `b = 434`, `c = -349` and `d = -5932`.
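
The expected values can be checked by hand with the small sketch below. `wrap_to_signed_char` is a hypothetical helper written for this explanation, and it assumes an 8-bit two's-complement `char`, which matches the GCC output above.

```c
/* Reduce an int to the value an 8-bit two's-complement char would hold. */
static int wrap_to_signed_char(int v)
{
    unsigned low = (unsigned) v & 0xFFu; /* keep only the low 8 bits */

    /* 128..255 represent the negative values -128..-1 */
    return low >= 128u ? (int) low - 256 : (int) low;
}
```

For instance, `0x11 + 6000` is `6017`, whose low byte is `129`, so the wrapped value is `129 - 256 = -127`.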

## Conclusion
According to the result of snippet 3, the current backends may not take the data type into account when generating load/store instructions.

For example, the Arm backend may generate an `ldr` instruction to load a character (1 byte) regardless of its data type, whereas for a signed character variable it must generate an `ldrsb` instruction instead.

Since these problems also occur in the RISC-V backend, both the Arm and RISC-V backends require improvements to generate data-type-specific instructions, such as `ldrsb` for signed characters.
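
The effect of choosing word-sized versus byte-sized accesses can be simulated on the host. This is only a sketch of the hypothesis above, not shecc's code generator: `slot_t` and all function names are hypothetical, the layout is assumed to be little-endian, and the comments name the Arm instructions that each helper is meant to imitate.

```c
#include <stdint.h>

/* One 4-byte stack slot, as allocated for a char after alignment. */
typedef struct {
    uint8_t bytes[4];
} slot_t;

/* Word store, imitating `str`. */
static void store_word(slot_t *s, int32_t v)
{
    uint32_t u = (uint32_t) v;
    for (int i = 0; i < 4; i++)
        s->bytes[i] = (uint8_t) (u >> (8 * i));
}

/* Word load, imitating `ldr`. */
static int32_t load_word(const slot_t *s)
{
    uint32_t u = 0;
    for (int i = 0; i < 4; i++)
        u |= (uint32_t) s->bytes[i] << (8 * i);
    return (int32_t) u;
}

/* Byte store, imitating `strb`: only the low 8 bits are written. */
static void store_byte(slot_t *s, int32_t v)
{
    s->bytes[0] = (uint8_t) v;
}

/* Sign-extending byte load, imitating `ldrsb`. */
static int32_t load_signed_byte(const slot_t *s)
{
    uint8_t b = s->bytes[0];
    return b >= 128 ? (int32_t) b - 256 : (int32_t) b;
}

/* `x += delta` using word accesses only (the suspected current behavior). */
static int32_t add_with_word_access(int32_t init, int32_t delta)
{
    slot_t s = {{0}};
    store_word(&s, init);
    store_word(&s, load_word(&s) + delta);
    return load_word(&s);
}

/* `x += delta` using byte accesses (what a signed char needs). */
static int32_t add_with_byte_access(int32_t init, int32_t delta)
{
    slot_t s = {{0}};
    store_byte(&s, init);
    store_byte(&s, load_signed_byte(&s) + delta);
    return load_signed_byte(&s);
}
```

Under these assumptions, `add_with_word_access(0x11, 6000)` reproduces the `6017` printed by the shecc build, while `add_with_byte_access(0x11, 6000)` gives the `-127` printed by the GCC build.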
