Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Local array creation #122

Open
JonathanDLTran opened this issue Feb 25, 2022 · 3 comments
Open

Local array creation #122

JonathanDLTran opened this issue Feb 25, 2022 · 3 comments

Comments

@JonathanDLTran
Copy link
Collaborator

When C programs have local arrays created within a function, LLVM sometimes generates these arrays by first allocating space, then doing a memset into the allocated space. The current method of backtracking from a store instruction during the translation from LLVM to Egg terms fails to track the memset instructions, because memset instructions do not write to a register, but instead use the memory location as an argument. Sometimes memcopy instructions are also generated as well.

Here is a minimal test case that generates memset, without memcopy instructions:

void test(float A[SIZE]) {
    for (int i = 0; i < SIZE; i++) {
        float x[SIZE] = {0.0f};
        for (int j = 0; j < SIZE; j++) {
            x[j] = 1.0f;
        }
        float sum = 0.0f;
        for (int j = 0; j < SIZE; j++) {
            sum += x[j];
        }
        A[i] = sum;
    }
}

and the clang code it generates, after several optimizations were run:

; Function Attrs: noinline nounwind ssp uwtable
define void @test(float* %0) #0 {
.preheader:
  %1 = alloca i64, align 8
  %tmpcast = bitcast i64* %1 to [2 x float]*
  %2 = bitcast i64* %1 to i8*
  %3 = bitcast i64* %1 to float*
  store i64 0, i64* %1, align 8
  call void @memset_pattern16(i8* nonnull %2, i8* bitcast ([4 x float]* @.memset_pattern to i8*), i64 8) #4
  %4 = load float, float* %3, align 8
  %5 = fadd float %4, 0.000000e+00
  %6 = getelementptr inbounds [2 x float], [2 x float]* %tmpcast, i64 0, i64 1
  %7 = load float, float* %6, align 4
  %8 = fadd float %5, %7
  store float %8, float* %0, align 4
  store i64 0, i64* %1, align 8
  call void @memset_pattern16(i8* nonnull %2, i8* bitcast ([4 x float]* @.memset_pattern to i8*), i64 8) #4
  %9 = load float, float* %3, align 8
  %10 = fadd float %9, 0.000000e+00
  %11 = load float, float* %6, align 4
  %12 = fadd float %10, %11
  %13 = getelementptr inbounds float, float* %0, i64 1
  store float %12, float* %13, align 4
  ret void
}

The Diospyros pass walks back from the store after the instruction definition %13, and recursively looks at arguments of instructions, beginning with the %12 and %13 referenced in the store. Because no instruction references memset (memset has no destination register), the current pass implementation fails to pick up memset.

To fix this issue, I am planning to treat any memset/memcopy instruction like a store, because it has the same basic function as a store. Where the Diospyros pass walks back from stores, it should also walk back from stores/memset/memcopy instructions, and translate recursively backwards into an Egg term. This change will allow Egg expressions to contain memset nodes in the correct position.

@sampsyo
Copy link
Contributor

sampsyo commented Feb 25, 2022

Ah, nice work boiling this down to a small example! This is incredibly helpful to look at!

It looks like the optimizations run ahead of Diospyros are already smart enough to convert that for loops that does initialization into a memset, huh? That's pretty cool.

To confirm: is SIZE equal to 2 here? That would explain this:

%1 = alloca i64, align 8
%tmpcast = bitcast i64* %1 to [2 x float]*

That is, LLVM is using a single i64 to store two floats for x.

Anyway, treating memset & memcpy calls like stores seem like exactly the right thing to do!

@JonathanDLTran
Copy link
Collaborator Author

Yep, to clarify, SIZE was 2 for this example, and that does help clarify why the i64 is used! I will go ahead and translate the memset and memcpy instructions.

@JonathanDLTran
Copy link
Collaborator Author

The original problem was mostly fixed, but another similar problem relating in the load/store movement pass is being fixed, where loads and stores cannot be moved around memsets.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants