Lecture 14 weird result with TRITON_INTERPRET=1 #51

noklam · 2025-04-16T12:08:46Z

The matmul result is obviously wrong. When I then run the next cell with shape (512, 512), it surprisingly produces a correct result.

I suspect there is a bug in the indexing or a missing guard clause. Another example with shape (16, 16), which matches the batch size, also passes.

P.S. I do not have a GPU and am running this on a Mac, so everything runs in CPU mode with TRITON_INTERPRET=1.

noklam · 2025-04-16T12:18:35Z

Reading the code again: should a mask be applied in tl.load so that the dot product only uses elements within bounds?

With the mask added, all 3 cases pass.

    # Get 1d mask for 
    for _ in range(0, k, bk):
        mask_a = get_2d_mask(rm, rn, m, k) # add mask
        mask_b = get_2d_mask(rm, rn, k, n)  # add mask
            
        a = tl.load(offs_a, mask_a)
        b = tl.load(offs_b, mask_b)
        acc += tl.dot(a, b, allow_tf32=False) # matmul in block ; Weirdness: allow_tf32 must be set to False for older GPUs, otherwise won't compile
        
        # print_if(f"pid: {pid_m, pid_n} |\na: {a} | \nb: {b}", "")
        
        # increase offsets so the next iteration loads the next chunks
        offs_a += bk * stride_ak
        offs_b += bk * stride_bk
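
To illustrate why masking the loads matters, here is a minimal pure-Python sketch of a tiled matmul in which out-of-bounds elements read as zero, so they contribute nothing to the dot product. It is not the lecture kernel: `masked_load`, `tiled_matmul`, and `reference_matmul` are hypothetical names made up for this example, and a single block size `bs` is assumed for all three dimensions. In Triton, the closest equivalent would presumably be passing `mask=` together with `other=0.0` to `tl.load`. The sketch builds each mask from the same row and column indices used for that tile's load, and the k indices advance on every loop iteration.

```python
def masked_load(mat, rows, cols, n_rows, n_cols):
    """Mimic a masked tile load: out-of-bounds positions read as 0.0."""
    return [
        [mat[r][c] if r < n_rows and c < n_cols else 0.0 for c in cols]
        for r in rows
    ]


def tiled_matmul(a, b, m, k, n, bs):
    """Multiply a (m x k) by b (k x n) tile by tile; shapes need not be multiples of bs."""
    c = [[0.0] * n for _ in range(m)]
    for m0 in range(0, m, bs):
        for n0 in range(0, n, bs):
            rm = range(m0, m0 + bs)  # row indices of this output tile
            rn = range(n0, n0 + bs)  # column indices of this output tile
            acc = [[0.0] * bs for _ in range(bs)]
            for k0 in range(0, k, bs):
                rk = range(k0, k0 + bs)  # k indices move forward each iteration
                tile_a = masked_load(a, rm, rk, m, k)  # mask built from (rm, rk)
                tile_b = masked_load(b, rk, rn, k, n)  # mask built from (rk, rn)
                for i in range(bs):
                    for j in range(bs):
                        acc[i][j] += sum(tile_a[i][p] * tile_b[p][j] for p in range(bs))
            # Masked store: only write positions that exist in the output.
            for i, r in enumerate(rm):
                for j, col in enumerate(rn):
                    if r < m and col < n:
                        c[r][col] = acc[i][j]
    return c


def reference_matmul(a, b, m, k, n):
    """Plain triple-loop matmul used as the ground truth."""
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)] for i in range(m)]


if __name__ == "__main__":
    m, k, n, bs = 3, 5, 2, 4  # none of the dimensions is a multiple of bs
    a = [[float(i + 2 * j) for j in range(k)] for i in range(m)]
    b = [[float(3 * i - j) for j in range(n)] for i in range(k)]
    assert tiled_matmul(a, b, m, k, n, bs) == reference_matmul(a, b, m, k, n)
```

Without the bounds check in `masked_load`, the same loop would index past the end of `a` and `b` whenever a dimension is not a multiple of `bs`. That is consistent with the shapes (16, 16) and (512, 512) passing even without a mask.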

noklam linked a pull request Apr 16, 2025 that will close this issue

add mask to matmul kernel in lecture 14 #52

Open
