[RFC] Triton IR Printer Decorator #2030
henryg-d-Matrix started this conversation in Ideas
Replies: 1 comment 10 replies
-
Why do you need a decorator instead of explicitly calling the APIs? For example:
https://github.com/openai/triton/blob/main/python/examples/copy_strided.py#L7-L18
-
TL;DR: a decorator like `jit`, except that it prints the Triton IR when the kernel is called.
I propose a decorator like `jit`, except that the compiled Triton IR is printed when the decorated Python function (the kernel) is called. It would mainly be used for testing and development, as a simple way to get Triton IR that is compiled organically from Triton source code. This avoids having to handwrite kernels in Triton IR or dig through the cache directory for the right cached IR file. It would also allow Triton IR to be generated naturally on simple development machines without GPUs, or perhaps even without CUDA. Hacking the Triton compiler to achieve this is obviously possible, but a dedicated utility would avoid unnecessary disturbance to the actual Triton compiler stack, which is meant for kernel execution.
Though it is not meant for frontend integration with third-party custom compiler stacks, it could be expanded for that purpose. The printer decorator could also be expanded to print other forms of IR, such as Triton GPU IR or the converted LinAlg output of triton-to-linalg.
My current work keeps the printer decorator in a single file, with no changes to other parts of the Triton Python frontend. A similar alternative would be to add a parameter to the `jit` compiler that emits such debug output for intermediate IRs. That approach offers less separation and less room for future expansion.
Hopefully this utility can be useful despite its simplicity.
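To make the shape of the proposal concrete, here is a minimal sketch of the decorator pattern described above. It does not use any Triton API: `ir_printer`, `compile_to_ir`, and `fake_compile` are hypothetical names invented for illustration, and `fake_compile` returns a placeholder string rather than real Triton IR. In an actual implementation the stand-in would be replaced by whatever lowering step the Triton frontend exposes.

```python
import functools
from typing import Callable


def ir_printer(compile_to_ir: Callable[[Callable], str], stage: str = "ttir"):
    """Hypothetical decorator factory: print IR when the decorated kernel is called.

    `compile_to_ir` is an assumed stand-in for the real lowering step; it takes
    the Python function and returns the IR as text.
    """
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Lower the kernel to IR text instead of launching it, so no GPU is needed.
            ir_text = compile_to_ir(fn)
            print(f"// ---- {stage} for {fn.__name__} ----")
            print(ir_text)
            return ir_text
        return wrapper
    return decorator


def fake_compile(fn):
    # Placeholder only: this is NOT real Triton output.
    return f"module {{ /* IR for {fn.__name__} would appear here */ }}"


@ir_printer(fake_compile)
def add_kernel(x_ptr, y_ptr, n):
    pass  # the kernel body would go here


# Calling the decorated function prints the IR rather than executing the kernel.
ir = add_kernel(None, None, 16)
```

Keeping the printer in its own wrapper, rather than as an extra `jit` parameter, is what gives the separation mentioned above: the `stage` argument is one possible place to hang later extensions such as Triton GPU IR.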
Related discussions:
A verbose mode printing commands used
Is it possible to get the intermediate ttir and ttgir on a non-CUDA machine?
How do I view Triton IR and LLVM IR generated by the triton compiler?