Add Ampere GEMM example using Cute and CUTLASS 3.x #1604
this looks like a copy-paste of the version we already have in the test dir. may I ask why you want to have this in the examples dir? we already have example 59 which does this
Hi @thakkarV, thank you for your feedback. This example is meant to complement the version in the test directory and give it more visibility for users who may not explore that directory. It serves as an entry-level example, similar to example 14 for Ampere with CUTLASS 2.x and examples 48 and 49 for Hopper with CUTLASS 3.x. It also lets users easily experiment with different GEMM configurations so they can tune GEMM to their needs; I could not find an easy way to do that without implementing this example. Example 59 is much more complicated than what I'm adding here, and I don't think it targets the same users or use case. We have found this example useful ourselves, and I think it could be a valuable contribution for others too.
@aacostadiaz do you no longer want this merged?
Hi @thakkarV, yes. I still would like this to get merged. I accidentally closed it, sorry.
This PR has been labeled
This pull request adds a GEMM example for the NVIDIA Ampere architecture, using CuTe and CUTLASS 3.x. This example demonstrates how to create a GEMM kernel using the CuTe components defined for SM80 and the Collective MMA and Collective Epilogue APIs provided in CUTLASS 3.x.
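
For readers new to the mainloop/epilogue split that the description refers to, here is a minimal CPU-only sketch in plain C++. It is not the code in this PR and it does not use any CUTLASS or CuTe types: `mainloop_sketch`, `epilogue_sketch`, `gemm_sketch` and the `tile_k` parameter are hypothetical names chosen for illustration, and row-major `float` matrices are an assumption. It only shows the shape of the computation: the mainloop accumulates A*B one K-tile at a time, and the epilogue then applies `D = alpha * acc + beta * C`.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the "mainloop" stage (NOT a CUTLASS type):
// accumulates A (M x K) times B (K x N), both row-major, one K-tile at a time.
std::vector<float> mainloop_sketch(const std::vector<float>& A,
                                   const std::vector<float>& B,
                                   std::size_t M, std::size_t N, std::size_t K,
                                   std::size_t tile_k) {
    std::vector<float> acc(M * N, 0.0f);
    for (std::size_t k0 = 0; k0 < K; k0 += tile_k) {
        const std::size_t k1 = std::min(k0 + tile_k, K);  // end of this K-tile
        for (std::size_t i = 0; i < M; ++i) {
            for (std::size_t j = 0; j < N; ++j) {
                float sum = acc[i * N + j];
                for (std::size_t k = k0; k < k1; ++k) {
                    sum += A[i * K + k] * B[k * N + j];
                }
                acc[i * N + j] = sum;
            }
        }
    }
    return acc;
}

// Hypothetical stand-in for the "epilogue" stage (NOT a CUTLASS type):
// applies the linear combination D = alpha * acc + beta * C element-wise.
std::vector<float> epilogue_sketch(const std::vector<float>& acc,
                                   const std::vector<float>& C,
                                   float alpha, float beta) {
    std::vector<float> D(acc.size());
    for (std::size_t i = 0; i < acc.size(); ++i) {
        D[i] = alpha * acc[i] + beta * C[i];
    }
    return D;
}

// Composes the two stages, mirroring how a kernel pairs a mainloop with an epilogue.
std::vector<float> gemm_sketch(const std::vector<float>& A,
                               const std::vector<float>& B,
                               const std::vector<float>& C,
                               std::size_t M, std::size_t N, std::size_t K,
                               float alpha, float beta, std::size_t tile_k) {
    return epilogue_sketch(mainloop_sketch(A, B, M, N, K, tile_k), C, alpha, beta);
}
```

Calling `gemm_sketch` with different `tile_k` values gives the same result for small integer-valued inputs. That is the property that makes tile sizes a tuning knob rather than a correctness concern, which is the kind of experimentation the example in this PR is meant to make easy on the GPU side.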