This repository has been archived by the owner on May 22, 2023. It is now read-only.
I've been working on bringing up BYOC infra in Relax, building on the work of @sunggg and the pattern matcher work from @ganler. The ultimate goal is to make relax.vm.build(mod, "cuda") just work without tuning and with reasonable out-of-the-box performance. It would also be the first step toward performant dynamic-shape support.

My branch is here, and it currently has minimal test cases for offloading a simple subgraph to DNNL and CUTLASS. I'm going to start sending pieces from it starting today.
https://github.com/tlc-pack/relax/compare/relax...masahi:codegen-cutlass?expand=1

- RunCodegen pass to send all BYOC functions to the backend at once (rather than individually)
- [...] MergeComposite in Relay)
- Add pass to wrap and annotate the partitioned function for offloading (subsumed by [BYOC] Add pass to merge composite functions to offload large subgraphs #372)
- Add pass to merge neighboring calls to functions compiled for the same external backend into one function (similar to MergeCompilerRegion in Relay, necessary for TRT)

Future possibilities (time permitting)
@sunggg @YuchenJin @tqchen @junrushao
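To make the "merge neighboring calls" item a bit more concrete, here is a toy sketch of the idea in plain Python. It does not use any Relax or Relay API: the `Call` class, the `backend` field, and `merge_neighbors` are hypothetical names made up for illustration, and the program is modeled as a flat list of calls. A real pass would have to work on the dataflow graph rather than a straight-line sequence.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Call:
    """One call in a straight-line sequence (hypothetical toy IR, not Relax)."""
    name: str
    backend: Optional[str]  # e.g. "tensorrt"; None means "not offloaded"


def merge_neighbors(calls: List[Call]) -> List[List[Call]]:
    """Group consecutive calls that target the same external backend.

    Calls with backend=None are never merged; each stays in its own group.
    """
    groups: List[List[Call]] = []
    for call in calls:
        last = groups[-1] if groups else None
        if last and call.backend is not None and last[-1].backend == call.backend:
            last.append(call)  # extend the current region
        else:
            groups.append([call])  # start a new region
    return groups


calls = [
    Call("conv2d", "tensorrt"),
    Call("relu", "tensorrt"),
    Call("custom_op", None),
    Call("matmul", "cutlass"),
]
for group in merge_neighbors(calls):
    print([c.name for c in group], group[0].backend)
```

In this model each resulting group would become a single function handed to its backend, so the two neighboring "tensorrt" calls end up in one region instead of two.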