WIP: kernels #314
base: main
@safetestset "Linear Algebra" include("linear_algebra.jl")
end
#if REACTANT_TEST_GROUP == "all" || REACTANT_TEST_GROUP == "core"
@safetestset "CUDA" include("cuda.jl")
[JuliaFormatter] reported by reviewdog 🐶
- @safetestset "CUDA" include("cuda.jl")
+ @safetestset "CUDA" include("cuda.jl")
using Adapt

function Adapt.adapt_storage(::CUDA.KernelAdaptor, xs::TracedRArray{T,N}) where {T,N}
    CuDeviceArray{T,N,CUDA.AS.Global}(pointer(xs.mlir_data.value), size(xs))
[JuliaFormatter] reported by reviewdog 🐶
- CuDeviceArray{T,N,CUDA.AS.Global}(pointer(xs.mlir_data.value), size(xs))
+ return CuDeviceArray{T,N,CUDA.AS.Global}(pointer(xs.mlir_data.value), size(xs))
    CuDeviceArray{T,N,CUDA.AS.Global}(pointer(xs.mlir_data.value), size(xs))
end

const _kernel_instances = Dict{Any, Any}()
[JuliaFormatter] reported by reviewdog 🐶
- const _kernel_instances = Dict{Any, Any}()
+ const _kernel_instances = Dict{Any,Any}()
cache = CUDA.compiler_cache(cuda.context)
source = CUDA.methodinstance(F, tt)
config = CUDA.compiler_config(cuda.device; kwargs...)::CUDA.CUDACompilerConfig
fun = CUDA.GPUCompiler.cached_compilation(cache, source, config, CUDA.compile, CUDA.link)
[JuliaFormatter] reported by reviewdog 🐶
- fun = CUDA.GPUCompiler.cached_compilation(cache, source, config, CUDA.compile, CUDA.link)
+ fun = CUDA.GPUCompiler.cached_compilation(
+     cache, source, config, CUDA.compile, CUDA.link
+ )
@show fun
@show fun.mod
# create a callable object that captures the function instance. we don't need to think
[JuliaFormatter] reported by reviewdog 🐶
- @show fun
- @show fun.mod
- # create a callable object that captures the function instance. we don't need to think
+ @show fun
+ @show fun.mod
+ # create a callable object that captures the function instance. we don't need to think
mapany,
MethodResultPure

[JuliaFormatter] reported by reviewdog 🐶
(whitespace-only suggestion)
arginfo2 = ArgInfo(
    if fargs isa Nothing
        nothing
    else
        [:($(recufunction)), fargs[2:end]...]
    end,
    [Core.Const(recufunction), argtypes[2:end]...],
)
return abstract_call_known(interp, recufunction, arginfo2, si, sv, max_methods)
[JuliaFormatter] reported by reviewdog 🐶
- arginfo2 = ArgInfo(
- if fargs isa Nothing
- nothing
- else
- [:($(recufunction)), fargs[2:end]...]
- end,
- [Core.Const(recufunction), argtypes[2:end]...],
- )
- return abstract_call_known(interp, recufunction, arginfo2, si, sv, max_methods)
+ arginfo2 = ArgInfo(
+     if fargs isa Nothing
+         nothing
+     else
+         [:($(recufunction)), fargs[2:end]...]
+     end,
+     [Core.Const(recufunction), argtypes[2:end]...],
+ )
+ return abstract_call_known(interp, recufunction, arginfo2, si, sv, max_methods)
Reactant.jl Benchmarks
Benchmark suite | Current: b7303e5 | Previous: 45ae14f | Ratio |
---|---|---|---|
ViT base (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1449157594 ns | 1287700343 ns | 1.13 |
ViT base (256 x 256 x 3 x 32)/forward/CUDA/Reactant | 1301919790 ns | 1271515659 ns | 1.02 |
ViT base (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1339557972 ns | 1253394269 ns | 1.07 |
ViT base (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :only_enzyme) | 3312079307 ns | 3106663633 ns | 1.07 |
ViT base (256 x 256 x 3 x 32)/forward/CUDA/Lux | 206606524 ns | 217499591 ns | 0.95 |
ViT base (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :after_enzyme) | 5262646551 ns | 6749076193 ns | 0.78 |
ViT base (256 x 256 x 3 x 32)/forward/CPU/Reactant | 5233063986 ns | 5078740247 ns | 1.03 |
ViT base (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :before_enzyme) | 5084455177 ns | 5013817961 ns | 1.01 |
ViT base (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :only_enzyme) | 7686400566 ns | 7197691815 ns | 1.07 |
ViT base (256 x 256 x 3 x 32)/forward/CPU/Lux | 26339246221 ns | 35464964244 ns | 0.74 |
ViT small (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1300005635 ns | 1257317145 ns | 1.03 |
ViT small (256 x 256 x 3 x 4)/forward/CUDA/Reactant | 1278041149 ns | 1424374803 ns | 0.90 |
ViT small (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1261990698 ns | 1350049098 ns | 0.93 |
ViT small (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :only_enzyme) | 3125146586 ns | 3052800629 ns | 1.02 |
ViT small (256 x 256 x 3 x 4)/forward/CUDA/Lux | 8879631 ns | 8862682 ns | 1.00 |
ViT small (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :after_enzyme) | 1550527051 ns | 1572590140 ns | 0.99 |
ViT small (256 x 256 x 3 x 4)/forward/CPU/Reactant | 1552400963 ns | 1559474266 ns | 1.00 |
ViT small (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :before_enzyme) | 1552125020 ns | 1557501067 ns | 1.00 |
ViT small (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :only_enzyme) | 3310850083 ns | 3290628669 ns | 1.01 |
ViT small (256 x 256 x 3 x 4)/forward/CPU/Lux | 2775956032 ns | 2876354148 ns | 0.97 |
ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1303015586 ns | 1231219515 ns | 1.06 |
ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Reactant | 1272928755 ns | 1441159242 ns | 0.88 |
ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1311413197 ns | 1282010253 ns | 1.02 |
ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :only_enzyme) | 3028555629 ns | 3051584957 ns | 0.99 |
ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Lux | 22655396 ns | 22776746 ns | 0.99 |
ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :after_enzyme) | 2140398211 ns | 2154505585 ns | 0.99 |
ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Reactant | 2200393344 ns | 2139776302 ns | 1.03 |
ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :before_enzyme) | 2142222871 ns | 2123332313 ns | 1.01 |
ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :only_enzyme) | 3897215106 ns | 3879039560 ns | 1.00 |
ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Lux | 5312568392 ns | 5729200009 ns | 0.93 |
ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1307990936 ns | 1259798635 ns | 1.04 |
ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Reactant | 1301819826 ns | 1262851193 ns | 1.03 |
ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1284427966 ns | 1266665882 ns | 1.01 |
ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :only_enzyme) | 3169837598 ns | 3319553871 ns | 0.95 |
ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Lux | 7453064 ns | 7445203.5 ns | 1.00 |
ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :after_enzyme) | 1409136279 ns | 1424258021 ns | 0.99 |
ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Reactant | 1409545691 ns | 1421721118 ns | 0.99 |
ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :before_enzyme) | 1414236404 ns | 1420742881 ns | 1.00 |
ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :only_enzyme) | 3151606700 ns | 3162578762 ns | 1.00 |
ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Lux | 1654006772.5 ns | 2138106366 ns | 0.77 |
ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1291669432 ns | 1297050944 ns | 1.00 |
ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Reactant | 1265833403 ns | 1403907055 ns | 0.90 |
ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1278433111 ns | 1269229731 ns | 1.01 |
ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :only_enzyme) | 3126809956 ns | 3063143344 ns | 1.02 |
ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Lux | 12328188 ns | 12347497 ns | 1.00 |
ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :after_enzyme) | 1741906628 ns | 1721006513 ns | 1.01 |
ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Reactant | 1731592537 ns | 1711405549 ns | 1.01 |
ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :before_enzyme) | 1720273302 ns | 1704835369 ns | 1.01 |
ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :only_enzyme) | 3450588571 ns | 3443971150 ns | 1.00 |
ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Lux | 2948602836 ns | 3110298785 ns | 0.95 |
ViT small (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1494612899 ns | 1266729302 ns | 1.18 |
ViT small (256 x 256 x 3 x 16)/forward/CUDA/Reactant | 1311317968 ns | 1308873395 ns | 1.00 |
ViT small (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1492915221 ns | 1275958493 ns | 1.17 |
ViT small (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :only_enzyme) | 3115105513 ns | 3081413477 ns | 1.01 |
ViT small (256 x 256 x 3 x 16)/forward/CUDA/Lux | 27412509 ns | 27435162 ns | 1.00 |
ViT small (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :after_enzyme) | 2228730818 ns | 2169947879 ns | 1.03 |
ViT small (256 x 256 x 3 x 16)/forward/CPU/Reactant | 2334825207 ns | 2163945294 ns | 1.08 |
ViT small (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :before_enzyme) | 2310305349 ns | 2151891950 ns | 1.07 |
ViT small (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :only_enzyme) | 3944966197 ns | 3946269320 ns | 1.00 |
ViT small (256 x 256 x 3 x 16)/forward/CPU/Lux | 6131212634 ns | 6287057122 ns | 0.98 |
ViT small (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1303567764 ns | 1260705673 ns | 1.03 |
ViT small (256 x 256 x 3 x 32)/forward/CUDA/Reactant | 1424871003 ns | 1369717954 ns | 1.04 |
ViT small (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1275689864 ns | 1281076652 ns | 1.00 |
ViT small (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :only_enzyme) | 3045934410 ns | 3130042297 ns | 0.97 |
ViT small (256 x 256 x 3 x 32)/forward/CUDA/Lux | 52971586 ns | 53036705.5 ns | 1.00 |
ViT small (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :after_enzyme) | 3055665974 ns | 3050356994 ns | 1.00 |
ViT small (256 x 256 x 3 x 32)/forward/CPU/Reactant | 3021313773 ns | 3082997102 ns | 0.98 |
ViT small (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :before_enzyme) | 3053225043 ns | 2965563203 ns | 1.03 |
ViT small (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :only_enzyme) | 4887749197 ns | 4841087626 ns | 1.01 |
ViT small (256 x 256 x 3 x 32)/forward/CPU/Lux | 11183611226 ns | 8484129480 ns | 1.32 |
ViT base (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1300865042 ns | 1260921375 ns | 1.03 |
ViT base (256 x 256 x 3 x 16)/forward/CUDA/Reactant | 1295735580 ns | 1253872568 ns | 1.03 |
ViT base (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1232925244 ns | 1479498539 ns | 0.83 |
ViT base (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :only_enzyme) | 2922815725 ns | 3113671601 ns | 0.94 |
ViT base (256 x 256 x 3 x 16)/forward/CUDA/Lux | 71283297 ns | 71338519.5 ns | 1.00 |
ViT base (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :after_enzyme) | 3270546818 ns | 3125511597 ns | 1.05 |
ViT base (256 x 256 x 3 x 16)/forward/CPU/Reactant | 3230464036 ns | 3098530069 ns | 1.04 |
ViT base (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :before_enzyme) | 3254041312 ns | 3115589553 ns | 1.04 |
ViT base (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :only_enzyme) | 5162220727 ns | 5036626230 ns | 1.02 |
ViT base (256 x 256 x 3 x 16)/forward/CPU/Lux | 15170850681 ns | 11289651474 ns | 1.34 |
ViT base (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1278655290 ns | 1339569725 ns | 0.95 |
ViT base (256 x 256 x 3 x 4)/forward/CUDA/Reactant | 1229847740 ns | 1259019883 ns | 0.98 |
ViT base (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1439473418 ns | 1254828379 ns | 1.15 |
ViT base (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :only_enzyme) | 2922143773 ns | 2975337456 ns | 0.98 |
ViT base (256 x 256 x 3 x 4)/forward/CUDA/Lux | 20699816 ns | 20758936 ns | 1.00 |
ViT base (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :after_enzyme) | 1963950807 ns | 1859519475 ns | 1.06 |
ViT base (256 x 256 x 3 x 4)/forward/CPU/Reactant | 2218798778 ns | 1869845638 ns | 1.19 |
ViT base (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :before_enzyme) | 2058391749 ns | 1850101657 ns | 1.11 |
ViT base (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :only_enzyme) | 3614980515 ns | 3593739548 ns | 1.01 |
ViT base (256 x 256 x 3 x 4)/forward/CPU/Lux | 3206903233.5 ns | 3325189113.5 ns | 0.96 |
This comment was automatically generated by workflow using github-action-benchmark.
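For readers interpreting the Ratio column: the values appear consistent with Current divided by Previous, rounded to two decimals, so a ratio above 1 would mean the current commit was slower. This reading is an assumption inferred from the rows, not something the comment states. A minimal Julia sketch that checks it against the first table row:

```julia
# Timings copied from the first table row
# (ViT base, 256 x 256 x 3 x 32, forward, CUDA, optimize = :after_enzyme).
current_ns = 1_449_157_594   # commit b7303e5
previous_ns = 1_287_700_343  # commit 45ae14f

# Assumed definition (not confirmed by the source): ratio = current / previous,
# rounded to two decimal places.
ratio = round(current_ns / previous_ns; digits=2)

println(ratio)  # matches the 1.13 reported in the table
```

Under this reading, a row such as 0.78 would indicate the current run took roughly 78% of the previous run's time.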
Benchmark Results
Benchmark Plots: A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
No description provided.