WIP: kernels #314

Draft
wants to merge 3 commits into
base: main
Conversation

@wsmoses (Member) commented Nov 29, 2024

No description provided.

@safetestset "Linear Algebra" include("linear_algebra.jl")
end
#if REACTANT_TEST_GROUP == "all" || REACTANT_TEST_GROUP == "core"
@safetestset "CUDA" include("cuda.jl")

[JuliaFormatter] reported by reviewdog 🐶

Suggested change
@safetestset "CUDA" include("cuda.jl")
@safetestset "CUDA" include("cuda.jl")

using Adapt

function Adapt.adapt_storage(::CUDA.KernelAdaptor, xs::TracedRArray{T,N}) where {T,N}
CuDeviceArray{T,N,CUDA.AS.Global}(pointer(xs.mlir_data.value), size(xs))

[JuliaFormatter] reported by reviewdog 🐶

Suggested change
- CuDeviceArray{T,N,CUDA.AS.Global}(pointer(xs.mlir_data.value), size(xs))
+ return CuDeviceArray{T,N,CUDA.AS.Global}(pointer(xs.mlir_data.value), size(xs))

CuDeviceArray{T,N,CUDA.AS.Global}(pointer(xs.mlir_data.value), size(xs))
end

const _kernel_instances = Dict{Any, Any}()

[JuliaFormatter] reported by reviewdog 🐶

Suggested change
- const _kernel_instances = Dict{Any, Any}()
+ const _kernel_instances = Dict{Any,Any}()

cache = CUDA.compiler_cache(cuda.context)
source = CUDA.methodinstance(F, tt)
config = CUDA.compiler_config(cuda.device; kwargs...)::CUDA.CUDACompilerConfig
fun = CUDA.GPUCompiler.cached_compilation(cache, source, config, CUDA.compile, CUDA.link)

[JuliaFormatter] reported by reviewdog 🐶

Suggested change
- fun = CUDA.GPUCompiler.cached_compilation(cache, source, config, CUDA.compile, CUDA.link)
+ fun = CUDA.GPUCompiler.cached_compilation(
+     cache, source, config, CUDA.compile, CUDA.link
+ )

Comment on lines +26 to +28
@show fun
@show fun.mod
# create a callable object that captures the function instance. we don't need to think

[JuliaFormatter] reported by reviewdog 🐶

Suggested change
@show fun
@show fun.mod
# create a callable object that captures the function instance. we don't need to think
@show fun
@show fun.mod
# create a callable object that captures the function instance. we don't need to think

mapany,
MethodResultPure



[JuliaFormatter] reported by reviewdog 🐶

Suggested change

Comment on lines +70 to +79

arginfo2 = ArgInfo(
if fargs isa Nothing
nothing
else
[:($(recufunction)), fargs[2:end]...]
end,
[Core.Const(recufunction), argtypes[2:end]...],
)
return abstract_call_known(interp, recufunction, arginfo2, si, sv, max_methods)

[JuliaFormatter] reported by reviewdog 🐶

Suggested change
arginfo2 = ArgInfo(
if fargs isa Nothing
nothing
else
[:($(recufunction)), fargs[2:end]...]
end,
[Core.Const(recufunction), argtypes[2:end]...],
)
return abstract_call_known(interp, recufunction, arginfo2, si, sv, max_methods)
arginfo2 = ArgInfo(
if fargs isa Nothing
nothing
else
[:($(recufunction)), fargs[2:end]...]
end,
[Core.Const(recufunction), argtypes[2:end]...],
)
return abstract_call_known(interp, recufunction, arginfo2, si, sv, max_methods)

@github-actions bot (Contributor) left a comment

Reactant.jl Benchmarks

Benchmark suite Current: b7303e5 Previous: 45ae14f Ratio
ViT base (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :after_enzyme) 1449157594 ns 1287700343 ns 1.13
ViT base (256 x 256 x 3 x 32)/forward/CUDA/Reactant 1301919790 ns 1271515659 ns 1.02
ViT base (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :before_enzyme) 1339557972 ns 1253394269 ns 1.07
ViT base (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :only_enzyme) 3312079307 ns 3106663633 ns 1.07
ViT base (256 x 256 x 3 x 32)/forward/CUDA/Lux 206606524 ns 217499591 ns 0.95
ViT base (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :after_enzyme) 5262646551 ns 6749076193 ns 0.78
ViT base (256 x 256 x 3 x 32)/forward/CPU/Reactant 5233063986 ns 5078740247 ns 1.03
ViT base (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :before_enzyme) 5084455177 ns 5013817961 ns 1.01
ViT base (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :only_enzyme) 7686400566 ns 7197691815 ns 1.07
ViT base (256 x 256 x 3 x 32)/forward/CPU/Lux 26339246221 ns 35464964244 ns 0.74
ViT small (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :after_enzyme) 1300005635 ns 1257317145 ns 1.03
ViT small (256 x 256 x 3 x 4)/forward/CUDA/Reactant 1278041149 ns 1424374803 ns 0.90
ViT small (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :before_enzyme) 1261990698 ns 1350049098 ns 0.93
ViT small (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :only_enzyme) 3125146586 ns 3052800629 ns 1.02
ViT small (256 x 256 x 3 x 4)/forward/CUDA/Lux 8879631 ns 8862682 ns 1.00
ViT small (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :after_enzyme) 1550527051 ns 1572590140 ns 0.99
ViT small (256 x 256 x 3 x 4)/forward/CPU/Reactant 1552400963 ns 1559474266 ns 1.00
ViT small (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :before_enzyme) 1552125020 ns 1557501067 ns 1.00
ViT small (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :only_enzyme) 3310850083 ns 3290628669 ns 1.01
ViT small (256 x 256 x 3 x 4)/forward/CPU/Lux 2775956032 ns 2876354148 ns 0.97
ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :after_enzyme) 1303015586 ns 1231219515 ns 1.06
ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Reactant 1272928755 ns 1441159242 ns 0.88
ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :before_enzyme) 1311413197 ns 1282010253 ns 1.02
ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :only_enzyme) 3028555629 ns 3051584957 ns 0.99
ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Lux 22655396 ns 22776746 ns 0.99
ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :after_enzyme) 2140398211 ns 2154505585 ns 0.99
ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Reactant 2200393344 ns 2139776302 ns 1.03
ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :before_enzyme) 2142222871 ns 2123332313 ns 1.01
ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :only_enzyme) 3897215106 ns 3879039560 ns 1.00
ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Lux 5312568392 ns 5729200009 ns 0.93
ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :after_enzyme) 1307990936 ns 1259798635 ns 1.04
ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Reactant 1301819826 ns 1262851193 ns 1.03
ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :before_enzyme) 1284427966 ns 1266665882 ns 1.01
ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :only_enzyme) 3169837598 ns 3319553871 ns 0.95
ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Lux 7453064 ns 7445203.5 ns 1.00
ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :after_enzyme) 1409136279 ns 1424258021 ns 0.99
ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Reactant 1409545691 ns 1421721118 ns 0.99
ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :before_enzyme) 1414236404 ns 1420742881 ns 1.00
ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :only_enzyme) 3151606700 ns 3162578762 ns 1.00
ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Lux 1654006772.5 ns 2138106366 ns 0.77
ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :after_enzyme) 1291669432 ns 1297050944 ns 1.00
ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Reactant 1265833403 ns 1403907055 ns 0.90
ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :before_enzyme) 1278433111 ns 1269229731 ns 1.01
ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :only_enzyme) 3126809956 ns 3063143344 ns 1.02
ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Lux 12328188 ns 12347497 ns 1.00
ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :after_enzyme) 1741906628 ns 1721006513 ns 1.01
ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Reactant 1731592537 ns 1711405549 ns 1.01
ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :before_enzyme) 1720273302 ns 1704835369 ns 1.01
ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :only_enzyme) 3450588571 ns 3443971150 ns 1.00
ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Lux 2948602836 ns 3110298785 ns 0.95
ViT small (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :after_enzyme) 1494612899 ns 1266729302 ns 1.18
ViT small (256 x 256 x 3 x 16)/forward/CUDA/Reactant 1311317968 ns 1308873395 ns 1.00
ViT small (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :before_enzyme) 1492915221 ns 1275958493 ns 1.17
ViT small (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :only_enzyme) 3115105513 ns 3081413477 ns 1.01
ViT small (256 x 256 x 3 x 16)/forward/CUDA/Lux 27412509 ns 27435162 ns 1.00
ViT small (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :after_enzyme) 2228730818 ns 2169947879 ns 1.03
ViT small (256 x 256 x 3 x 16)/forward/CPU/Reactant 2334825207 ns 2163945294 ns 1.08
ViT small (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :before_enzyme) 2310305349 ns 2151891950 ns 1.07
ViT small (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :only_enzyme) 3944966197 ns 3946269320 ns 1.00
ViT small (256 x 256 x 3 x 16)/forward/CPU/Lux 6131212634 ns 6287057122 ns 0.98
ViT small (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :after_enzyme) 1303567764 ns 1260705673 ns 1.03
ViT small (256 x 256 x 3 x 32)/forward/CUDA/Reactant 1424871003 ns 1369717954 ns 1.04
ViT small (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :before_enzyme) 1275689864 ns 1281076652 ns 1.00
ViT small (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :only_enzyme) 3045934410 ns 3130042297 ns 0.97
ViT small (256 x 256 x 3 x 32)/forward/CUDA/Lux 52971586 ns 53036705.5 ns 1.00
ViT small (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :after_enzyme) 3055665974 ns 3050356994 ns 1.00
ViT small (256 x 256 x 3 x 32)/forward/CPU/Reactant 3021313773 ns 3082997102 ns 0.98
ViT small (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :before_enzyme) 3053225043 ns 2965563203 ns 1.03
ViT small (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :only_enzyme) 4887749197 ns 4841087626 ns 1.01
ViT small (256 x 256 x 3 x 32)/forward/CPU/Lux 11183611226 ns 8484129480 ns 1.32
ViT base (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :after_enzyme) 1300865042 ns 1260921375 ns 1.03
ViT base (256 x 256 x 3 x 16)/forward/CUDA/Reactant 1295735580 ns 1253872568 ns 1.03
ViT base (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :before_enzyme) 1232925244 ns 1479498539 ns 0.83
ViT base (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :only_enzyme) 2922815725 ns 3113671601 ns 0.94
ViT base (256 x 256 x 3 x 16)/forward/CUDA/Lux 71283297 ns 71338519.5 ns 1.00
ViT base (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :after_enzyme) 3270546818 ns 3125511597 ns 1.05
ViT base (256 x 256 x 3 x 16)/forward/CPU/Reactant 3230464036 ns 3098530069 ns 1.04
ViT base (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :before_enzyme) 3254041312 ns 3115589553 ns 1.04
ViT base (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :only_enzyme) 5162220727 ns 5036626230 ns 1.02
ViT base (256 x 256 x 3 x 16)/forward/CPU/Lux 15170850681 ns 11289651474 ns 1.34
ViT base (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :after_enzyme) 1278655290 ns 1339569725 ns 0.95
ViT base (256 x 256 x 3 x 4)/forward/CUDA/Reactant 1229847740 ns 1259019883 ns 0.98
ViT base (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :before_enzyme) 1439473418 ns 1254828379 ns 1.15
ViT base (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :only_enzyme) 2922143773 ns 2975337456 ns 0.98
ViT base (256 x 256 x 3 x 4)/forward/CUDA/Lux 20699816 ns 20758936 ns 1.00
ViT base (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :after_enzyme) 1963950807 ns 1859519475 ns 1.06
ViT base (256 x 256 x 3 x 4)/forward/CPU/Reactant 2218798778 ns 1869845638 ns 1.19
ViT base (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :before_enzyme) 2058391749 ns 1850101657 ns 1.11
ViT base (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :only_enzyme) 3614980515 ns 3593739548 ns 1.01
ViT base (256 x 256 x 3 x 4)/forward/CPU/Lux 3206903233.5 ns 3325189113.5 ns 0.96

This comment was automatically generated by workflow using github-action-benchmark.


Benchmark Results

Benchmark main b7303e5... main/b7303e59baa639...
comptime/NN/ViT base (optimize = :after_enzyme) 9.4 s 9.7 s 0.97
comptime/NN/ViT base (optimize = :all) 8.97 s 9.27 s 0.968
comptime/NN/ViT base (optimize = :before_enzyme) 8.86 s 9.26 s 0.956
comptime/NN/ViT base (optimize = :only_enzyme) 9.1 s 9.58 s 0.949
comptime/NN/ViT tiny (optimize = :after_enzyme) 6.74 s 6.92 s 0.975
comptime/NN/ViT tiny (optimize = :all) 6.88 s 7.21 s 0.955
comptime/NN/ViT tiny (optimize = :before_enzyme) 6.76 s 7.13 s 0.948
comptime/NN/ViT tiny (optimize = :only_enzyme) 6.93 s 7.2 s 0.962
comptime/NN/vgg11 bn=false (optimize = :after_enzyme) 1.15 ± 0.016 s 1.15 ± 0.054 s 1
comptime/NN/vgg11 bn=false (optimize = :all) 1.04 ± 0.05 s 1 ± 0.051 s 1.04
comptime/NN/vgg11 bn=false (optimize = :before_enzyme) 0.978 ± 0.015 s 1.18 ± 0.076 s 0.832
comptime/NN/vgg11 bn=false (optimize = :only_enzyme) 1.02 ± 0.013 s 1.07 ± 0.025 s 0.957
comptime/NN/vgg11 bn=true (optimize = :after_enzyme) 1.77 ± 0.031 s 1.81 ± 0.051 s 0.98
comptime/NN/vgg11 bn=true (optimize = :all) 1.84 ± 0.01 s 1.94 ± 0.0086 s 0.952
comptime/NN/vgg11 bn=true (optimize = :before_enzyme) 1.79 ± 0.059 s 1.89 ± 0.12 s 0.949
comptime/NN/vgg11 bn=true (optimize = :only_enzyme) 1.77 ± 0.079 s 1.89 ± 0.0069 s 0.941
comptime/NN/vgg13 bn=false (optimize = :after_enzyme) 1.46 ± 0.099 s 1.49 ± 0.088 s 0.976
comptime/NN/vgg13 bn=false (optimize = :all) 1.58 ± 0.053 s 1.44 ± 0.084 s 1.1
comptime/NN/vgg13 bn=false (optimize = :before_enzyme) 1.5 ± 0.026 s 1.54 ± 0.013 s 0.972
comptime/NN/vgg13 bn=false (optimize = :only_enzyme) 1.57 ± 0.056 s 1.46 ± 0.017 s 1.07
comptime/NN/vgg13 bn=true (optimize = :after_enzyme) 2.38 s 2.42 s 0.983
comptime/NN/vgg13 bn=true (optimize = :all) 2.33 s 2.4 s 0.971
comptime/NN/vgg13 bn=true (optimize = :before_enzyme) 2.59 s 2.51 s 1.03
comptime/NN/vgg13 bn=true (optimize = :only_enzyme) 2.73 s 2.62 s 1.04
comptime/NN/vgg16 bn=false (optimize = :after_enzyme) 1.92 ± 0.054 s 1.73 ± 0.027 s 1.11
comptime/NN/vgg16 bn=false (optimize = :all) 2.52 s 1.71 ± 0.00015 s 1.48
comptime/NN/vgg16 bn=false (optimize = :before_enzyme) 1.98 ± 0.014 s 1.89 ± 0.13 s 1.04
comptime/NN/vgg16 bn=false (optimize = :only_enzyme) 2 ± 0.0019 s 1.7 ± 0.026 s 1.18
comptime/NN/vgg16 bn=true (optimize = :after_enzyme) 2.95 s 3.09 s 0.957
comptime/NN/vgg16 bn=true (optimize = :all) 3.09 s 2.85 s 1.08
comptime/NN/vgg16 bn=true (optimize = :before_enzyme) 3.08 s 3.18 s 0.968
comptime/NN/vgg16 bn=true (optimize = :only_enzyme) 3.23 s 2.85 s 1.13
comptime/NN/vgg19 bn=false (optimize = :after_enzyme) 2.03 ± 0.015 s 1.93 ± 0.064 s 1.05
comptime/NN/vgg19 bn=false (optimize = :all) 2.09 ± 0.084 s 1.88 ± 0.03 s 1.11
comptime/NN/vgg19 bn=false (optimize = :before_enzyme) 2.21 ± 0.046 s 2.22 ± 0.0043 s 0.998
comptime/NN/vgg19 bn=false (optimize = :only_enzyme) 2.08 ± 0.041 s 2.09 ± 0.031 s 0.995
comptime/NN/vgg19 bn=true (optimize = :after_enzyme) 3.47 s 3.6 s 0.962
comptime/NN/vgg19 bn=true (optimize = :all) 3.5 s 3.33 s 1.05
comptime/NN/vgg19 bn=true (optimize = :before_enzyme) 3.76 s 3.39 s 1.11
comptime/NN/vgg19 bn=true (optimize = :only_enzyme) 3.59 s 3.36 s 1.07
comptime/basics/2D sum (optimize = :after_enzyme) 25 ± 1 ms 25.2 ± 1.3 ms 0.993
comptime/basics/2D sum (optimize = :all) 28.7 ± 0.89 ms 29.6 ± 1.1 ms 0.97
comptime/basics/2D sum (optimize = :before_enzyme) 27.1 ± 0.71 ms 27.5 ± 1.1 ms 0.986
comptime/basics/2D sum (optimize = :only_enzyme) 21.6 ± 0.84 ms 22.4 ± 1.2 ms 0.968
comptime/basics/cos.(x) (optimize = :after_enzyme) 0.0326 ± 0.0011 s 0.0333 ± 0.001 s 0.979
comptime/basics/cos.(x) (optimize = :all) 0.0347 ± 0.00069 s 0.0352 ± 0.0014 s 0.987
comptime/basics/cos.(x) (optimize = :before_enzyme) 0.0337 ± 0.00087 s 0.0345 ± 0.0013 s 0.975
comptime/basics/cos.(x) (optimize = :only_enzyme) 29.4 ± 1.1 ms 30.1 ± 1 ms 0.978
comptime/basics/∇cos (optimize = :all) 0.0496 ± 0.0014 s 0.0509 ± 0.0022 s 0.973
runtime/NN/ViT base (optimize = :after_enzyme) 5.89 s 5.98 s 0.984
runtime/NN/ViT base (optimize = :all) 5.87 s 5.97 s 0.983
runtime/NN/ViT base (optimize = :before_enzyme) 5.97 s 5.93 s 1.01
runtime/NN/ViT base (optimize = :only_enzyme) 7.15 s 7.15 s 1
runtime/NN/ViT tiny (optimize = :after_enzyme) 1.51 s 1.53 s 0.984
runtime/NN/ViT tiny (optimize = :all) 1.51 s 1.55 s 0.975
runtime/NN/ViT tiny (optimize = :before_enzyme) 1.53 s 1.51 s 1.01
runtime/NN/ViT tiny (optimize = :only_enzyme) 2.43 s 2.45 s 0.994
runtime/NN/vgg11 bn=false (optimize = :after_enzyme) 2.05 s 2.08 s 0.984
runtime/NN/vgg11 bn=false (optimize = :all) 2.04 s 2.01 s 1.01
runtime/NN/vgg11 bn=false (optimize = :before_enzyme) 2 s 1.98 s 1.01
runtime/NN/vgg11 bn=false (optimize = :only_enzyme) 1.88 s 1.85 s 1.02
runtime/NN/vgg11 bn=true (optimize = :after_enzyme) 2.18 s 2.2 s 0.988
runtime/NN/vgg11 bn=true (optimize = :all) 2.19 s 2.15 s 1.02
runtime/NN/vgg11 bn=true (optimize = :before_enzyme) 2.17 s 2.19 s 0.99
runtime/NN/vgg11 bn=true (optimize = :only_enzyme) 2.2 s 2.17 s 1.01
runtime/NN/vgg13 bn=false (optimize = :after_enzyme) 2.65 s 2.9 s 0.914
runtime/NN/vgg13 bn=false (optimize = :all) 2.83 s 2.87 s 0.985
runtime/NN/vgg13 bn=false (optimize = :before_enzyme) 2.72 s 2.86 s 0.951
runtime/NN/vgg13 bn=false (optimize = :only_enzyme) 2.64 s 2.58 s 1.02
runtime/NN/vgg13 bn=true (optimize = :after_enzyme) 3.07 s 3.06 s 1.01
runtime/NN/vgg13 bn=true (optimize = :all) 3.06 s 3.08 s 0.991
runtime/NN/vgg13 bn=true (optimize = :before_enzyme) 3.05 s 3.01 s 1.01
runtime/NN/vgg13 bn=true (optimize = :only_enzyme) 3.14 s 3.17 s 0.989
runtime/NN/vgg16 bn=false (optimize = :after_enzyme) 3.68 s 3.56 s 1.04
runtime/NN/vgg16 bn=false (optimize = :all) 3.57 s 3.41 s 1.05
runtime/NN/vgg16 bn=false (optimize = :before_enzyme) 3.57 s 3.61 s 0.99
runtime/NN/vgg16 bn=false (optimize = :only_enzyme) 3.52 s 3.44 s 1.02
runtime/NN/vgg16 bn=true (optimize = :after_enzyme) 3.92 s 3.81 s 1.03
runtime/NN/vgg16 bn=true (optimize = :all) 3.97 s 3.92 s 1.01
runtime/NN/vgg16 bn=true (optimize = :before_enzyme) 3.86 s 3.91 s 0.989
runtime/NN/vgg16 bn=true (optimize = :only_enzyme) 4.06 s 4.14 s 0.982
runtime/NN/vgg19 bn=false (optimize = :after_enzyme) 4.31 s 4.31 s 1
runtime/NN/vgg19 bn=false (optimize = :all) 4.35 s 4.47 s 0.972
runtime/NN/vgg19 bn=false (optimize = :before_enzyme) 4.34 s 4.38 s 0.991
runtime/NN/vgg19 bn=false (optimize = :only_enzyme) 4.16 s 4.14 s 1.01
runtime/NN/vgg19 bn=true (optimize = :after_enzyme) 4.81 s 4.71 s 1.02
runtime/NN/vgg19 bn=true (optimize = :all) 4.78 s 4.89 s 0.977
runtime/NN/vgg19 bn=true (optimize = :before_enzyme) 4.78 s 4.65 s 1.03
runtime/NN/vgg19 bn=true (optimize = :only_enzyme) 5.15 s 5.17 s 0.996
time_to_load 1.88 ± 0.035 s 1.88 ± 0.023 s 1

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).
