Hint compiler to not specialize compile methods #144

Open
wants to merge 1 commit into base: main
@mofeing (Collaborator) commented Oct 2, 2024

The compile methods return the same types regardless of their arguments. Adding @nospecialize, and possibly @nospecializeinfer, might reduce the TTFX of compile when the function being compiled changes.
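
For illustration, the idea can be sketched as follows. This is a minimal sketch, not Reactant's actual code: `compile_sketch` and `compile_sketch_infer` are hypothetical stand-ins for the compile methods, and the second definition assumes Julia 1.10 or later, where `Base.@nospecializeinfer` is available. Whether the annotations actually reduce TTFX here is what the benchmarks are meant to show.

```julia
# Hypothetical stand-in for a compile entry point (not Reactant's API).
# The return type is always String, whatever `f` and `args` are, so a
# fresh specialization per function type buys nothing.
function compile_sketch(@nospecialize(f), @nospecialize(args))
    return string("compiled ", nameof(f), " for ", length(args), " argument(s)")
end

# Assumes Julia >= 1.10. Base.@nospecializeinfer additionally asks type
# inference to work with the declared argument types of the
# @nospecialize'd arguments rather than the concrete ones.
Base.@nospecializeinfer function compile_sketch_infer(@nospecialize(f), @nospecialize(args))
    return string("compiled ", nameof(f), " for ", length(args), " argument(s)")
end

println(compile_sketch(sin, (1.0,)))
println(compile_sketch_infer(cos, (1.0, 2.0)))
```

With plain `@nospecialize`, code generation is shared across argument types but inference may still run per concrete type; `Base.@nospecializeinfer` is the hint that targets the inference side as well.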

github-actions bot (Contributor) left a comment

Reactant.jl Benchmarks

| Benchmark suite | Current: 222625f | Previous: 4f4cb40 | Ratio |
| --- | --- | --- | --- |
| ViT base (256 x 256 x 3 x 32)/forward/CUDA/Reactant | 1318940641 ns | 1427801876 ns | 0.92 |
| ViT base (256 x 256 x 3 x 32)/forward/CUDA/Lux | 232704274 ns | 241937991 ns | 0.96 |
| ViT base (256 x 256 x 3 x 32)/forward/CPU/Reactant | 5297983590 ns | 5706546145 ns | 0.93 |
| ViT base (256 x 256 x 3 x 32)/forward/CPU/Lux | 27391225052 ns | 19592540088 ns | 1.40 |
| ViT small (256 x 256 x 3 x 4)/forward/CUDA/Reactant | 1291926315.5 ns | 1367040220.5 ns | 0.95 |
| ViT small (256 x 256 x 3 x 4)/forward/CUDA/Lux | 8101212 ns | 8765018 ns | 0.92 |
| ViT small (256 x 256 x 3 x 4)/forward/CPU/Reactant | 1625149502 ns | 1617045921.5 ns | 1.01 |
| ViT small (256 x 256 x 3 x 4)/forward/CPU/Lux | 2291206144 ns | 2228339749.5 ns | 1.03 |
| ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Reactant | 1334446605.5 ns | 1369778487.5 ns | 0.97 |
| ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Lux | 93860961 ns | 95384074 ns | 0.98 |
| ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Reactant | 2169485862 ns | 2267863268 ns | 0.96 |
| ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Lux | 4271876468 ns | 4660275687.5 ns | 0.92 |
| ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Reactant | 1316968758 ns | 1374725656.5 ns | 0.96 |
| ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Lux | 7464719.5 ns | 7919007 ns | 0.94 |
| ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Reactant | 1471509052 ns | 1523146334.5 ns | 0.97 |
| ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Lux | 1559028201 ns | 1657446563 ns | 0.94 |
| ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Reactant | 1345744439.5 ns | 1293396969 ns | 1.04 |
| ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Lux | 11653564 ns | 84573801 ns | 0.14 |
| ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Reactant | 1754843139 ns | 1829891224 ns | 0.96 |
| ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Lux | 2487275309.5 ns | 3023920644 ns | 0.82 |
| ViT small (256 x 256 x 3 x 16)/forward/CUDA/Reactant | 1545602240 ns | 1292433719 ns | 1.20 |
| ViT small (256 x 256 x 3 x 16)/forward/CUDA/Lux | 95539755.5 ns | 88498026 ns | 1.08 |
| ViT small (256 x 256 x 3 x 16)/forward/CPU/Reactant | 2229052978 ns | 2315496105 ns | 0.96 |
| ViT small (256 x 256 x 3 x 16)/forward/CPU/Lux | 3716833384.5 ns | 4181219607 ns | 0.89 |
| ViT small (256 x 256 x 3 x 32)/forward/CUDA/Reactant | 1317185684 ns | 1367617841 ns | 0.96 |
| ViT small (256 x 256 x 3 x 32)/forward/CUDA/Lux | 122600223 ns | 113824695 ns | 1.08 |
| ViT small (256 x 256 x 3 x 32)/forward/CPU/Reactant | 3091138651 ns | 3118475424 ns | 0.99 |
| ViT small (256 x 256 x 3 x 32)/forward/CPU/Lux | 6307106053.5 ns | 6707702872 ns | 0.94 |
| ViT base (256 x 256 x 3 x 16)/forward/CUDA/Reactant | 1314407457 ns | 1342010577 ns | 0.98 |
| ViT base (256 x 256 x 3 x 16)/forward/CUDA/Lux | 128864521.5 ns | 129882165.5 ns | 0.99 |
| ViT base (256 x 256 x 3 x 16)/forward/CPU/Reactant | 3260848845 ns | 3304575961 ns | 0.99 |
| ViT base (256 x 256 x 3 x 16)/forward/CPU/Lux | 7279675017 ns | 9641559937 ns | 0.76 |
| ViT base (256 x 256 x 3 x 4)/forward/CUDA/Reactant | 1382526551 ns | 1318534127.5 ns | 1.05 |
| ViT base (256 x 256 x 3 x 4)/forward/CUDA/Lux | 86162139 ns | 87600441 ns | 0.98 |
| ViT base (256 x 256 x 3 x 4)/forward/CPU/Reactant | 2330506538 ns | 1971771485 ns | 1.18 |
| ViT base (256 x 256 x 3 x 4)/forward/CPU/Lux | 2552811939.5 ns | 2525837503 ns | 1.01 |

This comment was automatically generated by workflow using github-action-benchmark.

github-actions bot (Contributor) commented Oct 2, 2024

Benchmark Results

| Benchmark | main | 222625f... | main/222625f4c2fcab... |
| --- | --- | --- | --- |
| comptime/basics/2D sum | 26.9 ± 2.1 ms | 26.8 ± 1.1 ms | 1 |
| comptime/basics/Basic grad cos | 0.0449 ± 0.0023 s | 0.0447 ± 0.0015 s | 1 |
| comptime/basics/cos.(x) | 31.2 ± 1.3 ms | 0.0332 ± 0.0014 s | 0.941 |
| comptime/lux neural networks/ViT base | 5.69 s | 6.1 s | 0.931 |
| comptime/lux neural networks/ViT tiny | 5.82 s | 5.88 s | 0.99 |
| comptime/lux neural networks/vgg11 bn=false | 0.389 ± 0.041 s | 0.427 ± 0.019 s | 0.911 |
| comptime/lux neural networks/vgg13 bn=false | 0.436 ± 0.02 s | 0.43 ± 0.018 s | 1.01 |
| comptime/lux neural networks/vgg16 bn=false | 0.521 ± 0.0026 s | 0.515 ± 0.0041 s | 1.01 |
| comptime/lux neural networks/vgg19 bn=false | 0.615 ± 0.03 s | 0.612 ± 0.024 s | 1 |
| runtime/lux neural networks/ViT base (compiled) | 6.4 s | 6.45 s | 0.993 |
| runtime/lux neural networks/ViT tiny (compiled) | 1.69 s | 1.72 s | 0.979 |
| runtime/lux neural networks/vgg11 bn=false (compiled) | 2.12 s | 2.09 s | 1.02 |
| runtime/lux neural networks/vgg13 bn=false (compiled) | 2.97 s | 2.93 s | 1.01 |
| runtime/lux neural networks/vgg16 bn=false (compiled) | 3.73 s | 3.73 s | 1 |
| runtime/lux neural networks/vgg19 bn=false (compiled) | 4.53 s | 4.59 s | 0.988 |
| time_to_load | 1.45 ± 0.021 s | 1.45 ± 0.014 s | 1 |

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).
