GitHub - yolhan83/ProsperoChal: ProsperoChal is an attempt to the Prospero Challenge in julia.

ProsperoChal is an attempt at the Prospero Challenge (https://www.mattkeeter.com/projects/prospero/) in Julia.

Prerequisites

Julia installed.
An NVIDIA GPU with CUDA support (see Side notes for running on the CPU or on other GPUs).
A little interest in the Prospero Challenge...

Usage

Clone the repository.

You may need to instantiate the project first to install its dependencies (julia --project -e "using Pkg; Pkg.instantiate()").

There are three ways to run this benchmark:

Run the app with julia --project -e "using ProsperoChal" 1024 and time it with your operating system's tools. This includes Julia's startup time, kernel compilation and JIT compilation.
Run julia --project -e "using ProsperoChal; bench_proper(ARGS)" 1024. This excludes Julia's startup time, kernel compilation and JIT compilation, and reports the best and worst times together with the mean and median times.
Run julia --project -e "using ProsperoChal; profile_cuda(ARGS)" 1024. This profiles the run and shows the host-side time spent in CUDA API calls (including the kernel launch) and the device-side time of each kernel and memory transfer.

It should give you something like this (the first measurement uses PowerShell's Measure-Command):

-> Measure-Command { julia --project -e "using ProsperoChal"  1024}

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 13
Milliseconds      : 958
Ticks             : 139582152
TotalDays         : 0,000161553416666667
TotalHours        : 0,003877282
TotalMinutes      : 0,23263692
TotalSeconds      : 13,9582152
TotalMilliseconds : 13958,2152

-> julia --project -e "using ProsperoChal; bench_proper(ARGS)"  1024
BenchmarkTools.Trial: 460 samples with 1 evaluation per sample.
 Range (min … max):   5.047 ms … 20.850 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     11.164 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   10.844 ms ±  3.217 ms  ┊ GC (mean ± σ):  1.78% ± 4.94%

   ▂             ▂ ▅   ▃▂▂▄ ▄▆▇▆█ ▁
  ▇█████▇█▃▃▃▃▃▅▅███▇██████████████▇▅▃▅▄▄▁▄▄▄▃▁▃▃▄▁▃▃▁▁▁▁▄▄▃▃ ▄
  5.05 ms         Histogram: frequency by time          20 ms <

 Memory estimate: 2.01 MiB, allocs estimate: 139.

-> julia --project -e "using ProsperoChal; profile_cuda(ARGS)"  1024
Profiler ran for 23.74 ms, capturing 869 events.

Host-side activity: calling CUDA APIs took 19.75 ms (83.17% of the trace)
┌──────────┬────────────┬───────┬─────────────────────────────────────┬─────────────────────────┐
│ Time (%) │ Total time │ Calls │ Time distribution                   │ Name                    │
├──────────┼────────────┼───────┼─────────────────────────────────────┼─────────────────────────┤
│   83.18% │   19.75 ms │     3 │   6.58 ms ± 11.4   (   0.0 ‥ 19.75) │ cuStreamSynchronize     │
│    4.28% │    1.02 ms │     1 │                                     │ cuMemcpyDtoHAsync       │
│    0.25% │    60.0 µs │     1 │                                     │ cuMemsetD8Async         │
│    0.12% │    28.3 µs │     1 │                                     │ cuLaunchKernel          │
│    0.10% │    23.1 µs │     1 │                                     │ cuMemcpyHtoDAsync       │
│    0.02% │     5.5 µs │     2 │   2.75 µs ± 0.92   (   2.1 ‥ 3.4)   │ cuMemAllocFromPoolAsync │
│    0.00% │   900.0 ns │     1 │                                     │ cuCtxSetCurrent         │
│    0.00% │   100.0 ns │     1 │                                     │ cuDeviceGetCount        │
│    0.00% │   100.0 ns │     1 │                                     │ cuCtxGetDevice          │
└──────────┴────────────┴───────┴─────────────────────────────────────┴─────────────────────────┘

Device-side activity: GPU was busy for 21.29 ms (89.68% of the trace)
┌──────────┬────────────┬───────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
│ Time (%) │ Total time │ Calls │ Name                                                                                                                                                                                                                                                              ⋯
├──────────┼────────────┼───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
│   86.90% │   20.63 ms │     1 │ gpu_fkernel_(CompilerMetadata<DynamicSize, DynamicCheck, void, CartesianIndices<2, Tuple<OneTo<Int64>, OneTo<Int64>>>, NDRange<2, DynamicSize, DynamicSize, CartesianIndices<2, Tuple<OneTo<Int64>, OneTo<Int64>>>, CartesianIndices<2, Tuple<OneTo<Int64>, OneTo ⋯
│    2.67% │  635.06 µs │     1 │ [copy device to pageable memory]                                                                                                                                                                                                                                  ⋯
│    0.10% │   24.16 µs │     1 │ [set device memory]                                                                                                                                                                                                                                               ⋯
│    0.01% │    1.95 µs │     1 │ [copy pageable to device memory]                                                                                                                                                                                                                                  ⋯
└──────────┴────────────┴───────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
                                                                                                                                                                                                                                                                                     1 column omitted
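The gap between the first two measurements (about 14 s for the whole process versus a few milliseconds per render) comes from Julia's startup and compilation, which bench_proper leaves out. The sketch below shows that idea with the standard library only. `render` and `bench` are hypothetical stand-ins, not ProsperoChal's code (bench_proper itself relies on BenchmarkTools), and, like the commands above, the sketch assumes the size arrives in ARGS as a string such as "1024".

```julia
using Statistics  # stdlib: mean, median

# Hypothetical stand-in for the real rendering work; NOT ProsperoChal's API.
render(n::Int) = sum(sin(x * y / n) for x in 1:n, y in 1:n)

# Hypothetical benchmark driver: parses the size from ARGS-style strings and
# reports min / max / mean / median wall-clock times in milliseconds.
function bench(args::Vector{String}; samples::Int = 20)
    n = isempty(args) ? 1024 : parse(Int, args[1])
    render(n)                                  # warm-up call: pays the JIT compilation cost once
    times = Float64[]
    for _ in 1:samples
        t0 = time_ns()
        render(n)                              # a GPU version must synchronize before stopping the clock
        push!(times, (time_ns() - t0) / 1e6)   # nanoseconds -> milliseconds
    end
    return (min = minimum(times), max = maximum(times),
            mean = mean(times), median = median(times))
end

stats = bench(["256"]; samples = 10)
println(stats)
```

On a GPU, kernel launches return before the work is finished, so a real measurement has to synchronize the device before reading the clock; otherwise only the launch time is recorded.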

Side notes

How to run on CPU?

To run the benchmarks on the CPU, change the 4th line of ProsperoChal.jl from const device = CUDA.cu to const device = identity. Then run julia -t auto --project -e "using ProsperoChal" 1024, where -t auto starts Julia with one thread per available CPU thread. Running bench_proper the same way, you should get something around:

> julia -t auto --project -e "using ProsperoChal; bench_proper(ARGS)"  1024
BenchmarkTools.Trial: 11 samples with 1 evaluation per sample.
 Range (min … max):  489.620 ms … 500.339 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     492.213 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   493.293 ms ±   3.262 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █  ▁        ▁▁▁        ▁      █       ▁                     ▁  
  █▁▁█▁▁▁▁▁▁▁▁███▁▁▁▁▁▁▁▁█▁▁▁▁▁▁█▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  490 ms           Histogram: frequency by time          500 ms <

 Memory estimate: 2.08 MiB, allocs estimate: 1459.
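The CPU switch works because arrays only reach the computation through device: with identity they stay ordinary Arrays. The sketch below illustrates that pattern using Base's Threads.@threads rather than KernelAbstractions (which ProsperoChal loads). `shade` is a hypothetical placeholder for the Prospero expression, and mapping pixels onto [-1, 1] on both axes is an assumption of this sketch.

```julia
# The one line that selects the backend (CPU here; the GPU version uses CUDA.cu).
const device = identity

# Hypothetical per-pixel test standing in for the real Prospero expression:
# inside the disc -> black (0x00), outside -> white (0xff).
shade(x, y) = x^2 + y^2 < 0.5f0 ? 0x00 : 0xff

function render_cpu(n::Int)
    # Assumption: pixel coordinates span [-1, 1] on both axes.
    coords = device(collect(range(-1f0, 1f0; length = n)))
    img = Matrix{UInt8}(undef, n, n)
    Threads.@threads for j in 1:n           # columns are split across the threads from `-t auto`
        for i in 1:n
            # Row 1 is the top of the image, so y runs from +1 down to -1.
            img[i, j] = shade(coords[j], coords[n + 1 - i])
        end
    end
    return img
end

img = render_cpu(1024)
println("threads: ", Threads.nthreads(), ", black pixels: ", count(==(0x00), img))
```

Started with julia -t auto, Threads.nthreads() reports all local CPU threads and the column loop is spread across them.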

How to run on a non-NVIDIA GPU?

This is also quite easy to do:

Remove CUDA: julia --project -e "using Pkg; Pkg.rm(String(:CUDA))"
Add your GPU package (AMDGPU.jl for AMD GPUs, oneAPI.jl for Intel GPUs, Metal.jl for Apple GPUs): julia --project -e "using Pkg; Pkg.add(String(:YOUR_GPU_PACKAGE))"
Change the 1st line of ProsperoChal.jl from using CUDA,BenchmarkTools,KernelAbstractions to using YOUR_GPU_PACKAGE,BenchmarkTools,KernelAbstractions.
Change the 4th line of ProsperoChal.jl from const device = CUDA.cu to const device = the function of your GPU package that transfers a CPU array to a GPU array.
Run julia --project -e "using ProsperoChal" 1024.
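As an illustration, for an Apple GPU the two edited lines of ProsperoChal.jl could look like the fragment below. The transfer functions named here (Metal.mtl, AMDGPU.roc, oneAPI.oneArray) are our understanding of each package's CPU-to-GPU conversion and should be checked against the documentation of the version you install; lines 2 and 3 are not shown and stay unchanged.

```julia
using Metal,BenchmarkTools,KernelAbstractions   # line 1: CUDA swapped for Metal
# (lines 2 and 3 unchanged, not shown)
const device = Metal.mtl                        # line 4: copies a CPU array into an MtlArray
# AMD:   using AMDGPU,...   then   const device = AMDGPU.roc
# Intel: using oneAPI,...   then   const device = oneAPI.oneArray
```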
