Just a nice CUDA example! Sorting integers on the GPU and comparing against a sequential CPU sort.
Hardware
CPU: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz Quad-Core
GPU: GeForce RTX 2070
Benchmark
Sorting 100,000,000 integers
Each time is the lowest of 5 sorts on a shuffled array
Time CPU: 11.9298 seconds (sequential with -O3 flag)
Time GPU: 0.868567 seconds
Speedup: ~13.7x (CPU time / GPU time)