Couple suggestions #264
I know I could probably do it myself in Excel, but I think it would be useful to have a geometric mean of the test results. Also, could we get the Zig language added? And one last thing: I read this comment on reddit in response to this benchmark:
Geometric mean calculation requires more resources, and I'm not sure it's worth it. In theory I'd prefer to have the median and quartiles, as in my own project (https://github.com/nuald/simple-web-benchmark), but that could be overkill here: the algorithms used are not consistent or optimized enough to provide accurate timing between iterations. The provided numbers are good enough for comparing the languages in average situations, but optimized benchmarks with highly accurate results are out of scope for this project. Please consider it a playground: we have some numbers, but if you want great precision and deep comparisons, please use the appropriate algorithms and tools (see below for some examples).

Zig language tests have been requested already, but we decided not to go with it; please see the discussion in #188.

Performance deviation for JIT has been considered too: #248. Right now, the proposed approach is to run some minimal tests before the benchmarks (WIP); it addresses lazy initialization and some precompilation. But having a proper warm-up doesn't necessarily give better results, as explained in that ticket. I guess proper benchmarks for JIT languages would have GC disabled, but that would be out of scope for this project, as we don't consider code overly optimized for performance, but rather average code.

As for the quoted text, I'm not quite sure what the author's point is. Surely there can be edge cases where someone is interested in "compile + run" time, but the majority of developers don't need it (especially if compilation has several phases like linting, preprocessing, etc.). The same goes for JIT: surely it can optimize frequently called methods, but in reality it depends on the task. Profilers and other tools could help with optimization too, but that's out of scope for this project. The goal is to show some average performance; surely it's possible to have well-optimized versions in many languages (especially using tricks like inline assembler or calls into C code).
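For illustration, here is a minimal sketch of the median/quartiles reporting mentioned above, with made-up timing samples (the actual harness in simple-web-benchmark is more involved):

```python
import statistics

# Hypothetical per-iteration timings in seconds (made-up numbers).
samples = [1.02, 0.98, 1.10, 1.05, 0.97, 1.21, 1.00, 1.03]

# quantiles(n=4) returns the three quartile cut points;
# the middle one is the median.
q1, median, q3 = statistics.quantiles(samples, n=4)
print(f"median={median:.3f}s  IQR=[{q1:.3f}s, {q3:.3f}s]")
```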
By geometric mean, I meant taking the geometric mean of all test results of a language implementation, and doing this for every language to build a final results table. You'd take the geomean of, e.g., CPython's brainfuck, base64, json, matmul and havlak times and put that into a new table at the end, then do the same for all the other languages. This way you could find the best performer overall; this is what the Benchmarks Game does. As for the quoted text, I believe the criticism being made is that this benchmark is not consistent in the way it times compiled languages: Julia's JIT gets timed along with the execution, but Java and C are compiled first and then timed on the runtime alone.
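For illustration, here's what that aggregation could look like, with hypothetical numbers (only the test names are real):

```python
import math

# Hypothetical times in seconds for one language implementation
# across the tests named above (the numbers are made up).
times = {"brainfuck": 12.3, "base64": 1.8, "json": 2.4,
         "matmul": 0.9, "havlak": 10.1}

# Geometric mean: the n-th root of the product of n values,
# computed via logarithms to avoid overflow on long products.
geomean = math.exp(sum(math.log(t) for t in times.values()) / len(times))
print(f"geometric mean: {geomean:.2f} s")
```

Repeating this per language would yield one summary number per implementation for a final ranking table.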
The only pure tests of the languages are the bf tests; all the others are mostly library tests. The havlak ones could be considered language tests too, but I guess we'll remove them in the near future, as I found some inconsistencies there and it's possible they are not quite fair. Given that, I don't think it's worth making any overall mean calculations, as libraries in general are orthogonal to languages (like NumPy and its dependencies written in C and Fortran). As for JIT, unfortunately, I don't see any way to do a fair comparison. However, it looks like there is a misunderstanding about the measurement, so I've added notes in #281 (since time is measured only for the benchmark itself, Julia's JIT compilation doesn't affect the results, as it happens before the benchmark). Run-time JIT optimization is another story, though, and I guess for now we're just going to live with that.
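For clarity, a minimal sketch of the measurement scope described above (not the repo's actual harness): the clock starts after startup and lazy initialization, so initial JIT compilation falls outside the measured window, while optimizations the JIT performs during the run do not.

```python
import time

def run_benchmark(func, *args):
    # Hypothetical harness: one untimed call first, so lazy
    # initialization (and, in JIT languages, compilation of `func`)
    # happens before the measured window.
    func(*args)
    start = time.monotonic()
    result = func(*args)  # only the benchmark itself is timed
    return result, time.monotonic() - start
```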
@nuald Taking a look at that Zig issue, I don't even like the language, but I think your stance of doing a big allocation inside a hot loop is pretty crazy, ngl.
Heh, 132 KB is not that big, plus it's closer to real-world situations: in the majority of use cases one would need to allocate memory for the encoding/decoding operations. Please note that the name "Base 64" could be a little misleading here, and the notes clearly indicate that:
All the other tests allocate memory (granted, lower-level languages have an advantage here, as they could use stack allocation). I don't see any particular reason to make an exception, sorry.
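To make the disputed pattern concrete, here is a rough Python equivalent of allocating inside the hot loop (illustrative only; the actual tests exist in many languages and differ in detail):

```python
import base64

data = b"x" * (132 * 1024)  # ~132 KB input, the size discussed above

total = 0
for _ in range(100):
    encoded = base64.b64encode(data)     # allocates a fresh buffer
    decoded = base64.b64decode(encoded)  # and another one
    total += len(decoded)
print(total)
```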