55% strlen and memchr optimization with SIMD on x86-64 | Macros config SIMD #8421
base: master
Hello, thanks for the PR! Where does the code for the two implementations come from? Since the problem is relatively well defined and "simple", I wonder whether these are well-known implementations. Do you have links that describe the approach? (I'm not sure I understand all the code, so any further comments would help with possible future maintenance.) It's probably most valuable for fast-forwarding during text display, so I'll be benchmarking that specific scenario.
Hello.
I also tested
I added an optimized ImStrlen with SSE and AVX2 implementations.
I'm very skeptical that we should replace the std functions such as If they can be optimized, then they should be optimized in the C library, not in the ImGui library.
Standard functions may not always be able to take advantage of CPU features. We typically have to reinvent many small wheels for projects like this. Empirically, though, I have noticed an issue that sometimes affects such attempts: the standard library functions we call are almost always compiled with optimizations, while our own functions are compiled by the project. For certain functions, the slowdown when a project compiles imgui in debug mode may be meaningful or problematic. (This is why, for example, I haven't provided a qsort replacement, which is quite a frustrating hurdle to trip on, because libc qsort annoyingly does not allow a user data pointer.) Either way, this should be tested in a typical real-world scenario plus some edge cases, to see whether the gain is meaningful and whether it's worth going forward with this.
e.g. before c73f835, after c67c202. So this is much neater now and is unlikely to conflict.
macros config SIMD
Added an include of the compiler-specific intrinsics library.
Updated macros for configuring a build from SIMD to x86-64.
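As a rough illustration of what such a configuration can look like, the fragment below picks a SIMD level from the compiler's predefined macros and includes the matching intrinsics header. The EXAMPLE_* macro names and ExampleSimdLevel are hypothetical and are not the names used in this PR; only the predefined compiler macros and the header names are real.

```cpp
// Hypothetical configuration sketch; EXAMPLE_* names are illustrative only.
#if defined(_M_X64) || defined(__x86_64__)
#define EXAMPLE_ARCH_X64 1          // MSVC defines _M_X64, GCC/Clang define __x86_64__
#endif

#if defined(__AVX2__)
#define EXAMPLE_SIMD_AVX2 1         // set by -mavx2 (GCC/Clang) or /arch:AVX2 (MSVC)
#endif

#if defined(__SSE2__) || defined(EXAMPLE_ARCH_X64)
#define EXAMPLE_SIMD_SSE2 1         // SSE2 is part of the x86-64 baseline
#endif

#if defined(EXAMPLE_SIMD_AVX2) || defined(EXAMPLE_SIMD_SSE2)
#if defined(_MSC_VER)
#include <intrin.h>                 // MSVC: umbrella header for intrinsics
#else
#include <immintrin.h>              // GCC/Clang: umbrella header for x86 SIMD intrinsics
#endif
#endif

// Reports which code path the macros above selected at compile time.
inline const char* ExampleSimdLevel()
{
#if defined(EXAMPLE_SIMD_AVX2)
    return "AVX2";
#elif defined(EXAMPLE_SIMD_SSE2)
    return "SSE2";
#else
    return "scalar";
#endif
}
```

Because the selection happens at compile time, a build without AVX2 flags falls back to the SSE2 or scalar path instead of failing to compile.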
SIMD ImStrlen and ImMemchr
Created optimized ImStrlen and ImMemchr functions for SSE and AVX2.
Replaced uses of strlen and memchr with ImStrlen and ImMemchr.
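For readers unfamiliar with the general approach, here is a minimal SSE2 sketch of both searches. It is not the code from this PR: ExampleMemchrSSE2, ExampleStrlenSSE2 and ExampleLowestBit are hypothetical names, and the aligned-block technique used for the strlen variant is a commonly used one that is assumed here, not confirmed by the source. The strlen variant reads whole 16-byte aligned blocks, which may touch bytes outside the string itself; that cannot cross a page boundary, but it is outside what the C++ standard guarantees and sanitizers may flag it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics (baseline on x86-64)
#if defined(_MSC_VER)
#include <intrin.h>     // _BitScanForward
#endif

// Index of the lowest set bit; `mask` must be non-zero.
static inline unsigned ExampleLowestBit(unsigned mask)
{
    assert(mask != 0);
#if defined(_MSC_VER)
    unsigned long index;
    _BitScanForward(&index, mask);
    return static_cast<unsigned>(index);
#else
    return static_cast<unsigned>(__builtin_ctz(mask));
#endif
}

// memchr-style search. The length is known, so every 16-byte load stays in bounds.
const void* ExampleMemchrSSE2(const void* buf, int value, std::size_t count)
{
    const unsigned char* p = static_cast<const unsigned char*>(buf);
    const unsigned char* end = p + count;
    const __m128i needle = _mm_set1_epi8(static_cast<char>(value));

    while (end - p >= 16)
    {
        const __m128i chunk = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p));
        // One bit per byte: set where the byte equals the needle.
        const unsigned mask = static_cast<unsigned>(_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, needle)));
        if (mask != 0)
            return p + ExampleLowestBit(mask);
        p += 16;
    }
    for (; p != end; ++p)  // scalar tail for the last 0..15 bytes
        if (*p == static_cast<unsigned char>(value))
            return p;
    return nullptr;
}

// strlen-style search. The length is unknown, so only aligned 16-byte blocks are loaded
// (assumption: reading the rest of an aligned block is acceptable on the target platform).
std::size_t ExampleStrlenSSE2(const char* s)
{
    const __m128i zero = _mm_setzero_si128();
    const unsigned misalign = static_cast<unsigned>(reinterpret_cast<std::uintptr_t>(s) & 15u);
    const char* block = s - misalign;  // start of the aligned block that contains s

    unsigned mask = static_cast<unsigned>(_mm_movemask_epi8(
        _mm_cmpeq_epi8(_mm_load_si128(reinterpret_cast<const __m128i*>(block)), zero)));
    mask &= 0xFFFFu << misalign;       // discard zero bytes located before s

    while (mask == 0)
    {
        block += 16;
        mask = static_cast<unsigned>(_mm_movemask_epi8(
            _mm_cmpeq_epi8(_mm_load_si128(reinterpret_cast<const __m128i*>(block)), zero)));
    }
    return static_cast<std::size_t>((block - s) + static_cast<std::ptrdiff_t>(ExampleLowestBit(mask)));
}
```

An AVX2 version would follow the same shape with 32-byte registers; it is left out here so the sketch compiles without extra compiler flags.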
Benchmark
System Specifications
CPU: Intel Core i9-10980XE
RAM: 128GB (4x32 GB)
BIOS settings
ImStrlen benchmark
Description
Search for the null terminator \0 in a std::string buffer filled with random ASCII characters. Buffer sizes range from 1 MB to 1 GB. Various memchr implementations using SSE and AVX2 are tested for performance.
Google benchmark results (figure: ImStrlen benchmark)
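To make the setup concrete, the sketch below builds such a buffer and times a single scan for the terminator. It is an illustration under assumptions, not the benchmark used in this PR: it uses std::chrono instead of Google Benchmark, times std::strlen as a stand-in for the tested implementations, assumes "random ASCII" means printable characters (so no embedded \0), and treats 1 MB as 1 << 20 bytes. ExampleMakeRandomAscii, ExampleTiming and ExampleTimeStrlen are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <random>
#include <string>

// Fills a std::string with random printable ASCII (0x20..0x7E), so it never contains '\0'.
std::string ExampleMakeRandomAscii(std::size_t size, unsigned seed)
{
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> printable(0x20, 0x7E);
    std::string buffer(size, ' ');
    for (char& c : buffer)
        c = static_cast<char>(printable(rng));
    return buffer;
}

struct ExampleTiming
{
    std::size_t length;   // value returned by the scan
    double      seconds;  // wall-clock time of one scan
};

// Times one scan for the terminator that c_str() guarantees at index size().
ExampleTiming ExampleTimeStrlen(const std::string& buffer)
{
    const auto start = std::chrono::steady_clock::now();
    const std::size_t length = std::strlen(buffer.c_str());
    const auto stop = std::chrono::steady_clock::now();
    assert(length == buffer.size());  // holds because the buffer has no embedded '\0'
    return { length, std::chrono::duration<double>(stop - start).count() };
}
```

A single timed call is noisy; a real harness would repeat the scan many times per buffer size and report throughput.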
ImMemchr benchmark
Description
Search for all lines of length 131, ending with \n, in a std::string buffer filled with random ASCII characters. Buffer sizes range from 1 MB to 1 GB. Various memchr implementations using SSE and AVX2 are tested for performance.
Google benchmark results (figure: ImMemchr benchmark)
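The line search described above can be sketched as a loop that hops from one \n to the next. This is an assumption-laden illustration, not the PR's benchmark code: it calls std::memchr where the benchmark would call the implementation under test, it assumes the length of 131 includes the trailing \n, and ExampleMakeLines and ExampleCountLines are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <random>
#include <string>

// Builds `line_count` lines of `line_length` bytes each, counting the trailing '\n'
// (assumption: the source does not say whether the length includes the newline).
std::string ExampleMakeLines(std::size_t line_count, std::size_t line_length, unsigned seed)
{
    assert(line_length >= 1);
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> printable(0x20, 0x7E);  // range excludes '\n'
    std::string buffer;
    buffer.reserve(line_count * line_length);
    for (std::size_t i = 0; i < line_count; ++i)
    {
        for (std::size_t j = 0; j + 1 < line_length; ++j)
            buffer.push_back(static_cast<char>(printable(rng)));
        buffer.push_back('\n');
    }
    return buffer;
}

// Counts lines by repeatedly searching the remaining bytes for the next '\n'.
std::size_t ExampleCountLines(const std::string& buffer)
{
    const char* p = buffer.data();
    const char* end = p + buffer.size();
    std::size_t lines = 0;
    while (p < end)
    {
        const void* hit = std::memchr(p, '\n', static_cast<std::size_t>(end - p));
        if (hit == nullptr)
            break;                                  // trailing bytes without a newline
        ++lines;
        p = static_cast<const char*>(hit) + 1;      // resume just past the newline
    }
    return lines;
}
```

With short lines like these, each call returns after scanning only about a hundred bytes, so this workload stresses per-call overhead more than raw scan throughput.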