racz16/Graphics-API-Benchmark

Latest commit: 52f6919 · Mar 9, 2025

Graphics API Benchmark

This repository was created to measure the performance of solutions to common graphics problems. At the moment, it contains only 1 benchmark, which covers frequent buffer updates.

Uniform Buffer Update

It's a common scenario in graphics applications that we need to update a buffer every frame and then read it in a shader. OpenGL provides several APIs for updating a buffer, and each of them can be combined with different flags. We can also choose between different strategies for updating our buffer(s).

In contrast to Vulkan, OpenGL usually doesn't require manual synchronization. However, this just means that the driver has to take care of that synchronization in the background. When you write to a buffer and then draw, the driver can't start the draw before the write has finished; otherwise, the shader would read outdated data from the buffer. Similarly, the driver can't start the buffer write until the previous frame's draw has finished; otherwise, it would overwrite the data while it's still in use.

Another common problem is the number of API calls. The more API calls there are, the bigger the overhead, so it's usually a good idea to reduce their number.

Solutions

I compare my solutions by rendering a certain number of frames with each solution and measuring how long it takes to render those frames.
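The measurement described above can be sketched in a few lines. This is only an illustration of the idea, not the benchmark's actual code: `time_frames` and `render_frame` are hypothetical names, and the sketch contains no OpenGL calls.

```python
import time


def time_frames(render_frame, frame_count):
    """Return the wall-clock seconds spent calling render_frame frame_count times.

    render_frame is a hypothetical callback that receives the frame index
    and is expected to update the buffers and issue the draw calls.
    """
    start = time.perf_counter()
    for frame_index in range(frame_count):
        render_frame(frame_index)
    return time.perf_counter() - start
```

In a real OpenGL program, the driver may still be executing queued commands when the loop exits, so one would probably call glFinish before reading the end time. Whether this benchmark does so is not stated here.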

To measure the difference between the solutions, I try to make my application CPU-limited. The vertex shader is basically just a matrix-vector multiplication, and the fragment shader only returns the color red. I render a few quads at a very low resolution.

All my solutions use OpenGL 4.6 and the DSA versions of the functions, and they are identical except for how I create, update, and delete the uniform buffers. I intentionally don't use instancing or multidraw, because I'm interested in the overhead of buffer updates. Also, to reduce the pressure on the garbage collector, I don't allocate memory on the heap while the render loop runs.

The solutions:

  • Mapping: Every time I have to update the buffer, I map it, use memcpy, and finally unmap it.
  • Persistent mapping: I map the buffer only once, initially, and never unmap it, but this requires manual synchronization.
  • BufferSubData: I use glNamedBufferSubData to update my buffer.
  • BufferData: I use glNamedBufferData to reallocate and update my buffer (buffer orphaning).

Each solution has some variations where I use different OpenGL flags or allocation/update strategies.

  • MERGED: I allocate a bigger buffer to store all instance data instead of creating separate buffers for each instance. This way, I can update all my data once instead of updating all my buffers individually.
  • DOUBLE_BUFFERING: I allocate twice as much memory as required and use the 1st half in even frames and the 2nd half in odd frames. This way, when I update my buffer, the driver doesn't have to wait for the previous frame's draw calls to finish.
  • TRIPLE_BUFFERING: Same as DOUBLE_BUFFERING, but I allocate 3 times as much memory and use a different third of the buffer in each consecutive frame.
  • SYNC_EACH: In the case of persistent mapping, synchronization is required to avoid writing to a buffer while the shader is still reading it. SYNC_EACH synchronizes once per buffer.
  • SYNC_ALL: The same as SYNC_EACH, but it only synchronizes once per frame instead of once per buffer.
  • INVALIDATE: I call the glInvalidateBufferSubData function, or use the GL_MAP_INVALIDATE_RANGE_BIT flag, to signal to OpenGL that I no longer need the old data in the region, so the driver is free to allocate memory for the new data and eventually release the old memory. This way, the driver doesn't have to wait until the draw call (or any other previously issued read) finishes before overwriting the buffer's content.
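To make the MERGED and DOUBLE_BUFFERING/TRIPLE_BUFFERING layouts more concrete, here is a minimal sketch of the offset arithmetic they imply. It is not taken from the benchmark: the function names are hypothetical, and the default alignment of 256 bytes is only an assumption. The offset passed to glBindBufferRange for a uniform buffer must be a multiple of GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT, which has to be queried at runtime.

```python
def align_up(size, alignment):
    """Round size up to the next multiple of alignment."""
    return (size + alignment - 1) // alignment * alignment


def instance_offset(frame_index, instance_index, instance_count,
                    instance_size, region_count=2, alignment=256):
    """Byte offset of one instance's data inside a merged, N-buffered buffer.

    region_count is 1 for a plain merged buffer, 2 for double buffering,
    and 3 for triple buffering. alignment=256 is an assumed value; the real
    one comes from GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT.
    """
    stride = align_up(instance_size, alignment)   # padded size of one instance
    region_size = stride * instance_count         # all instances for one frame
    region = frame_index % region_count           # which part this frame uses
    return region * region_size + instance_index * stride


def total_size(instance_count, instance_size, region_count=2, alignment=256):
    """Total number of bytes to allocate for the whole buffer."""
    return region_count * instance_count * align_up(instance_size, alignment)
```

With 4 instances of 64 bytes each (the size of one 4x4 float matrix) and double buffering, even frames would use the byte range starting at 0 and odd frames the range starting at 1024, so consecutive frames never write to the same region.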

Results

The measurements are in the Results folder. There, you can find the application's raw outputs, but I suggest you open the .pdf or the .xlsx file instead.

The 1st and 2nd columns are the names of the solutions and the solution options that I just described. The next 2 columns are the flags used when I create and when I map the buffer. The remaining columns are the benchmarks, each of which takes up 2 columns: the 1st is the percentage, and the 2nd is the time. Percentages are relative values within the column compared to the fastest solution, so each column will have at least 1 solution with 100%.
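As an illustration of how such relative percentages can be derived from the raw times, here is a minimal sketch. The text only says that the fastest solution gets 100%; the sketch assumes that each percentage is the solution's time divided by the fastest time, so slower solutions would be above 100%. The files in the Results folder may use the inverse ratio instead. `relative_percentages` is a hypothetical name.

```python
def relative_percentages(times):
    """Map each solution name to its time relative to the fastest one, in percent.

    times maps a solution name to its measured time for one benchmark column.
    Assumption: percentage = time / fastest time * 100, so the fastest
    solution is exactly 100 and slower solutions are above 100.
    """
    fastest = min(times.values())
    return {name: t / fastest * 100.0 for name, t in times.items()}
```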

Improvements

There are still some other ways I could update a buffer, but most of them would probably be slower.

  • When double or triple buffering, I could allocate a new buffer for each part instead of suballocating a 2 or 3 times bigger buffer.
  • Using the GL_MAP_UNSYNCHRONIZED_BIT flag for mapping.
  • Using 1 small buffer and overwriting it for each draw call.
  • Using an SSBO.
  • Using a sparse buffer.
  • Using a per-instance vertex buffer (but not using instancing itself) to index my uniform buffer, instead of calling glBindBufferRange before each draw call.
