-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SDL3 GPU WebGPU Backend #12046
base: main
Are you sure you want to change the base?
SDL3 GPU WebGPU Backend #12046
Conversation
Congrats on the awesome progress! |
src/gpu/webgpu/SDL_gpu_webgpu.c
Outdated
while (SDL_GetAtomicInt(&buffer->mappingComplete) != 1) { | ||
if (SDL_GetTicks() - startTime > TIMEOUT) { | ||
SDL_LogError(SDL_LOG_CATEGORY_GPU, "Failed to map buffer: timeout"); | ||
return NULL; | ||
} | ||
|
||
SDL_Delay(1); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This spin-wait is a huge red flag. Generally speaking browser async operations should not be implemented this way. I would be very concerned that this will break on certain targets since generally async stuff on the web is specified to not be observable until the event loop turns; if this happens to work it could break in the future and nobody would know what was going on.
At a minimum you should have a comment here that specifies why it's safe/appropriate to do this instead of doing something else (I don't know what else you'd do offhand) - i.e. 'here's the part of the WebGPU spec that says this is legal and the spin should complete quickly' or 'i tested this on and on and '.
Thankfully this appears to only apply to readback which makes it have less of an impact on the overall API; it might be that what you need to do is specify an async readback API extension to SDL_GPU and make that the only legal way to do readback on the WebGPU target.
Blocking the browser's main thread (for up to 1000ms in this case) is very bad. It causes all sorts of downstream problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll throw some comments in! I'll also have to add some preprocessor macros to ensure that SDL_Delay(1) calls are specific to Emscripten. This is done since browser backends for WebGPU don't give access to device ticking, so we have to yield back to the browser for a tiny amount of time for the backend to tick the device for us.
Here's a quote from Elie Michel:
"When our C++ code runs in a Web browser (after being compiled to WebAssembly through emscripten), there is no explicit way to tick/poll the WebGPU device. This is because the device is managed by the Web browser itself, which decides at what pace polling should happen. As a result:
The device never ticks in between two consecutive lines of our WebAssembly module, it can only tick when the execution flow leaves the module.
The device always ticks between two calls to our MainLoop() function, because if you remember the Emscripten section of the Opening a Window chapter, we leave the main loop management to the browser and only provide a callback to run at each frame.
Thanks to the second point, we do not need wgpuPollEvents to do anything when called at the beginning or end of our main loop (so we set yieldToWebBrowser to false).
However, if what we intend is really to wait until something happens (e.g., a callback gets invoked), the first point requires us to make sure we yield back the execution flow to the Web browser, so that it may tick its device from time to time. We do this thanks to emscripten_sleep function, at the cost of effectively sleeping during 100 ms (we’re in a case where we want to wait anyways).
Note that using emscripten_sleep requires the -SASYNCIFY link option to be passed to emscripten, like we added already."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
specify an async readback API extension to SDL_GPU
We have an async readback API, it's the Download and QueryFence/WaitForFence functions. If the committee can't define their specification for this extremely common use case in a normal way like every single industry-standard API going back to D3D11 that is firmly their problem. I would rather force the webGPU backend to implement a hack to make it work our way than poison our API with something as stupid as an async buffer map call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, so because you're relying on asyncify being set (I missed this, sorry! my bad) the sleep is not a spinwait but is instead a yield-to-browser-event-loop. That's much better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just so I'm not only being grumpy in this thread, here's a quick sketch of how this could possibly work:
A "fence" in the webGPU backend could just be defined as a group of resources that are waiting on async map operations. Then implementing QueryFence would be as simple as checking buffer->mappingComplete
for each of these resources. WaitForFence could be implemented with the spinwait. That might be enough for this to work.
src/gpu/webgpu/SDL_gpu_webgpu.c
Outdated
while (!renderer->device) { | ||
SDL_Delay(1); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a forward progress guarantee here? Please specify what provides the guarantee of forward progress. A naive reading of this suggests that it might never stop spinning since there's no timeout. It would be nice to at least see a timeout here and have it error out when the timeout expires.
It would be even better to not have this spin-wait. It's a red flag and doesn't seem like it should be necessary if everything is working correctly, it suggests that someone - not necessarily you, it could be the browser vendor or the user mode graphics driver - got something wrong.
Worst-case this spin wait could actually prevent forward progress if something important is waiting in the event loop queue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of checking the device pointer itself, I can add some bool that gets toggled by the RequestDeviceCallback.
If the status received by the callback is anything but successful, then we say that it failed which would then terminate the quoted infinite loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See: 11d8ef7
Looks like there was a bad rebase because some of the enum entries gpu.c have been randomly deleted, etc. The includes need to be cleaned up too. |
I reckon it was in here: 850caed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left comments on all the obvious stuff I noticed for now.
I'll also note here that cycling hasn't been implemented for any resources.
src/gpu/SDL_gpu.c
Outdated
#ifdef __EMSCRIPTEN__ | ||
SDL_SetHint(SDL_HINT_GPU_DRIVER, "webgpu"); | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't right, we shouldn't be depending on emscripten since webgpu can also have native implementations.
src/gpu/SDL_gpu.c
Outdated
bool is_webgpu = SDL_strcasecmp(backend, "webgpu") == 0; | ||
|
||
// WebGPU uses ~0u for default layer_or_depth_plane, however this causes issues with other backends | ||
if (color_target_infos[i].layer_or_depth_plane == ~0u && !is_webgpu) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be translating from SDL to WGPU, not the other way around. If the client passes in ~0u for the layer then that violates our spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Link: c1d8428
src/gpu/SDL_gpu.c
Outdated
// Get hint to check for "webgpu" | ||
const char *backend = SDL_GetHint(SDL_HINT_GPU_DRIVER); | ||
bool is_webgpu = SDL_strcasecmp(backend, "webgpu") == 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have to query hints to get the backend from gpu.c
src/gpu/SDL_sysgpu.h
Outdated
@@ -18,6 +18,7 @@ | |||
misrepresented as being the original software. | |||
3. This notice may not be removed or altered from any source distribution. | |||
*/ | |||
#include "../SDL_internal.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incorrect #include
src/gpu/SDL_gpu.c
Outdated
@@ -20,6 +20,7 @@ | |||
*/ | |||
#include "SDL_internal.h" | |||
#include "SDL_sysgpu.h" | |||
#include <SDL3/SDL_gpu.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incorrect #include
src/gpu/webgpu/SDL_gpu_webgpu.c
Outdated
.label = "SDL_GPU Command Encoder", | ||
}; | ||
|
||
commandBuffer->commandEncoder = wgpuDeviceCreateCommandEncoder(renderer->device, &commandEncoderDesc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to pool the command buffer structures than creating a new command encoder every frame.
src/gpu/webgpu/SDL_gpu_webgpu.c
Outdated
int width, height; | ||
SDL_GetWindowSize(renderer->claimedWindows[0]->window, &width, &height); | ||
commandBuffer->currentViewport = (WebGPUViewport){ 0, 0, width, height, 0.0, 1.0 }; | ||
commandBuffer->currentScissor = (WebGPURect){ 0, 0, width, height }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this function touching windows? This should be done in BeginRenderPass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to BeginRenderPass. I'll link commit once it's up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Link: 8d601ec
src/gpu/webgpu/SDL_gpu_webgpu.c
Outdated
{ | ||
// Just call Submit for WebGPU | ||
WebGPU_Submit(commandBuffer); | ||
// There are no fences in WebGPU, so we don't need to do anything here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not having any kind of fence abstraction is going to break tons of applications.
It seems like there's some kind of pseudo-fence callback structure:
https://developer.mozilla.org/en-US/docs/Web/API/GPUQueue/onSubmittedWorkDone
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just adding stuff here as notes for myself when I return:
In the C API, the function is defined as: wgpuQueueOnSubmittedWorkDone(WGPUQueue queue, WGPUQueueWorkDoneCallback callback, void *userdata)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, then this can probably be implemented by just having a Fence struct as the userdata and then marking it as finished in the callback.
src/gpu/webgpu/SDL_gpu_webgpu.c
Outdated
// Slightly altered, though with permission by Elie Michel: | ||
// @ https://github.com/eliemichel/sdl3webgpu/blob/main/sdl3webgpu.c | ||
// https://github.com/libsdl-org/SDL/issues/10768#issuecomment-2499532299 | ||
#if defined(SDL_PLATFORM_MACOS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't be touching platform code in the implementation like this. We'll probably need some kind of platform abstraction in SDL itself that can get a WGPU surface.
src/gpu/webgpu/SDL_gpu_webgpu.c
Outdated
|
||
bool cycleBindGroups; | ||
|
||
WebGPUUniformBuffer vertexUniformBuffers[MAX_UNIFORM_BUFFERS_PER_STAGE]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pipeline should not own these, uniform buffers should be pooled.
@@ -0,0 +1,4602 @@ | |||
// File: /webgpu/SDL_gpu_webgpu.c |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please include the standard text from https://github.com/libsdl-org/SDL/blob/main/include/SDL3/SDL_copying.h and add any copyright attribution you'd like here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Link: a385d47
src/gpu/webgpu/SDL_gpu_webgpu.c
Outdated
@@ -2090,6 +2106,11 @@ void WebGPU_BeginRenderPass(SDL_GPUCommandBuffer *commandBuffer, | |||
return; | |||
} | |||
|
|||
int width, height; | |||
SDL_GetWindowSize(wgpu_cmd_buf->renderer->claimedWindows[0]->window, &width, &height); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still not right, the viewport and scissor should be set to the smallest size of bound render targets. Please reference how the other backends implemented this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I get it now! I'll return to this after some rest I think.
I read up on the Vulkan implementation and will follow that one tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Link: e24094d
It's still not 1-to-1 with the Vulkan backend but the viewport and scissor now use the smallest available size of all bound render targets.
It also now sets the other default states for the render pass.
src/gpu/webgpu/SDL_gpu_webgpu.c
Outdated
// Note: Compiling SDL GPU programs using emscripten will require -sUSE_WEBGPU=1 -sASYNCIFY=1 | ||
|
||
#include "../SDL_sysgpu.h" | ||
#include "SDL_internal.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SDL_internal.h needs to be the first include in the file. I usually throw it right after the standard blurb at the top so I don't forget.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Link: d2fbc02
Note for commit: 30bbcee The abstracted
|
Took a quick look at the latest. In addition to cosmonaut's unresolved review comments, I have a few additional things to mention. First, the high level, non-WebGPU stuff:
For the WebGPU-specific logic:
As a general piece of advice, I highly recommend looking at all 3 backends (Vulkan, D3D12, Metal) when implementing a new system / filling out a new function to see how they each work. Metal is probably better to copy off of than Vulkan since it's a closer match for the higher-level WebGPU API, but please try to understand why we did things a certain way before copying them wholesale into the WebGPU driver. On that note, I strongly encourage you to ask questions if you're not totally sure why something is the way it is. We'll gladly fill you in on the rationale. You do not have to (and should not!) reinvent the wheel for this driver. |
Thanks for the feedback! As for the non-webgpu stuff I was planning on grabbing fresh copies of the upstream files and then force pushing them to fix that the diff issue. As for the WebGPU related things, I'll gladly revert the caching changes since they bloat the tab memory footprint significantly. I still haven't touched the shader binding logic but I took notes based on the shader wiki and have it as the biggest TODO on my list. I'll take a closer look at the Metal backend over the course of my break this week and hopefully get lots done! |
I have started rewriting parts of the backend and have managed to get a few of the examples working again with no issues! I made sure to rewrite all of the shader binding logic so that it would follow the requirements and to my surprise everything worked perfectly. Though we'll see how this works with shaders tougher than a triangle. Blit shaders are next on my list. Unfortunately I'm now encountering issues with texture resource management resulting in segfaults that I can't seem to resolve correctly. A crash can occur via multiple avenues in the BasicStencil example, which leads me to believe that I've made a grave error with texture resource management. BasicStencil can crash at:
The usual suspects in this case are PrepareTextureForWrite(),ReleaseTexture(), PerformPendingDestroys(), and DestroyDevice(). ReleaseTexture() is currently commented out in the file, though it was also causing a memory error, so for now I'm ignoring it and trying to pin down the other issues. It is also important to note that the following functions are usually responsible for the segfaulting.
I feel like I've been going in circles trying to resolve this issue now, so I figure I should ask for some help as I don't want to stray towards another poor implementation. As annoying as this is, I'm not very discouraged as the rest of the rewrite has basically gone off without a hitch. So far ClearScreen, BasicTriangle, BasicVertexBuffer, CullMode, and InstancedIndexed, are all working without any obvious issues. If anyone's got any pointers or feedback I'd be more than happy to hear from you! |
Are you developing using emscripten webgpu? |
That was what I was planning on doing if I couldn't solve the issue by today, though last time I tried this there were a few quirks that had to be worked out when getting it back on the browser. Though those quirks should no longer exist as of a few commits ago, so I guess I'll have to start reinstalling Dawn from source 😔 |
What's your opinion about wgu-native? That project provides pre-built binaries. |
wgpu-native works great and I've used it on another one of my projects! The only unfortunate part is that it doesn't guarantee the code will work fine on the browser since Chrome uses Dawn, and Chrome is by far the most popular in terms of user share. For solving this particular issue wgpu-native might be enough to find a fix though. Thanks! As for WebGPU flavour specific hints, I'm not sure they're necessary. The developer should provide their backend via some build flag since webgpu.h should be backend agnostic. When building for web, using emcc with the -sUSE_WEBGPU flag is enough to link their webgpu library for browser. So adding something like This was the approach I originally used when testing with Dawn. |
Update 02-28Note
So these issues do not occur when running natively via wgpu-native! I'm kind of puzzled, but it seems like I'll be continuing natively for debugging purposes, even if this means I won't have a web build for a bit. Surfaces!As previously mentioned in this thread, we should be handling our surface logic like the other backends do, so I've gotten started on that. For now I've added:
These work properly and only get included if SDL_VIDEO_WEBGPU and SDL_VIDEO_DRIVER_X11 build flags are set. This logic should carry over to all other surface types, I've just gotta put the files together. webgpu.hAnnoyingly enough, Fingers crossed there's an update to their provided version of Here's a video of the current functioning examples on X11 using wgpu-native (so many compression artifacts...) 2025-02-28.06-22-57.1.mp4Final ThoughtsI hope I can get textures and blit pipelines up and running again, and once that's done, I'll be all caught up to where I was! |
Could you move your work over to the branch for this PR so we can see the changes? On your other branch I noticed you introduced an autorelease pool implementation. This is totally unnecessary -- it exists in Metal just because Objective-C ARC requires it to avoid memory leaks. That may very well be the source of the memory corruptions you're seeing in the web implementation. |
Yeah I could definitely try to move the work over! I'll try and replace all the screwed up files too before moving the branch to this PR. I also got rid of the autorelease pool implementation on the web version before hopping over to native which fixed a few issues but still crashed frequently. I'm starting to think the issue was caused by explicitly setting a blend state to NULL as this also caused a few problems in the native version. |
…ckend requirements. Introduces SDL Video subsystem support for WebGPU (x11 only currently), and better binding support.
I have pushed the updated native version of the backend to main and archived the previous version under another branch! This should fix the bad rebase, remove browser support for WebGPU, and introduce native support for X11 using the SDL_Video subsystem. For testing, I am using wgpu-native at the moment, but eventually, the choice of webgpu library should not matter. I believe wgpu-native messes with some of the enum values which is annoying but it shouldn't be a problem as long as you provide the correct header file and library in the I've also updated the checklist at the very top to match the latest version. Gonna keep working on this before I have to study for my next diffies test. |
Why did you remove browser support? That’s one of our main use cases. |
…ed one memory problem down to the pipeline/bindgroup cache.
Feat: Added SPIRV shader support and did a bit of refactoring. Isolat…
Debugging is a nightmare, and the native versions of WebGPU both use the latest version of webgpu.h that Emscripten's bindings don't currently support. Browser support will be added back, but for better dx, I'm gonna stick with wgpu-native because I'm not trapped with browser dev tools. Once I or the Emscripten contributors figure out how to generate new headers for Emscripten to match the latest version, then everything should be good, and browser support should be back on the menu. If not, then we may have to write our own WebGPU bindings (which isn't a bad idea, just sooo much to do). I'll also have to go back and implement surface creation for HTML canvases again but that's not a problem at all. While the demo site is nice to have, it doesn't serve much purpose other than quick testing for everyone else except me. Though I can always go through and add some #ifdefs where necessary for the time being to get support back for the browser. And also, once tested on wgpu-native and Dawn, we'll know the backend will work on both Firefox and Chrome with no issues. |
….c and TexturedQuadAnimated.c examples work now.
Feat: Textures, Samplers, and Bind Group Management
U did a good job. If you need someone to help for debugging. I can help it. |
Description
Congrats on shipping SDL 3.20, and officially releasing SDL3!
Now that SDL3 has been released, I have decided to open a PR for my work for the WebGPU backend as suggested by @flibitijibibo.
Attached is a checklist of the API methods, as well as a checklist of working examples. (As of 2025-01-21).
Examples and more info can be found at: https://github.com/klukaszek/SDL3-WebGPU-Examples
(Based on https://github.com/TheSpydog/SDL_gpu_examples/)
A live demo can be found at: https://kylelukaszek.com/SDL3-WebGPU-Examples/.
My fork currently fails to pass the Emscripten pipeline test for some reason that I haven't taken the time to investigate yet. So that will probably have to be resolved before merging with main.
I'm probably gonna get to work on compute pipelines sometime soon if no one ends up working on that by the time I'm free again.
Shaders
This current implementation of the backend expects WGSL shaders since I have only tested on browsers, and browser implementations of WebGPU don't offer support for the SPIRV SType. Once native WGPU support becomes a priority, then this issue can be tackled.
API Checklist
General
Swapchains
Command Buffers and Fences
Note: WebGPU has no exposed fence API.
Buffers
Textures
Samplers
Debugging
Graphics Pipelines
Compute Pipelines
Shaders
Rendering
Copy Passes
Compute Passes
Fragment Stage
Vertex Stage
Rendering States
Composition
Example Checklist
Native WebGPU Support
See the README for native instructions.
If you are not on X11, you'll have to implement the surface creation and video subsystem support for your window server/platform.
Warnings:
Existing Issue(s)
#10768