Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SDL3 GPU WebGPU Backend #12046

Open
wants to merge 7 commits into
base: main
Choose a base branch
from
Open

SDL3 GPU WebGPU Backend #12046

wants to merge 7 commits into from

Conversation

klukaszek
Copy link

@klukaszek klukaszek commented Jan 21, 2025

Description

Congrats on shipping SDL 3.20, and officially releasing SDL3!

Now that SDL3 has been released, I have decided to open a PR for my work for the WebGPU backend as suggested by @flibitijibibo.

Attached is a checklist of the API methods, as well as a checklist of working examples. (As of 2025-01-21).

Examples and more info can be found at: https://github.com/klukaszek/SDL3-WebGPU-Examples
(Based on https://github.com/TheSpydog/SDL_gpu_examples/)

A live demo can be found at: https://kylelukaszek.com/SDL3-WebGPU-Examples/.

My fork currently fails to pass the Emscripten pipeline test for some reason that I haven't taken the time to investigate yet. So that will probably have to be resolved before merging with main.

I'm probably gonna get to work on compute pipelines sometime soon if no one ends up working on that by the time I'm free again.

Shaders

This current implementation of the backend expects WGSL shaders since I have only tested on browsers, and browser implementations of WebGPU don't offer support for the SPIRV SType. Once native WGPU support becomes a priority, then this issue can be tackled.

API Checklist

General

  • DestroyDevice
  • SupportsPresentMode
  • ClaimWindow
  • ReleaseWindow

Swapchains

  • SetSwapchainParameters
  • SupportsTextureFormat
  • SupportsSampleCount
  • SupportsSwapchainComposition
  • GetSwapchainTextureFormat

Command Buffers and Fences

  • AcquireCommandBuffer
  • AcquireSwapchainTexture
  • Submit
  • SubmitAndAcquireFence
  • Cancel
  • Wait
  • WaitForFences
  • QueryFence
  • ReleaseFence

Note: WebGPU has no exposed fence API.

Buffers

  • CreateBuffer
  • ReleaseBuffer
  • SetBufferName
  • CreateTransferBuffer
  • ReleaseTransferBuffer
  • MapTransferBuffer
  • UnmapTransferBuffer
  • UploadToBuffer
  • DownloadFromBuffer
  • CopyBufferToBuffer

Textures

Samplers

  • CreateSampler
  • ReleaseSampler

Debugging

  • InsertDebugLabel
  • PushDebugGroup
  • PopDebugGroup

Graphics Pipelines

  • CreateGraphicsPipeline
  • BindGraphicsPipeline
  • ReleaseGraphicsPipeline

Compute Pipelines

  • CreateComputePipeline
  • BindComputePipeline
  • ReleaseComputePipeline

Shaders

  • CreateShader
  • ReleaseShader

Rendering

  • BeginRenderPass
  • EndRenderPass
  • DrawPrimitives
  • DrawIndexedPrimitives
  • DrawPrimitivesIndirect
  • DrawIndexedPrimitivesIndirect

Copy Passes

  • BeginCopyPass
  • EndCopyPass

Compute Passes

  • BeginComputePass
  • EndComputePass
  • DispatchCompute
  • DispatchComputeIndirect
  • BindComputeSamplers
  • BindComputeStorageTextures
  • BindComputeStorageBuffers
  • PushComputeUniformData

Fragment Stage

  • BindFragmentSamplers
  • BindFragmentStorageTextures
  • BindFragmentStorageBuffers
  • PushFragmentUniformData

Vertex Stage

  • BindVertexBuffers
  • BindIndexBuffer
  • BindVertexSamplers
  • BindVertexStorageTextures
  • BindVertexStorageBuffers
  • PushVertexUniformData

Rendering States

  • SetViewport
  • SetScissor
  • SetBlendConstants
  • SetStencilReference

Composition

  • Blit

Example Checklist

  • ClearScreen.c
  • BasicTriangle.c
  • BasicVertexBuffer.c
  • CullMode.c
  • BasicStencil.c
  • InstancedIndexed.c
  • TexturedQuad.c
  • Texture2DArray.c
  • TexturedAnimatedQuad.c
    • Example loads with no warnings, but nothing draws.
    • Needs to be investigated.
  • Clear3DSlice.c,
  • Blit2DArray.c
    • Sampler issue on right texture with no warnings, but draws.
    • Needs to be investigated.
  • BlitCube.c
  • BlitMirror.c
    • Example loads with no warnings, but nothing draws.
    • Needs to be investigated.
  • Cubemap.c
  • CopyAndReadback.c
  • CopyConsistency.c
  • BasicCompute.c
  • ComputeUniforms.c
  • ToneMapping.c
  • CustomSampling.c
  • DrawIndirect.c
  • ComputeSpriteBatch.c
  • TriangleMSAA.c
    • Draws properly, but no visible change occurs when changing the sample count.
    • Needs to be investigated.
  • WindowResize.c (Resizes browser canvas. Have not tested anything natively.
  • GenerateMipmaps.c

Native WebGPU Support

See the README for native instructions.

If you are not on X11, you'll have to implement the surface creation and video subsystem support for your window server/platform.

Warnings:

  • No longer tested for the web! It should not compile for browsers in it's current state due to differences in webgpu.h across different implementations of WebGPU!
  • There's a memory issue caused by the cycling of bind groups. This can be seen in a crash in the CullMode example, where we cycle through the cull modes and risk crashing there or crashing when changing examples due to memory corruption.

Existing Issue(s)

#10768

@slouken slouken added this to the 3.4.0 milestone Jan 21, 2025
@slouken
Copy link
Collaborator

slouken commented Jan 21, 2025

Congrats on the awesome progress!

Comment on lines 1411 to 1418
while (SDL_GetAtomicInt(&buffer->mappingComplete) != 1) {
if (SDL_GetTicks() - startTime > TIMEOUT) {
SDL_LogError(SDL_LOG_CATEGORY_GPU, "Failed to map buffer: timeout");
return NULL;
}

SDL_Delay(1);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This spin-wait is a huge red flag. Generally speaking browser async operations should not be implemented this way. I would be very concerned that this will break on certain targets since generally async stuff on the web is specified to not be observable until the event loop turns; if this happens to work it could break in the future and nobody would know what was going on.

At a minimum you should have a comment here that specifies why it's safe/appropriate to do this instead of doing something else (I don't know what else you'd do offhand) - i.e. 'here's the part of the WebGPU spec that says this is legal and the spin should complete quickly' or 'i tested this on and on and '.

Thankfully this appears to only apply to readback which makes it have less of an impact on the overall API; it might be that what you need to do is specify an async readback API extension to SDL_GPU and make that the only legal way to do readback on the WebGPU target.

Blocking the browser's main thread (for up to 1000ms in this case) is very bad. It causes all sorts of downstream problems.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll throw some comments in! I'll also have to add some preprocessor macros to ensure that SDL_Delay(1) calls are specific to Emscripten. This is done since browser backends for WebGPU don't give access to device ticking, so we have to yield back to the browser for a tiny amount of time for the backend to tick the device for us.

Here's a quote from Elie Michel:

"When our C++ code runs in a Web browser (after being compiled to WebAssembly through emscripten), there is no explicit way to tick/poll the WebGPU device. This is because the device is managed by the Web browser itself, which decides at what pace polling should happen. As a result:

The device never ticks in between two consecutive lines of our WebAssembly module, it can only tick when the execution flow leaves the module.

The device always ticks between two calls to our MainLoop() function, because if you remember the Emscripten section of the Opening a Window chapter, we leave the main loop management to the browser and only provide a callback to run at each frame.

Thanks to the second point, we do not need wgpuPollEvents to do anything when called at the beginning or end of our main loop (so we set yieldToWebBrowser to false).

However, if what we intend is really to wait until something happens (e.g., a callback gets invoked), the first point requires us to make sure we yield back the execution flow to the Web browser, so that it may tick its device from time to time. We do this thanks to emscripten_sleep function, at the cost of effectively sleeping during 100 ms (we’re in a case where we want to wait anyways).

Note that using emscripten_sleep requires the -SASYNCIFY link option to be passed to emscripten, like we added already."

Copy link
Collaborator

@thatcosmonaut thatcosmonaut Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

specify an async readback API extension to SDL_GPU

We have an async readback API, it's the Download and QueryFence/WaitForFence functions. If the committee can't define their specification for this extremely common use case in a normal way like every single industry-standard API going back to D3D11 that is firmly their problem. I would rather force the webGPU backend to implement a hack to make it work our way than poison our API with something as stupid as an async buffer map call.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so because you're relying on asyncify being set (I missed this, sorry! my bad) the sleep is not a spinwait but is instead a yield-to-browser-event-loop. That's much better.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just so I'm not only being grumpy in this thread, here's a quick sketch of how this could possibly work:

A "fence" in the webGPU backend could just be defined as a group of resources that are waiting on async map operations. Then implementing QueryFence would be as simple as checking buffer->mappingComplete for each of these resources. WaitForFence could be implemented with the spinwait. That might be enough for this to work.

Comment on lines 4483 to 4326
while (!renderer->device) {
SDL_Delay(1);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a forward progress guarantee here? Please specify what provides the guarantee of forward progress. A naive reading of this suggests that it might never stop spinning since there's no timeout. It would be nice to at least see a timeout here and have it error out when the timeout expires.

It would be even better to not have this spin-wait. It's a red flag and doesn't seem like it should be necessary if everything is working correctly, it suggests that someone - not necessarily you, it could be the browser vendor or the user mode graphics driver - got something wrong.

Worst-case this spin wait could actually prevent forward progress if something important is waiting in the event loop queue.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of checking the device pointer itself, I can add some bool that gets toggled by the RequestDeviceCallback.

If the status received by the callback is anything but successful, then we say that it failed which would then terminate the quoted infinite loop.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See: 11d8ef7

@thatcosmonaut
Copy link
Collaborator

Looks like there was a bad rebase because some of the enum entries gpu.c have been randomly deleted, etc. The includes need to be cleaned up too.

@klukaszek
Copy link
Author

Looks like there was a bad rebase because some of the enum entries gpu.c have been randomly deleted, etc. The includes need to be cleaned up too.

I reckon it was in here: 850caed

Copy link
Collaborator

@thatcosmonaut thatcosmonaut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left comments on all the obvious stuff I noticed for now.

I'll also note here that cycling hasn't been implemented for any resources.

Comment on lines 530 to 532
#ifdef __EMSCRIPTEN__
SDL_SetHint(SDL_HINT_GPU_DRIVER, "webgpu");
#endif
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't right, we shouldn't be depending on emscripten since webgpu can also have native implementations.

bool is_webgpu = SDL_strcasecmp(backend, "webgpu") == 0;

// WebGPU uses ~0u for default layer_or_depth_plane, however this causes issues with other backends
if (color_target_infos[i].layer_or_depth_plane == ~0u && !is_webgpu) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be translating from SDL to WGPU, not the other way around. If the client passes in ~0u for the layer then that violates our spec.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Link: c1d8428

Comment on lines 1416 to 1418
// Get hint to check for "webgpu"
const char *backend = SDL_GetHint(SDL_HINT_GPU_DRIVER);
bool is_webgpu = SDL_strcasecmp(backend, "webgpu") == 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have to query hints to get the backend from gpu.c

@@ -18,6 +18,7 @@
misrepresented as being the original software.
3. This notice may not be removed or altered from any source distribution.
*/
#include "../SDL_internal.h"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorrect #include

@@ -20,6 +20,7 @@
*/
#include "SDL_internal.h"
#include "SDL_sysgpu.h"
#include <SDL3/SDL_gpu.h>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorrect #include

.label = "SDL_GPU Command Encoder",
};

commandBuffer->commandEncoder = wgpuDeviceCreateCommandEncoder(renderer->device, &commandEncoderDesc);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to pool the command buffer structures than creating a new command encoder every frame.

int width, height;
SDL_GetWindowSize(renderer->claimedWindows[0]->window, &width, &height);
commandBuffer->currentViewport = (WebGPUViewport){ 0, 0, width, height, 0.0, 1.0 };
commandBuffer->currentScissor = (WebGPURect){ 0, 0, width, height };
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this function touching windows? This should be done in BeginRenderPass.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to BeginRenderPass. I'll link commit once it's up.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Link: 8d601ec

{
// Just call Submit for WebGPU
WebGPU_Submit(commandBuffer);
// There are no fences in WebGPU, so we don't need to do anything here
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not having any kind of fence abstraction is going to break tons of applications.

It seems like there's some kind of pseudo-fence callback structure:
https://developer.mozilla.org/en-US/docs/Web/API/GPUQueue/onSubmittedWorkDone

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just adding stuff here as notes for myself when I return:

In the C API, the function is defined as: wgpuQueueOnSubmittedWorkDone(WGPUQueue queue, WGPUQueueWorkDoneCallback callback, void *userdata).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, then this can probably be implemented by just having a Fence struct as the userdata and then marking it as finished in the callback.

// Slightly altered, though with permission by Elie Michel:
// @ https://github.com/eliemichel/sdl3webgpu/blob/main/sdl3webgpu.c
// https://github.com/libsdl-org/SDL/issues/10768#issuecomment-2499532299
#if defined(SDL_PLATFORM_MACOS)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't be touching platform code in the implementation like this. We'll probably need some kind of platform abstraction in SDL itself that can get a WGPU surface.


bool cycleBindGroups;

WebGPUUniformBuffer vertexUniformBuffers[MAX_UNIFORM_BUFFERS_PER_STAGE];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pipeline should not own these, uniform buffers should be pooled.

@@ -0,0 +1,4602 @@
// File: /webgpu/SDL_gpu_webgpu.c
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please include the standard text from https://github.com/libsdl-org/SDL/blob/main/include/SDL3/SDL_copying.h and add any copyright attribution you'd like here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Link: a385d47

@@ -2090,6 +2106,11 @@ void WebGPU_BeginRenderPass(SDL_GPUCommandBuffer *commandBuffer,
return;
}

int width, height;
SDL_GetWindowSize(wgpu_cmd_buf->renderer->claimedWindows[0]->window, &width, &height);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still not right, the viewport and scissor should be set to the smallest size of bound render targets. Please reference how the other backends implemented this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I get it now! I'll return to this after some rest I think.

I read up on the Vulkan implementation and will follow that one tomorrow.

Copy link
Author

@klukaszek klukaszek Jan 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Link: e24094d

It's still not 1-to-1 with the Vulkan backend but the viewport and scissor now use the smallest available size of all bound render targets.

It also now sets the other default states for the render pass.

// Note: Compiling SDL GPU programs using emscripten will require -sUSE_WEBGPU=1 -sASYNCIFY=1

#include "../SDL_sysgpu.h"
#include "SDL_internal.h"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SDL_internal.h needs to be the first include in the file. I usually throw it right after the standard blurb at the top so I don't forget.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Link: d2fbc02

@klukaszek
Copy link
Author

Note for commit: 30bbcee

The abstracted WebGPUCommandBuffer struct is pooled in the hash table, but the underlying GPUCommandEncoder struct must be created per frame due to WebGPU limitations.

GPUCommandEncoder is consumed and cannot be reused once finish() is called on the encoder. finish() is what returns the GPUCommandBuffer that we need.

@TheSpydog
Copy link
Collaborator

Took a quick look at the latest. In addition to cosmonaut's unresolved review comments, I have a few additional things to mention.

First, the high level, non-WebGPU stuff:

  • Please undo all your whitespace changes for SDL_gpu.h and SDL_gpu.c. They make the actual meat of the diff hard to parse. We can make any formatting tweaks in a separate PR.
  • In fact, all of the changes in SDL_gpu.c should be reverted except for adding the WebGPUDriver to the driver list, and the WGSL hint logic. If any of the remaining changes are needed as hacks to temporarily work around something broken, you can denote them with // HACK so we know what we're looking at during a review.
  • I can't tell if SDL_emscriptenevents.c was changed intentionally or if a rebase borked it somehow? If some changes were intentional, they should be split off into a separate PR since they're not directly GPU related.
  • It looks like SDL_video.c was only changed to add a debug log. It's fine to add SDL_Log("whatever") for your local testing, but please delete these kinds of logs before committing.

For the WebGPU-specific logic:

  • If webgpu.h is needed for compilation (or LSP support), we should probably include it in the repo directly like we do with the Khronos headers.
  • GenerateMipmaps can be implemented via blits. See the D3D12 implementation.
  • Most of the hash+caching that Vulkan does (for render passes, etc.) is purely to work around Vulkan-specific API quirks. WebGPU shouldn't need any of this.
  • The per-thread command pools are not necessary. They're needed for Vulkan since it has threading restrictions around command pool creation and command buffer creation, but WebGPU creates encoders directly from the device, so that shouldn't be needed. You can probably copy off the the Metal implementation, since it and the WebGPU APIs are pretty similar.
  • As cosmonaut said, the binding model here (using shader reflection) is completely wrong. It would probably be best to delete all the existing binding-related logic and rewrite it from scratch. I would recommend doing that sooner rather than later. It's less exciting than ticking off a box for a new function, but this is a massively fundamental part of the driver, and we cannot merge this without this completely reworked.

As a general piece of advice, I highly recommend looking at all 3 backends (Vulkan, D3D12, Metal) when implementing a new system / filling out a new function to see how they each work. Metal is probably better to copy off of than Vulkan since it's a closer match for the higher-level WebGPU API, but please try to understand why we did things a certain way before copying them wholesale into the WebGPU driver.

On that note, I strongly encourage you to ask questions if you're not totally sure why something is the way it is. We'll gladly fill you in on the rationale. You do not have to (and should not!) reinvent the wheel for this driver.

@klukaszek
Copy link
Author

Thanks for the feedback!

As for the non-webgpu stuff I was planning on grabbing fresh copies of the upstream files and then force pushing them to fix that the diff issue.

As for the WebGPU related things, I'll gladly revert the caching changes since they bloat the tab memory footprint significantly.

I still haven't touched the shader binding logic but I took notes based on the shader wiki and have it as the biggest TODO on my list.

I'll take a closer look at the Metal backend over the course of my break this week and hopefully get lots done!

@klukaszek
Copy link
Author

I have started rewriting parts of the backend and have managed to get a few of the examples working again with no issues!

I made sure to rewrite all of the shader binding logic so that it would follow the requirements and to my surprise everything worked perfectly. Though we'll see how this works with shaders tougher than a triangle. Blit shaders are next on my list.

Unfortunately I'm now encountering issues with texture resource management resulting in segfaults that I can't seem to resolve correctly.

A crash can occur via multiple avenues in the BasicStencil example, which leads me to believe that I've made a grave error with texture resource management.

BasicStencil can crash at:

  • Example start
  • Any frame
  • Example exit
  • Or sometimes there's no crash at all!

The usual suspects in this case are PrepareTextureForWrite(),ReleaseTexture(), PerformPendingDestroys(), and DestroyDevice().

ReleaseTexture() is currently commented out in the file, though it was also causing a memory error, so for now I'm ignoring it and trying to pin down the other issues.

It is also important to note that the following functions are usually responsible for the segfaulting.

  • SDL_calloc()
  • SDL_free() (I assume double free)
  • SDL_GetAtomicInt().

I feel like I've been going in circles trying to resolve this issue now, so I figure I should ask for some help as I don't want to stray towards another poor implementation.

As annoying as this is, I'm not very discouraged as the rest of the rewrite has basically gone off without a hitch.

So far ClearScreen, BasicTriangle, BasicVertexBuffer, CullMode, and InstancedIndexed, are all working without any obvious issues.

If anyone's got any pointers or feedback I'd be more than happy to hear from you!

@madebr
Copy link
Contributor

madebr commented Feb 26, 2025

Are you developing using emscripten webgpu?
Would it make sense to add support for desktop webgpu using wgpu/dawn?
Perhaps those provide a better debugging experience?

@klukaszek
Copy link
Author

That was what I was planning on doing if I couldn't solve the issue by today, though last time I tried this there were a few quirks that had to be worked out when getting it back on the browser. Though those quirks should no longer exist as of a few commits ago, so I guess I'll have to start reinstalling Dawn from source 😔

@madebr
Copy link
Contributor

madebr commented Feb 26, 2025

so I guess I'll have to start reinstalling Dawn from source 😔

What's your opinion about wgu-native? That project provides pre-built binaries.
Perhaps sdl3's webgpu gpu drivershould be structured in a way to support multiple backends (as a hint)? emscripten <-> wgpu_native <-> dawn? (I don't know how hard this is. sorry :) )
This would be useful for any developer wanting to port their gpu app to the web.

@klukaszek
Copy link
Author

wgpu-native works great and I've used it on another one of my projects! The only unfortunate part is that it doesn't guarantee the code will work fine on the browser since Chrome uses Dawn, and Chrome is by far the most popular in terms of user share.

For solving this particular issue wgpu-native might be enough to find a fix though. Thanks!

As for WebGPU flavour specific hints, I'm not sure they're necessary. The developer should provide their backend via some build flag since webgpu.h should be backend agnostic. When building for web, using emcc with the -sUSE_WEBGPU flag is enough to link their webgpu library for browser. So adding something like -DSDL_WEBGPU_LIBRARY=path/to/webgpu/ might do the trick when building with CMake for native platforms.

This was the approach I originally used when testing with Dawn.

@klukaszek
Copy link
Author

Update 02-28

Note

Unfortunately, I'm now encountering issues with texture resource management resulting in segfaults that I can't seem to resolve correctly.

A crash can occur via multiple avenues in the BasicStencil example, which leads me to believe that I've made a grave error with texture resource management.

So these issues do not occur when running natively via wgpu-native! I'm kind of puzzled, but it seems like I'll be continuing natively for debugging purposes, even if this means I won't have a web build for a bit.

Surfaces!

As previously mentioned in this thread, we should be handling our surface logic like the other backends do, so I've gotten started on that. For now I've added:

  • src/video/x11/SDL_x11webgpu.c
  • src/video/x11/SDL_x11webgpu.h

These work properly and only get included if SDL_VIDEO_WEBGPU and SDL_VIDEO_DRIVER_X11 build flags are set.

This logic should carry over to all other surface types, I've just gotta put the files together.

webgpu.h

Annoyingly enough, webgpu.h for Emscripten is currently a bit behind the upstream version of webgpu.h to avoid frequent breaking changes. The Emscripten WebGPU Readme mentions that you should be able to update the bindings via some codegen shenanigans and a bit of manual tweaking. The only issue is that these instructions are meaningless due to a supposed problem with Chromium Code Search. I'm still gonna explore a bit and try my luck, but I don't have very high hopes.

Fingers crossed there's an update to their provided version of webgpu.h sometime soon. If not, I'll have to go through and litter the code with #ifdefs because of some changes to how WebGPU handles strings and callbacks in newer versions of webgpu.h.

Here's a video of the current functioning examples on X11 using wgpu-native (so many compression artifacts...)

2025-02-28.06-22-57.1.mp4

Final Thoughts

I hope I can get textures and blit pipelines up and running again, and once that's done, I'll be all caught up to where I was!

@TheSpydog
Copy link
Collaborator

Could you move your work over to the branch for this PR so we can see the changes?

On your other branch I noticed you introduced an autorelease pool implementation. This is totally unnecessary -- it exists in Metal just because Objective-C ARC requires it to avoid memory leaks. That may very well be the source of the memory corruptions you're seeing in the web implementation.

@klukaszek
Copy link
Author

Yeah I could definitely try to move the work over! I'll try and replace all the screwed up files too before moving the branch to this PR.

I also got rid of the autorelease pool implementation on the web version before hopping over to native which fixed a few issues but still crashed frequently.

I'm starting to think the issue was caused by explicitly setting a blend state to NULL as this also caused a few problems in the native version.

…ckend requirements. Introduces SDL Video subsystem support for WebGPU (x11 only currently), and better binding support.
@klukaszek
Copy link
Author

klukaszek commented Mar 3, 2025

I have pushed the updated native version of the backend to main and archived the previous version under another branch!

This should fix the bad rebase, remove browser support for WebGPU, and introduce native support for X11 using the SDL_Video subsystem.

For testing, I am using wgpu-native at the moment, but eventually, the choice of webgpu library should not matter. I believe wgpu-native messes with some of the enum values which is annoying but it shouldn't be a problem as long as you provide the correct header file and library in the wgpu/ directory. For now I have provided the webgpu.h that is given out with wgpu-native, and this header is referenced through <SDL3/SDL_webgpu.h>.

I've also updated the checklist at the very top to match the latest version.

Gonna keep working on this before I have to study for my next diffies test.

@slouken
Copy link
Collaborator

slouken commented Mar 3, 2025

Why did you remove browser support? That’s one of our main use cases.

klukaszek and others added 2 commits March 3, 2025 02:27
…ed one memory problem down to the pipeline/bindgroup cache.
Feat: Added SPIRV shader support and did a bit of refactoring. Isolat…
@klukaszek
Copy link
Author

klukaszek commented Mar 3, 2025

Why did you remove browser support? That’s one of our main use cases.

Debugging is a nightmare, and the native versions of WebGPU both use the latest version of webgpu.h that Emscripten's bindings don't currently support. Browser support will be added back, but for better dx, I'm gonna stick with wgpu-native because I'm not trapped with browser dev tools.

Once I or the Emscripten contributors figure out how to generate new headers for Emscripten to match the latest version, then everything should be good, and browser support should be back on the menu. If not, then we may have to write our own WebGPU bindings (which isn't a bad idea, just sooo much to do). I'll also have to go back and implement surface creation for HTML canvases again but that's not a problem at all.

While the demo site is nice to have, it doesn't serve much purpose other than quick testing for everyone else except me. Though I can always go through and add some #ifdefs where necessary for the time being to get support back for the browser.

And also, once tested on wgpu-native and Dawn, we'll know the backend will work on both Firefox and Chrome with no issues.

@charles-liang
Copy link

U did a good job. If you need someone to help for debugging. I can help it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants