-
Notifications
You must be signed in to change notification settings - Fork 567
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] Porting from 2.88.6 to 3.0.0-preview2.1 - "Access Violation on Finalizer/Dispose"- SKPaint object #2794
Comments
NOTE: ONCE, as I re-ran this many times, it crashed on the GLControl Paint event thread, while running this line of code:
Same error, "Access Violation Exception". |
And so, I wrote specialized Property Getter/Setter for my "Font" class which ensures that no SKPaint objects ever get Finalized (which is something I'm hoping to avoid worrying about). And so now the "Access Violation" now only happens on my call to "Paint.BreakText(...)":
So I set a watch on 'Paint' property, and the watch gets 3 "AccessVioloationExceptions" inside of it, as shown here: |
NOTE - if I set a BreakPoint, to change the "timing" - the line of code that throws this Access Violation exception can be changed, such as this last time it crashed on this line of code:
(where OutlinePaint itself is throwing AccessViolations on some properties). NOTE - that it successfully uses these Paint objects many many times just fine (and rapidly) - and then it randomly breaks (as though something were disposed by a finalizer, is what it could be). |
Hi, Could you please tell me how did you make code working? Did you just install SkiaSharp 3.0.0-preview2.1 nuget package? I have installed it but I get Access Violation exception. Did you also build and install latest Angle library? If possible, could you please list down the steps you followed for code migration? Thanks in advance, |
I just changed my NUGET references to the various SkiaSharp libraries over to the preview 3.0, for 3 packages as shown here: And then it compiled great (after a few code fixes)... but crashes on the 'Finalizers' it seems. I do NOT have (nor ever had) Angle nuget referenced by my project. But I do have OpenTK 3.3.3 packages as follows: Other than those, I have System.XXXX libraries, and Xamarain.Essentials 1.8, and NetonSoft.Json 13.0.3 and EntityFramework 6.4.4. That's it.. So the ONLY change I made in migration to NUGET was those THREE SkiaSharp packages shown above. |
Thanks for quick reply. I think, I misunderstood. Is your application developed in WPF? My application is in WinUI 3. |
It runs on WPF for Windows, but Xamarin Forms for Android/iOS. Windows/WPF is the easiest place to get things working, so am starting here. Then will move on to Android/iOS w/Xamarin.Forms. |
any updates? workarounds? or at least MCVE? so far crashes look randomly. |
Are you able to reproduce this in a sample app? I am not sure what your code looks like and it may have disposed the typeface manually somewhere. Same with the image. There may have been a case of you passing it to an object that takes over the life of the image and I was unaware. One example that I know offhand of this is when you create a pdf document, the stream it is writing to is now owned by the document and can me disposed in native code at any time. There may be some api somewhere that is doing this for image and typeface. I know this is me saying compress your entire app into a single line and send it over, but maybe you can try disabling chunks of the drawing code and see which part of the render is triggering a crash. |
Also, I may have missed it, is this full or dotnet "core"? |
I just had an idea while taking a nap. The paint object is a lie! Sort of. In the new versions of skia, the paint object does not have any of the font members. They were moved to skfont. You probably have many, but what happens if you switch to use skfont and none of the font members, like typeface and breaktext, on the paint and instead use a font object. I wonder if I have a life cycle bug in where the secret font instance of paint is being disposed. Instead of using paint.breaktext and canvas.drawtext with paint, use font.breaktext and canvas.drawtext that uses paint and font. When I get back to my machine I will check some things. |
Our windows project is .NET Framework 4.8, but on the verge of being ported to .NET 8. On Android/iOS - we are full .NET 8. This crash is happening on Windows, I haven't tried it yet on Android or iOS. I'll do that next, to provide you with some added information. The SKImage that is having issues is one that we NEVER have a reference to... it's created via "Canvas.DrawBitmap" which creates an SKImage via "using" statement, and so is auto-disposed. The specific 258x258 pixel size of this SKImage lets me know who created it, which is only one place in our code (indirectly by calling SKCanvas.DrawBitmap())". So the issue here, for these, is 'Dispose()' is called implicitly/immediately via the "using" context. But when GC kicks in later, calling the SKImage Finalizer, it's trying to access invalid memory via the Skia wrapper. In my mind, it seems that SKObject shouldn't be trying to access the raw SKIA object once Dispose() is called. Shouldn't Dispose() be deleting the native SKIA SKImage already? If so, then this crash seems expected to me... as we're trying to access it again, later, during the Finalize phase. |
Our use of SKPaint for Fonts is tightly wrapped in our own "iFont" class (iFlyEFB-Font) - and for these, we strictly make them all Static references, via our "iFonts" static classes (i.e. iFonts.Tahoma18Bold)... so each SKPaint used for Fonts is created at startup, with readonly static references. We apply these Fonts to BOTH contexts that we have in our system. Due to 2.88.x limitation of only having ONE GL Context, we employ GL Context for our Main Map (the place where we render at 30 FPS), while for all UI Popup forms employ the non-GL context. (Not sure yet, if this limitation to a singular GL context remains for 3.116.x) So just letting you know that we reuse these static iFont.SKPaint instances for both contexts - and it seems to be working well. For the SKImages being disposed, there is no use of iFont on them. We are Loading a PNG image from stream, ===
So we've preloaded our Tile Bitmaps to BMP-style PixelBuffers in RAM. And use "InstallPixels()" to make your SKBitmap use these pixels, and it works great. Then draw this Bitmap to an SKSurface. The SKImage culprit is created by your DrawBitmap wrapper method. We don't do much creation of SKImages in our app, and no where else do we employ 258x258 size images, that are specifically sized for our Tile rendering. If I go to a different map mode (vector mode vs. this tile mode), no such crash occurs for SKImage. |
It crashes on Android as well. In this context we are only using GL for one task - SKIA. Here is our Crash Log, associated with GL thread following a GC. |
Overall, Android runs more stable with this than Windows (at least on my main Android OS 12 Galaxy S10 test device). I can make it crash on windows within a few minutes, but on Android, it's 10's of minutes of stressing it. So this Disposal/Finalizer-related crash appears to be a rare occurrence, which of course makes it harder to debug/fix, yet still renders this bug critical for us. Our users are flying airplanes, and cannot tolerate an app crash, requiring restart. |
Using Skia 2.88.6, I was able to produce the same exact SKImage crash on the same SKImage Finalizer calling Dispose(), producing an unauthorized access violation. So this crash isn't new to version 3.116.1. EDIT - and it happened just now for 2.88.9. We've had this crashing bug for 1.5 years now, and this week, just figured out it's source. It's a bug that typically happens about 30-60 minutes into usage, so has been elusive to us, and we just figured out the use cases that accelerate it, so that we can now cause it to happen in < 3 minutes usually now. We've slowed it down in the past by holding a reference to the SKImage (up to 120 of them!) - so that they won't dispose, and have chance to cause this app crash. |
Since your "Dispose(disposing)" method "does it all" - I'm going to test the method calling "Dispose()" manually, then GC.SuppressFinalizer(skImg).... but this fails even worse. === Our SKIA method for doing this is:
We only need this problematic SKImage to pump our pixels verbatim to the SKSurface. It's a 1:1 mapping, no change in the pixels. ==== This may resolve the issue for us. Plus is more efficient, as we can skip this "meaningless middleman - SKImage" since we are just wanting to do a direct PixMap dump to the surface. |
I wonder if the crash is related to the case where an image is copied to the GPU and then the GPU context changes/resets. But also, I wonder if it is the case that there is no one holding onto the bitmap... Looking at the API in C++ for the image-from-bitmap: SK_API sk_sp<SkImage> RasterFromBitmap(const SkBitmap& bitmap); The bitmap is a reference - not a smart/counted pointer. So I wonder when you draw the bitmap, the image is created but does not actually retain the managed bitmap, which is collected by the GC but the GPU still tries to draw it... I am not 100% sure, I will try and reproduce it here and see if I can track the life of the actual pixel data, but I wonder if loading the pixels directly into an SKImage will help. Can you try loading the pixels into the SKImage directly? There is a
So, the best mechanism is all depending on how your data exists. If you have control of the loading, maybe load as a If you have a stream, then If you have a byte array, there is the easy copy way using var bytes = ...; // from somewhere
var gch = GCHandle.Alloc(bytes, GCHandleType.Pinned); // pin memory so the GC doesn't move it
var info = new SKImageInfo(width, height);
var pixmap = new SKPixmap(info, gch.AddrOfPinnedObject());
var image = SKImage.FromPixels(pixmap, (addr, _) => GCHandle.FromIntPtr(addr).Free()); But, before doing anything fancy, try swapping to use |
OK - I tried the simple fix, and it fails just-the-same.
My data is pinned inside of a large data block on the large object heap (we do our own allocations, given we have a pre-defined pool of 120 blocks of 258x258x4 for each Tile image pixels. Each tile is assigned an unmoving IntPtr to the start of it's pixel buffer. This is what is sent in above as "...Image.pBits". It works the exact same as using the SkBitmap, regarding how well it works for a couple minutes, then crashes on the Finalizer, exactly the same way, in about the same amount of time. === Our pipeline is this:
=== Our code to draw to the Screen surface is like this:
From there this sampler becomes a Child shader of a custom shader, which combines the TileMap with an Elevation Map, so that we can color code the map as "Yellow/Red" to show Higher elevations that pose a collision risk to lower flying aircraft. This attached video, shows how we colorize the tiles with Red/Yellow, and enrich the tilemap color to make sure details are not washed out. It's all working GREAT, except for this stability issue, which has been a real sore spot: SkiaThrobTiles.mp4 |
I am wondering how the image is getting collected if you have a reference to it. If you can see the value of the handle in the finalizer, is it the same as the ones that you know you created? Maybe it is the snapshot one. That creates an image as well. Can you try doing skcanvas.drawsurface and see if it still happens? I wonder if the shader needs to take ownership/strong ref the snapshot... I suppose you can try adding it to a static list so it also never gets collected. |
NOTE: In no circumstance was I holding onto a SKImage reference, for longer than a few extra seconds (the longest I was delaying it was the time when the tile itself became recycled -- so only holding onto 120 SKImg references at a time, max). In my experiment above, I was calling "img.Dispose()" then "GC.SuppressFinalizer(img)", then set "img = null" to make it fall out of scope quickly, Disposed() but never to be finalized. This crashed even faster, as you might predict. Code for this failed experiment:
=====
They DIFFER every time!... So a copy is being made. To Validate this result: Every time one is created, I put it into a SortedList<Handle, string>, so that I can review results at the time of the crashing exception. I'm typically able to load 15K to 45K tiles before it crashes (about 100 to 300 pinch operations equivalent, but also triggered by panning) -- this is why it crashes randomly between 20-60 minutes into usage.
I'm not sure what you want for me to do here. I've got a Bitmap to draw to an existing surface. Do you want me to create a new Temp surface each time I load a tile? And how do you want me to insert this into my pipeline for sake of doing this test?
If I never let it get collected - we'll get an OOM fast. A single pinch operation can load 150 tiles, as we animate each level of detail as you zoom in/out. Each tile is 258x258x4 bytes = 0.25 MB. So collecting 40 MB of never-to-be-used-again pointers is not an option. I can do this for sake of showing that it doesn't crash until the OOM, if you like. FINAL NOTE: |
MORE INFO. I have found a way to make the crash less likely to occur.
I accomplish #4 by having two static List to hold the references. The every "X" msec, I swap the lists, and clear the other list. The "other list" has references that were created between "X" and "2X" msec ago. This influences the crashing rate results significantly. New Crash when I do this: If I precede my "Surface.Canvas.DrawImage()" with a "...Clear(transparent)" -- the crash still happens on the "Clear()" -- so the issue is associated with the Surface, not the SKImg here. For each Tile, we are creating just one Surface, and reusing it. 120 Surfaces in all - as our Tile Pool size is 120. My rapid testing is loading/drawing about 15,000 tiles per minute. ==== === If I set 'X' longer, it crashes much soon, and as 'X' gets even longer, it begins to crash on my Surface.Draw/Clear() calls. Too short - and it doesn't help much. 250-500 msec for my context seems to be the sweet-spot where it runs about 2x as long as before, without crash - average. === |
I wonder if the surface is holding onto the image internally, and then you dispose it making the surface invalid. Is it possible to have a test where you never dispose the image and keep it around forever? But, I wonder if this is a GPU surface, you will need to flush the surface to actually push the image to the surface and not queue it up? GPU commands are not immediate, so the delay may actually be you allowing the commands to be flushed. I also wonder if there is anything on But, maybe I am not understanding something as well. You have your main screen surface, and then smaller surfaces for each tile, and then an image that you are drawing on that smaller surface? I also see shaders and snapshots. I feel like my brain is saying you are loading an image, drawing it to a surface, then creating a snapshot and then making a shader? Can you not draw directly to the main surface? Or even use the |
Are you able to see the handle of the crash in the finalizer vs the handle in the ones you make? If there is an image that you create and then dispose that is then later being collected - something is really wrong. Dispose also calls It has to be the surface snapshot that may still be in use. |
Please! Help me keep my sanity! 😆 |
YES, this sounds very promising! I'm just feeding, as I wasn't tracking that Snapshot at all. My call looks like this:
Now changing it to this:
If I drive you insane, we'll pay for your in-patient asylum medical costs. We need to take care of you, just as you have taken such great care of us. This latest guess makes a ton of sense... I didn't realize this Snapshot() was creating the SKImage too. Doh. Running tests now and will send you the results, within 30 minutes. |
EVERYTHING LOOKING SUPER DUPER NOW! THANK YOU! (as I write, we're at 400K tiles loaded/drawn/Snapshotted, and counting..) We have about a dozen places in our code where we do this "Snapshot().ToShader()" without calling Dispose() on the Snapshot SKImage. So we're going to create an SKUtil static method called "CreateShaderFromSurface(surface)" to return the SKShader, and will be sure to dispose of the Snapshot Image via "using" context. ==== Now at 440K tiles and counting.... (Previous record achieved via GC.Collect kludge was 167K, and without GC.Collect, record was 85K tiles, and typical was 40K-60K). So I think we have a clear winner here! Thanks! Where do we send the money? |
I wonder why you need an explicit Dispose on the image tho... Maybe to release it in C# first before the shader does its thing. I feel like the image should be OK with the shader being collected. If the shader is making a copy of the image, then why should it matter if the Dispose or the GC is collecting things... Does any of this happen with cpu rendering? |
For future me when I get to my windows machine... Trying to repro the code paths here, but not sure of the steps. [SkippableFact]
public unsafe void ImageSnaps()
{
var (weak, shader) = DoWork();
CollectGarbage();
Assert.False(weak.IsAlive);
shader.Dispose();
static (WeakReference Weak, SKShader Shader) DoWork()
{
var image = SKImage.FromEncodedData(Path.Combine(PathToImages, "baboon.jpg"));
var surface = SKSurface.Create(new SKImageInfo(image.Width, image.Height));
surface.Canvas.DrawImage(image, 0, 0);
image.Dispose();
var snapshot = surface.Snapshot();
surface.Dispose();
var shader = snapshot.ToShader();
return (new WeakReference(snapshot), shader);
}
} |
I'm not sure. But I bet you are going to find out.
Not that we're seeing. Our CPU rendering is much lighter weight. |
I'm going to create a new issue called "please expose SkCanvas.WritePixels()" - to avoid the SKImage middleman. My impression of that API is that it saves the middleman copy of our pixelBuffer, and skips the use of a Shader -- just does a Blit. We do a TON of this, including this Raster Map tile example that we just resolved. Another feature of our is Animating Weather (images) -- we have the images in RAM in "code-runLength" quick compression mode, which we then send to the GPU to render the resulting Image. So as we animated, currently, these images are decompressed JIT, into a fixed Pixel Buffer (the same one for all frames), and then we are creating SKImage from it, each frame, and then DrawImage to our render surface. (which is a full 1:1 copy of the image)... Then our render pipeline Draws this full Image (enscribed onto the Surface) onto the final Screen Render Surface (using Matrix manipulations to scale/rotate/translate it)... It would be nice to just go from "PixBuffer" to "Surface" each frame without creating this SKImage chaff as middleman. EDIT: Added it here: |
Why would you need a shader and not use the image directly? You can copy to the GPU using Also, you don't need to copy the pixel data, just use the I saw these discussions:
Basically, the images are cached on the GPU, so you can try to increase the cache limits and let skia handle some caching. But, for more control, you could convert the image into a texture image and then just draw that directly. |
This is what we're doing now to create the SKImage, that is then blitted to our surface.
Our Tile Renderer is a shader, because we've encoded the "heightmap" into the tile image's alpha channel. And our shader makes use of a "cOwnshipAltitudeFt" uniform value, so that we can Colorize it as Yellow or Red for terrain that is a Collision hazard. We need a custom shader to produce this colorization/enhancement effect in real-time (as the "cOwnshipAltitudeFt" can vary frame to frame - the colorization logic is 'relative' to this). === Each time a tile image is changed, we have to do the following now:
where 's_TerrainEffect' is our master custom shader. ===
And avoid doing these steps:
In short, ideally, there we just keep using the CURRENT _terrainShader and _imageSamplingShader, with updated Pixel Data, and updated _shaderUniforms[]. |
You are using |
Thanks! My revised code would look like this:
And you are saying that my 'pBits' Pixel buffer in RAM, goes straight to the GPU one-time, for use in the "img.ToShader()"?? Or are there intermediate copies of the pixel buffers? My assumption is that the above logic would do this:
Preference: (This is how I was used to doing it with OpenGL.) |
Is the "preference" I'm specifying here something that is even feasible with SKIA? (Where the SamplerShader is kept intact, and we just update the PixelBuffer directly - and next time it renders, it uses this new data?) If not, that's OK - -SkiaSharp kicks major butt. We are very very pleased with the performance. And my love and appreciation for you remains undying and unwavering. :) Thank you, thank you, thank you! For getting this resolved for us. We'll be moving to Skia version 3, soon. And will be happy to let you use our application as a Showcase/showpiece for Skia Sharp, if this interests you. |
Description
Thank you for SkiaSharp! We love love love it.
Tonight I started our efforts to port our application from using SkiaSharp 2.88.6 (which is working like a champ, making full use of custom shaders on GLSurfaces), to the latest 3.0.0-preview2.1.
We had to change over about a dozen API's to the new API's -- very smooth, no issue.
But now when we run our previously very-stable application (in production), it now (with SkiaSharp v3.0) instantly crashes with an "Access Violation" error where the stack trace shows a Native SKObject Dispose() method being called by it's finalizer.
Here's what I see in Visual Studio 2022:
The object type Disposed is of type "SKPaint".
Code
Because I don't know which SKObject is being Finalized, I don't know which code triggers it. But somewhere, I've got an SKPaint object that is being disposed via the Finalizer, which is supposed to be fine (right?).
Expected Behavior
I'm expecting SkiaSharp Object Finalizers to NOT cause Access Violation exceptions, but instead to operated normally.
Actual Behavior
Finalizer calls Dispose() which throws a fatal "Access Violoation" exception, and our app crashes.
Version of SkiaSharp
3.x (Alpha)
Last Known Good Version of SkiaSharp
2.88.2 (Previous)
IDE / Editor
Visual Studio (Windows)
Platform / Operating System
Windows
Platform / Operating System Version
Windows 11 Home
Devices
CyberPowerPC with intel i9-14900K, running Windows 11 Home
Relevant Screenshots
Relevant Log Output
No response
Code of Conduct
The text was updated successfully, but these errors were encountered: