Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add safetensor metadata parsing and display #1074

Merged
merged 4 commits into from
Jan 20, 2025

Conversation

Genteure
Copy link
Contributor

image

@mohnjiles
Copy link
Contributor

Thank you for the submission! Love the idea for sure, just had a couple issues running it locally -

  • Open SM
  • Navigate to Checkpoints page
  • Select Lora folder
  • Right click Lora that I know has safetensor metadata
  • No "View Safetensor Metadata" button appears
  • Click "Refresh" button in top right menu bar
  • Right click same lora
  • View Safetensor Metadata button is now there

Also, the regular ItemsControl & FlexPanel do not support virtualization by default, leading to a delay before opening the dialog while it renders all the tags. In some cases, my models have over 2500 tags, so it took a few seconds. I replaced the ItemsControl with an ItemsRepeater & FlowLayout from FluentAvalonia and it performs much better. For example:

        <ui:ItemsRepeater
            Grid.Row="3"
            Margin="8"
            IsVisible="{Binding Metadata.TagFrequency, Converter={x:Static ObjectConverters.IsNotNull}}"
            ItemsSource="{Binding Metadata.TagFrequency}">
            <ui:ItemsRepeater.ItemTemplate>
                <DataTemplate x:CompileBindings="False">
                    <Button
                        Cursor="Hand">
                        <StackPanel Orientation="Horizontal">
                            <TextBlock FontWeight="SemiBold" Text="{Binding Name}" />
                            <TextBlock Margin="5,0,0,0" Text="{Binding Frequency}" />
                        </StackPanel>
                    </Button>
                </DataTemplate>
            </ui:ItemsRepeater.ItemTemplate>
            <ui:ItemsRepeater.Layout>
                <ui:FlowLayout MinRowSpacing="4" MinColumnSpacing="4" Orientation="Horizontal"/>
            </ui:ItemsRepeater.Layout>
        </ui:ItemsRepeater>

Also if you would prefer to not use x:CompileBindings="False" you may be able to move the Tag struct out from being a child of SafetensorMetadata and put that in the DataType property of the DataTemplate, but I was feeling lazy 😄

@Genteure
Copy link
Contributor Author

Genteure commented Jan 5, 2025

Right, now I think I really should've opened an issue discussing how to implement this UI wise before actually doing it.


No "View Safetensor Metadata" button

I have tried saving the parsed metadata into litedb but the database read/write performance is way way worse (15-20 times slower). Not sure what exactly is causing this, maybe it's just the amount of data is a lot larger?
What do you think is the best way to do this? Do you think searching by metadata trained keywords could be a useful addition in the future (perhaps with a disabled by default toggle)?
If we don't need searching, I'm thinking:

  • Remove SafetensorMetadata from LocalModelFile
  • Change the condition for showing "View Safetensor Metadata" from SafetensorMetadata is not null to RelativePath.EndsWith(".safetensor")
  • Read metadata only when the user clicks "View Safetensor Metadata"
  • Show "No Metadata" if there's no metadata.

ItemsRepeater & FlowLayout from FluentAvalonia

I totally forgot that's a thing, will test and use that instead.


move the Tag struct out

I went from Dictionay<string, int> to List<(string tag, int freq)> to the current version. You can actually reference a nested type in XAML by using + like Name.Space.Type+ChildType. I think the current type structure is fine but if you have any preference on how/where to place each type let me know.

@ionite34
Copy link
Member

ionite34 commented Jan 6, 2025

Hi, first of all thanks for the PR. I've looked over the SafetensorMetadata.cs parsing and unit tests, and it all looks good. I liked the manual json token parsing to avoid needing to load the other json content for the tensor offsets.

I have tried saving the parsed metadata into litedb but the database read/write performance is way way worse (15-20 times slower). Not sure what exactly is causing this, maybe it's just the amount of data is a lot larger?

I believe it's a performance issue with the litedb async interface we're using, the IO delay is quite long for read and writes. We could cache the safetensors metadata in the cm-info.json file perhaps on a new optional property of ConnectedModelInfo. We already read that file for all models anyways and source generated json reading in .NET 9 is quite fast now.

(I'm leaning towards removing any litedb operations for model indexing at some point and just loading to memory on startup, since at least on my end the full indexing combined with cm-info reading is consistently faster than reading from litedb.)

If we don't need searching, I'm thinking:

  • Remove SafetensorMetadata from LocalModelFile
  • Change the condition for showing "View Safetensor Metadata" from SafetensorMetadata is not null to RelativePath.EndsWith(".safetensor")
  • Read metadata only when the user clicks "View Safetensor Metadata"
  • Show "No Metadata" if there's no metadata.

We could probably not load the safetensor data with indexing as well. We could have another IndexMetadata function on ModelIndexService that goes through existing models that don't have ConnectedModelInfo.SafetensorMetadata and do the safetensor read, then save that property along with saving the .cm-info.json file. Then that could run in the background separately after model index finishes.

"View Safetensor Metadata" could first check ConnectedModelInfo.SafetensorMetadata, and if it doesn't exist, do a manual loading of safetensor metadata and saving to ConnectedModelInfo.SafetensorMetadata. So if the background metadata reading finishes first, or the cm-info already has metadata, the user gets the dialog open from memory, otherwise they wait for the actual reading. Searching tags would gradually work on all models as they finish background metadata reading as well.

@Genteure
Copy link
Contributor Author

Genteure commented Jan 8, 2025

Trying to put SafetensorMetadata into ConnectedModelInfo is not working out well.

$ find . -type f -not -path "*/.git/*" | awk -F. '{ext[$NF]++} END {for (e in ext) print e, ext[e]}'
pt 6
json 417
jpeg 416
safetensors 610

Not all models have a ConnectedModelInfo, which are the models that need this feature the most. Simply trying to new ConnectedModelInfo() for these models causes System.NullReferenceException in a lot of places elsewhere.

Having a ConnectedModelInfo also changes the appearances which doesn't look that good IMO
image

Skipping reading safetensor metadata for models without ConnectedModelInfo would make it only work for models downloaded from civitai.

Clicking "Update Existing Metadata" would also remove SafetensorMetadata in ConnectedModelInfo (along with UserTitle and ThumbnailImageUrl) until the user refresh again.

@Genteure Genteure force-pushed the feat/model-metadata branch from f2f471b to 3a02c82 Compare January 12, 2025 08:56
@Genteure
Copy link
Contributor Author

This PR is ready for review.

Copy link
Contributor

@mohnjiles mohnjiles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thank you!

@mohnjiles mohnjiles merged commit e63443a into LykosAI:main Jan 20, 2025
3 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Jan 20, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants