Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

All StructId generators are not incremental and extremely inefficient (analyzers too) #60

Open
Sergio0694 opened this issue Dec 24, 2024 · 20 comments · Fixed by #61
Open
Labels
enhancement New feature or request

Comments

@Sergio0694
Copy link

Sergio0694 commented Dec 24, 2024

Overview

Disclaimer: I'm not using this package. Just leaving some feedback here to help the ecosystem 🙂

The incremental generators in this project unfortunately have some (big) issues: they are completely not incremental, and they are extremely inefficient. The latter is because they introduce a whole bunch of generator state tables that will never compare as equal anyway (as they're capturing values that cannot be equated), meaning they'll generate source again every single time.

To clarify:

  • These generators as is are completely incorrect
  • They need to be rewritten to use an attribute as trigger, and use ForAttributeWithMetadataName

If the objection is "but using an attribute is less convenient for users", sure. Maybe. But that's the only way to make this correct and it's a design principle that should not be considered optional for generators. Performance has to come first, because poorly performing generators (like these ones below) impact the entire IDE experience.

You should also enable this property to get the Roslyn analyzer help spot all of these issues (or, most of them):

<EnforceExtendedAnalyzerRules>true</EnforceExtendedAnalyzerRules>

The offending generators:

public virtual void Initialize(IncrementalGeneratorInitializationContext context)
{
var known = context.CompilationProvider
.Select((x, _) => new KnownTypes(x));
// Locate the required type
var types = context.CompilationProvider
.Select((x, _) => x.GetTypeByMetadataName(referenceType));
var ids = context.CompilationProvider
.SelectMany((x, _) => x.Assembly.GetAllTypes().OfType<INamedTypeSymbol>())
.Where(x => x.IsStructId())
.Where(x => x.IsPartial());
var combined = ids.Combine(types)
// NOTE: we never generate for compilations that don't have the specified value interface type
.Where(x => x.Right != null)
.Combine(known)
.Select((x, _) =>
{
var ((structId, referenceType), known) = x;
// The value type is either a generic type argument for IStructId<T>, or the string type
// for the non-generic IStructId
var valueType = structId.AllInterfaces
.First(x => x.Name == "IStructId")
.TypeArguments.OfType<INamedTypeSymbol>().FirstOrDefault() ??
known.String;
return new TemplateArgs(structId, valueType, referenceType!, known);
});
if (referenceCheck == ReferenceCheck.ValueIsType)
combined = combined.Where(x => x.TValue.Is(x.ReferenceType));
combined = OnInitialize(context, combined);
context.RegisterImplementationSourceOutput(combined, GenerateCode);

var customHandlers = context.CompilationProvider
.SelectMany((x, _) => x.Assembly.GetAllTypes().OfType<INamedTypeSymbol>())
.Combine(context.CompilationProvider.Select((x, _) => x.GetTypeByMetadataName("Dapper.SqlMapper+TypeHandler`1")))
.Where(x => x.Left != null && x.Right != null &&
x.Left.Is(x.Right) &&
// Don't emit as plain handlers if they are id templates
!x.Left.GetAttributes().Any(a => a.IsValueTemplate()))
.Select((x, _) => x.Left)
.Collect();
// Non built-in value types can be templatized by using [TValue] templates. These would necessarily be
// file-local types which are not registered as handlers themselves but applied to each struct id TValue in turn.
var templatizedValues = context.SelectTemplatizedValues()
.Where(x => !IsBuiltIn(x.TValue.ToFullName()))
.Combine(context.CompilationProvider.Select((x, _) => x.GetTypeByMetadataName("Dapper.SqlMapper+TypeHandler`1")))
.Where(x => x.Left.Template.TTemplate.Is(x.Right))
.Select((x, _) => x.Left);
// If there are custom type handlers for value types that are in turn used in struct ids, we need to register them
// as handlers that pass-through to the value handler itself.
var customHandled = source
.Combine(customHandlers.Combine(templatizedValues.Collect()))
.Select((x, _) =>
{
(TemplateArgs args, (ImmutableArray<INamedTypeSymbol> handlers, ImmutableArray<TemplatizedTValue> templatized)) = x;
var handlerType = args.ReferenceType.Construct(args.TValue);
var handler = handlers.FirstOrDefault(x => x.Is(handlerType, false));
if (handler == null)
{
var templated = templatized.Where(x => x.TValue.Equals(args.TValue, SymbolEqualityComparer.Default))
.FirstOrDefault();
// Consider templatized handlers that will be emitted as custom handlers too for registration.
if (templated != null)
{
var identifier = templated.Template.Syntax.ApplyValue(templated.TValue)
.DescendantNodes()
.OfType<TypeDeclarationSyntax>()
.First()
.Identifier.Text;
// Use lighter symbol since our template rendering only uses the type name.
handler = new KnownTypeNameSymbol(identifier);
}
}
return args with { ReferenceType = handler! };
})
.Where(x => x.ReferenceType != null);
context.RegisterSourceOutput(builtInHandled.Collect().Combine(customHandled.Collect()).Combine(templatizedValues.Collect()), GenerateHandlers);

var converters = context.CompilationProvider
.SelectMany((x, _) => x.Assembly.GetAllTypes().OfType<INamedTypeSymbol>())
.Combine(context.CompilationProvider.Select((x, _) => x.GetTypeByMetadataName(ValueConverterType)))
.Where(x => x.Left != null && x.Right != null &&
x.Left.Is(x.Right) &&
!x.Left.IsUnboundGenericType &&
x.Left.BaseType?.TypeArguments.Length == 2 &&
// Don't emit as plain converters if they are value templates
!x.Left.GetAttributes().Any(a => a.IsValueTemplate()))
.Select((x, _) => x.Left)
.Collect();
var templatizedValues = context.SelectTemplatizedValues()
.Combine(context.CompilationProvider.Select((x, _) => x.GetTypeByMetadataName(ValueConverterType)))
.Where(x => x.Left.Template.TTemplate.Is(x.Right))
.Select((x, _) => x.Left);
context.RegisterSourceOutput(source.Collect().Combine(converters).Combine(templatizedValues.Collect()), GenerateValueSelector);

var source = context.CompilationProvider
.Select((x, _) => (new KnownTypes(x), x.GetTypeByMetadataName("Newtonsoft.Json.JsonConverter`1")));

public void Initialize(IncrementalGeneratorInitializationContext context)
{
var known = context.CompilationProvider
.Select((x, _) => new KnownTypes(x));
var templates = context.CompilationProvider
.SelectMany((x, _) => x.GetAllTypes(includeReferenced: true).OfType<INamedTypeSymbol>())
.Where(x =>
// Ensure template is a file-local partial record struct
x.TypeKind == TypeKind.Struct && x.IsRecord && x.IsFileLocal &&
// We can only work with templates where we have the actual syntax tree.
x.DeclaringSyntaxReferences.Any(
// And we can locate the TStructIdAttribute type that should be applied to it.
r => r.GetSyntax() is TypeDeclarationSyntax declaration && x.GetAttributes().Any(
a => a.IsStructIdTemplate())))
.Combine(known)
.Select((x, cancellation) =>
{
var (tself, known) = x;
// We infer the idType from the required primary constructor Value parameter type
var tvalue = (INamedTypeSymbol)tself.GetMembers().OfType<IPropertySymbol>().First(p => p.Name == "Value").Type;
var attribute = tself.GetAttributes().First(a => a.IsStructIdTemplate());
// The id type isn't declared in the same file, so we don't do anything fancy with it.
if (tvalue.DeclaringSyntaxReferences.Length == 0)
return new Template(tself, tvalue, attribute, known);
// Otherwise, the idType is a file-local type with a single interface
var type = tvalue.DeclaringSyntaxReferences[0].GetSyntax(cancellation) as TypeDeclarationSyntax;
var iface = type?.BaseList?.Types.FirstOrDefault()?.Type;
if (type == null || iface == null)
return new Template(tself, tvalue, attribute, known) { OriginalTValue = tvalue };
if (x.Right.Compilation.GetSemanticModel(type.SyntaxTree).GetSymbolInfo(iface).Symbol is not INamedTypeSymbol ifaceType)
return new Template(tself, tvalue, attribute, known);
// if the interface is a generic type with a single type argument that is the same as the idType
// make it an unbound generic type. We'll bind it to the actual idType later at template render time.
if (ifaceType.IsGenericType && ifaceType.TypeArguments.Length == 1 && ifaceType.TypeArguments[0].Equals(tvalue, SymbolEqualityComparer.Default))
ifaceType = ifaceType.ConstructUnboundGenericType();
return new Template(tself, ifaceType, attribute, known)
{
OriginalTValue = tvalue
};
})
.Collect();
var ids = context.CompilationProvider
.SelectMany((x, _) => x.Assembly.GetAllTypes().OfType<INamedTypeSymbol>()
.Where(t => !t.IsValueTemplate() && !t.IsStructIdTemplate()))
.Where(x => x.IsRecord && x.IsValueType && x.IsPartial())
.Combine(known)
.Where(x => x.Left.Is(x.Right.IStructId) || x.Left.Is(x.Right.IStructIdT))
.Combine(templates)
.Where(x =>
{
var ((id, known), templates) = x;
var structId = id.AllInterfaces.FirstOrDefault(i => i.Is(known.IStructId) || i.Is(known.IStructIdT));
return structId != null;
})
.SelectMany((x, _) =>
{
var ((id, known), templates) = x;
// Locate the IStructId<TValue> interface implemented by the id
var structId = id.AllInterfaces.First(i => i.Is(known.IStructId) || i.Is(known.IStructIdT));
var tid = structId.IsGenericType ? (INamedTypeSymbol)structId.TypeArguments[0] : known.String;
// If the TValue/Value implements or inherits from the template base type and/or its interfaces
return templates
.Where(template => template.AppliesTo(tid))
.Select(template => new TemplatizedStructId(id, tid, template));
});
context.RegisterSourceOutput(ids, GenerateCode);
}

This model is also completely not incremental:

public record KnownTypes(Compilation Compilation)

Analyzers

Performance issues in this project are not exclusive to generators, but analyzers are also not optimal. However, these should be lower priority than fixing the generators, since at least analyzers don't directly block the IDE (though they should still be fixed).

For instance, here's an offending analyzer:

context.RegisterSyntaxNodeAction(Analyze, SyntaxKind.ClassDeclaration);
context.RegisterSyntaxNodeAction(Analyze, SyntaxKind.StructDeclaration);
context.RegisterSyntaxNodeAction(Analyze, SyntaxKind.RecordDeclaration);
context.RegisterSyntaxNodeAction(Analyze, SyntaxKind.RecordStructDeclaration);
}
static void Analyze(SyntaxNodeAnalysisContext context)
{
var known = new KnownTypes(context.Compilation);
if (context.Node is not TypeDeclarationSyntax typeDeclaration ||
known.IStructIdT is not { } structIdTypeOfT ||
known.IStructId is not { } structIdType)
return;
var symbol = context.SemanticModel.GetDeclaredSymbol(typeDeclaration);
if (symbol is null)
return;

Targeting all syntax nodes and then doing GetDeclaredSymbol on each of them is not efficient. You should use an appropriate callback method, like one for an INamedTypeSymbol, and use that. If you need to also compare against well known types, you should gather them in a compilation start action, and then flow them into a nested symbol callback.

Solution

Like I mentioned above, all these generators need to be rewritten to use an attribute as trigger, and use ForAttributeWithMetadataName. Not wanting to use an attribute for convenience is not a valid argument for having a generator that is outright breaking all design principles of incremental generators, and introducing performance issues that will impact the whole IDE.

If it heps, you can check out some other incremental generators and analyzers for reference, such as:

Back this issue
Back this issue

@Sergio0694 Sergio0694 added the bug Something isn't working label Dec 24, 2024
@Sergio0694 Sergio0694 changed the title All StructId generators are not incremental and extremely efficient (analyzers too) All StructId generators are not incremental and extremely inefficient (analyzers too) Dec 24, 2024
@kzu
Copy link
Member

kzu commented Dec 28, 2024

Thanks for taking a look at this project @Sergio0694! Very much appreciated.

I'll take your analyzer suggestion and see how to refactor that, good point on that front.

On moving to attribute for triggering codegen: having to both implement an interface (or provide a ctor) AND adding an attribute seems detrimental to the user experience (but so would be bad IDE perf). I wouldn't want an API designed to satisfy the compiler at the expense of the API's usability itself, but perhaps a faster analyzer + codefix can make the attribute requirement bearable. I'll investigate further.

Being just a v1, I didn't spend a lot of time polishing perf on the codegen itself, but rather getting to a feature set I could use myself on another project (where perf isn't an issue -yet?- since it's a green field project).

Thanks again for the suggestions and for getting involved. Will keep this issue open and updated (but I'm going on vacation soon-ish so likely it will be some time).

Update

After some experimentation, it seems the advise/recommendation is (at least partially) wrong and shouldn't be considered The True Way. See #60 (comment).

@kzu kzu added enhancement New feature or request and removed bug Something isn't working labels Dec 28, 2024
kzu added a commit that referenced this issue Dec 28, 2024
Switch to symbol action rather than syntax, to speed up analysis.

Partially fixes #60
kzu added a commit that referenced this issue Dec 28, 2024
Switch to symbol action rather than syntax, to speed up analysis.

Partially fixes #60
kzu added a commit that referenced this issue Dec 28, 2024
Switch to symbol action rather than syntax, to speed up analysis.

Partially fixes #60
@kzu kzu closed this as completed in #61 Dec 28, 2024
@kzu kzu closed this as completed in 49ecd9b Dec 28, 2024
@viceroypenguin
Copy link

viceroypenguin commented Dec 28, 2024

@kzu When I see a new greenfield project that replicates what already exists, I always like to ask why it exists. So: why does this exist when Vogen and StronglyTypedId exist?

@viceroypenguin

This comment was marked as resolved.

@viceroypenguin
Copy link

On moving to attribute for triggering codegen: having to both implement an interface (or provide a ctor) AND adding an attribute seems detrimental to the user experience (but so would be bad IDE perf). I wouldn't want an API designed to satisfy the compiler at the expense of the API's usability itself, but perhaps a faster analyzer + codefix can make the attribute requirement bearable. I'll investigate further.

The reality is that the analyzer and codefix performance have absolutely nothing to do with the generator performance. You must address the generator performance issue without any consideration of the performance of analyzers and codefixes.

I wouldn't want an API designed to satisfy the compiler at the expense of the API's usability itself

The reality is that you must satisfy the compiler in this case. It is a requirement. Not doing so will create performance issues for users in IDEs (VS, VS Code, Rider, etc.) on larger codebases.

having to both implement an interface (or provide a ctor) AND adding an attribute

Why do your types need to inherit an interface? Why not just have the attribute and have the generated code add the interface?

@kzu
Copy link
Member

kzu commented Dec 28, 2024

I said "partially fixes" and GH closed it. It wasn't intentional.

image

@kzu kzu reopened this Dec 28, 2024
@kzu
Copy link
Member

kzu commented Dec 28, 2024

Why do your types need to inherit an interface?

That's how you provide generics on what type of thing you are, in .NET. You inherit or implement things. It could be a generic attribute instead nowadays, agreed. But other than that, attributes quickly degenerate in "yaml config via attributes", which I'm not too fond of. i.e. who uses assembly-level attributes for anything?

Contemplate this use case: https://github.com/devlooped/DependencyInjection?tab=readme-ov-file#convention-based

The alternative is doing some weird assembly-level attribute to configure the generator, or worse, adding a .editorconfig. Instead of a simple API in C# itself (which the generator can pick up and act on).

why does this exist when Vogen and StronglyTypedId exist?

Because I'm free, I enjoy doing it and you can happily use whatever else you want. And I enjoy learning new stuff. So, the feedback from this issue by @Sergio0694 took me down to more learning, so it's a net positive for me. You don't learn much by just blindly using existing stuff via nuget just becase "it already exists". I remember RhinoMocks already existed too 😉 .

@viceroypenguin
Copy link

That's how you provide generics on what type of thing you are, in .NET. You inherit or implement things. It could be a generic attribute instead nowadays, agreed. But other than that, attributes quickly degenerate in "yaml config via attributes", which I'm not too fond of. i.e. who uses assembly-level attributes for anything?

First, who is talking about using assembly-level attributes? Sure, configuration makes sense to do via assembly-level attributes, but there's no reason to assume I'm talking about assembly-level attributet here. In your case, I'd highly recommend you do this instead:

[StructId]
public readonly partial record struct UserId

with the interface being added in the generated code as so:

public readonly partial record struct UserId : IStructId<Guid>;

This would be 100x (that's not an exaggeration, that's an actual measured number from the Roslyn team) faster.

Second, you should be aware that the Roslyn team is actively discussing removing support for all uses of context.SyntaxProvider outside of ForAttributeWithMetadataName in a future version of Visual Studio. This would mean that your source generator(s) (both StructId and the usage you pointed out for DependencyInjection) would no longer operate in Visual Studio. The performance issues with not relying on ForAttributeWithMetadataName are too extreme, and the Roslyn team are the ones who receive the complaints.

Because I'm free, I enjoy doing it and you can happily use whatever else you want. And I enjoy learning new stuff. So, the feedback from this issue by @Sergio0694 took me down to more learning, so it's a net positive for me. You don't learn much by just blindly using existing stuff via nuget just becase "it already exists". I remember RhinoMocks already existed too 😉.

Sure, that's a valid reason for another entry into the space. I am just always curious why new entries exist when existing entries exist - usually it's because of some niche that the existing entries do not cover. In many cases, the niche is why I have written several source generator projects this year. I did not see any new purpose for this library, so that's why I was curious. Glad you're learning.

Good luck - I hope you're able to transition to an attribute-based model for StructId.

@kzu
Copy link
Member

kzu commented Dec 28, 2024

@Sergio0694 I'm puzzled by this:

need to be rewritten to use an attribute as trigger, and use ForAttributeWithMetadataName

I disagree. I'm looking at their source and it seems quite possible to write a similarly efficient generator that doesn't need to depend exclusively on attributes: https://github.com/dotnet/roslyn/blob/main/src/Compilers/Core/Portable/SourceGeneration/Nodes/SyntaxValueProvider_ForAttributeWithSimpleName.cs#L55

@kzu
Copy link
Member

kzu commented Dec 28, 2024

I did not see any new purpose for this library

@viceroypenguin my approach to templates is bonkers. Haven't seen anything remotely like it in any of the others. Now I just have to make it performant 😅 .

@viceroypenguin
Copy link

I did not see any new purpose for this library

@viceroypenguin my approach to templates is bonkers. Haven't seen anything remotely like it in any of the others. Now I just have to make it performant 😅 .

You mean using Scriban for generating code? I use it in all of my source generators. 🤷🏼‍♂️

@kzu
Copy link
Member

kzu commented Dec 28, 2024

You haven't even read the readme, have you? 😉 https://github.com/devlooped/StructId#customization-via-templates

@viceroypenguin
Copy link

You haven't even read the readme, have you? 😉 https://github.com/devlooped/StructId#customization-via-templates

I did, I just forgot about that.

Yeah, that is unique. The others rely primarily on configuration rather than these templates. I could definitely see some nuanced needs being addressed by this over the others. Cool trick!

@kzu
Copy link
Member

kzu commented Dec 29, 2024

I plan on generalizing it until I can do away with all the other templates that aren't compiled ones (with the exception of the ones that need to iterate all IDs). Then perhaps create a "MetaSharp" package that only does the compiled template expansion alone and perhaps this package just becomes a bunch of templates (plus a couple scribans) and that's it.

@kzu
Copy link
Member

kzu commented Dec 29, 2024

I've been following @andrewlock testing incremental generators post and I have to say that I find the recomendations and explanations (at least partially) wrong.

This is a run of an cloned compilation to which I add a new syntax tree, where the pipeline uses the compilation provider and selects just the syntax trees (plus a selector/filter): they are perfectly cacheable and incremental OOB:

image

The generator is perfectly incremental:

image

This is the second run with the added syntax tree, note how the first RecordDeclarationSyntax is returned in the outputs as Cached:

image

@Youssef1313
Copy link

Youssef1313 commented Dec 29, 2024

@kzu Your example is too simplified. In practice and most real-world generators, it's extremely difficult to have proper incrementality and good performance, and sometimes it can even be impossible.

You can see how complex FAWMN implementation is to achieve proper incrementality. It's not really trivial even for cases that seem very simple.

@kzu
Copy link
Member

kzu commented Dec 29, 2024

My point is that the blanket advise against using syntaxnode and the compilation provider because that automatically breaks caching and incremental behavior is misguided.

@Youssef1313
Copy link

Use of compilation provider will very often break incrementality, unless you really know what you are doing.

For example, including the compilation itself in the pipeline is a bad idea. Including symbols in the pipeline is a bad idea. Using compilation provider to get the assembly name is okay.

The answer is: it depends. But in general, it's very easy to get things wrong.

The advice to only use FAWMN is actually the advice that Roslyn team usually gives.

@kzu
Copy link
Member

kzu commented Dec 29, 2024

It only takes a bit of time looking at the implementation of FAWMN to understand what to use and how.

There's nothing magical there. And it does use symbol and syntax objects in their own intermediate pipelines. So 🤷🏼‍♂️.

@Youssef1313
Copy link

Youssef1313 commented Dec 30, 2024

@kzu fwiw, the "usage" of symbols as an input to transform is fundamentally very different from having symbols in IncrementalValue[s]Provider. FAWMN does the former, not the latter. The latter is a big no. The former is not always okay.

@kzu
Copy link
Member

kzu commented Dec 30, 2024

Fair enough. That alone goes against the advise to never combine any providers with the CompilationProvider though.

Kindly note how it's perfectly fine as shown in the FAWMN implementation itself: https://github.com/dotnet/roslyn/blob/main/src/Compilers/Core/Portable/SourceGeneration/Nodes/SyntaxValueProvider_ForAttributeWithMetadataName.cs#L89-L133

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants