Validate MiniLcm types #1344

rmunn · 2025-01-06T21:22:57Z

Fixes #1275.
Fixes #1276.
Fixes #1277.
Fixes #1278.
Fixes #1279.
Fixes #1280.

Work completed:

Entry validator now validates all fields
Sense validator now validates all fields
Example sentence validator now validates all fields
Writing system validator created
Semantic domain validator created
Part of speech validator created
Tests for entry validator
Tests for sense validator
Tests for example sentence validator
Tests for writing system validator
Tests for semantic domain validator
Tests for part of speech validator
Adjusted some existing unit tests to now create valid data

rmunn · 2025-01-06T21:29:19Z

My thoughts on SemDom.xml and GOLDEtic.xml - we load these XML files as resources into the app (TODO: determine which DLL the resources should live in) and create a singleton service (or just a static class) that parses those XML files at system startup and provides an IDictionary interface for looking up POS / semdom data. (Or a hash set if all we need is to validate GUIDs).

Then validation code can say "Is this a predefined / canonical item?" And if it's supposed to be canonical, ensure the GUID is correct. And optionally, verify that the name and description of the canonical items haven't been modified.

jasonleenaylor · 2025-01-06T23:38:51Z

GUIDs are all that needs to be used to identify if a POS or SemDom is predefined. Unless you want to support versioning, which you probably don't.

backend/FwLite/MiniLcm.Tests/Validators/EntryValidatorTests.cs

backend/FwLite/MiniLcm/Validators/EntryValidator.cs

hahn-kev · 2025-01-07T04:42:58Z

backend/FwLite/MiniLcm/Validators/EntryValidator.cs

+
+    private bool NotBeComponentSelfReference(Entry entry, ComplexFormComponent component)
+    {
+        return component.ComponentEntryId != entry.Id;


this should not be Guid.Empty otherwise it won't make sense

I tried adding a .When() condition to this check, but FluentValidation's .When takes a Func<Entry, bool> and there isn't a WhenForEach. So I'd either have to use a .ForEach with a .When, or I could just make these predicates pass when the GUID in question is empty. I chose the latter approach in commit 87b2f8b.

Looks like this may have gotten mixed up with the HaveCorrectComponentEntryReference change; I missed it on my first pass too. This check needs to prevent the Id from being empty, as it can't be inferred from the parent. It needs to be both not the parent and not empty, otherwise we don't know what you're referencing.
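The two conditions described here (the component ID must be set, and it must not point back at the parent entry) can be sketched as plain predicates. This is only a sketch, not the code from the PR: the `Entry` and `ComplexFormComponent` records are cut-down stand-ins for the MiniLcm types, and `HaveComponentEntryId` and `IsValidComponent` are hypothetical names.

```csharp
using System;

// Cut-down stand-ins for the MiniLcm types (assumption: only the fields used here).
public record ComplexFormComponent(Guid ComplexFormEntryId, Guid ComponentEntryId);
public record Entry(Guid Id);

public static class ComponentRulesSketch
{
    // The component can't be inferred from the parent entry, so it must be set.
    public static bool HaveComponentEntryId(ComplexFormComponent component) =>
        component.ComponentEntryId != Guid.Empty;

    // An entry must not list itself as one of its own components.
    public static bool NotBeComponentSelfReference(Entry entry, ComplexFormComponent component) =>
        component.ComponentEntryId != entry.Id;

    // Both conditions together: not empty and not the parent.
    public static bool IsValidComponent(Entry entry, ComplexFormComponent component) =>
        HaveComponentEntryId(component) && NotBeComponentSelfReference(entry, component);
}
```

In a FluentValidation validator, predicates like these would typically be attached per element with `RuleForEach(...).Must(...)`, which is one way around the missing per-element `.When()`.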

backend/FwLite/MiniLcm/Validators/PartOfSpeechIdValidator.cs

rmunn · 2025-01-07T20:12:18Z

Commit 3bcd237, which actually turns on validation, makes lots of unit tests fail, because their test data is now considered invalid. I'll check through and see if my validation rules are too strict or if test data needs to be updated. I expect it to be a little of both.

As long as senses only have a PartOfSpeechId in them, it's hard to check that property statelessly because we need to look up the PartOfSpeech in order to determine whether it's predefined (and thus whether its GUID needs to match one of the canonical GUIDs). For now, we'll skip checking part of speech GUIDs until senses have an actual PartOfSpeech reference.

Now, instead of semantic domains always being considered predefined when they come from fwdata, we can now look up their GUIDs in the canonical list and set Predefined correctly. This also makes two failing tests pass.

The Sena3SyncTests are failing because some parts of speech are being created with non-canonical GUIDs. Let's comment this out for now to make the tests pass, then uncomment it once we've investigated where the non-canonical PoS GUIDs are coming from.

rmunn · 2025-01-07T20:59:59Z

Many tests are now failing with the error "Fieldname 'HumanNoOpinionNumber' does not exist." I don't get those failures when running tests locally, and I have no idea where that "HumanNoOpinionNumber" name is coming from. @hahn-kev - Any ideas on this one?

rmunn · 2025-01-07T22:43:35Z

#1350 will be useful here; I had to comment out some of the GUID validation for parts of speech because when all you have is the GUID (in Sense.PartOfSpeechId), it's actually impossible to verify it statelessly. You need to get the PartOfSpeech object in order to see if its Predefined property is true, and only check canonical GUIDs for predefined parts of speech. But FluentValidation wants validation to be stateless, so that's impossible.

But once #1350 is merged, it will become possible to validate parts of speech as objects, and the PartOfSpeechId validation will just need to be "Should be the same GUID as the PartOfSpeech". The PartOfSpeech validation can then, statelessly, check the Predefined property and look up the GUID in the canonical list only if needed.
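A rough sketch of what that stateless check could look like once the validator receives a whole PartOfSpeech object. Everything here is an assumption for illustration: `PartOfSpeechSketch` stands in for the real MiniLcm type, `HasValidId` is a hypothetical name, and the one-item GUID list is a placeholder (the value is the sample GUID quoted later in this thread).

```csharp
using System;
using System.Collections.Generic;

// Stand-in for MiniLcm's PartOfSpeech (assumption: only Id and Predefined matter here).
public record PartOfSpeechSketch(Guid Id, bool Predefined);

public static class PartOfSpeechRulesSketch
{
    // Placeholder list; the real one would hold every canonical GUID.
    private static readonly HashSet<Guid> CanonicalPosGuids =
        new() { new Guid("30d07580-5052-4d91-bc24-469b8b2d7df9") };

    // Custom parts of speech may use any GUID; predefined ones must be canonical.
    public static bool HasValidId(PartOfSpeechSketch pos) =>
        !pos.Predefined || CanonicalPosGuids.Contains(pos.Id);
}
```

The point of the shape is that the predicate needs nothing beyond the object it is given, so no lookup is required at validation time.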

rmunn · 2025-01-08T02:27:24Z

Change decided in meeting with Kevin: example sentences are allowed to have an empty .Sentence property.

A Sentence property that has no content at all should be allowed. (Still should not have any empty writing systems in a MultiString, of course).

hahn-kev · 2025-01-08T02:46:58Z

backend/FwLite/MiniLcm/Validators/CanonicalGuidsPartOfSpeech.cs

+{
+    // GUID list taken from src/SIL.LCModel/Templates/GOLDEtic.xml in liblcm
+    // TODO: Consider loading GOLDEtic.xml into app as a resource and add singleton providing access to it, then look up GUIDs there rather than using this hardcoded list
+    public static HashSet<Guid> CanonicalPosGuids = [


This should be readonly. Right now the field is mutable, and so is the HashSet. Obviously make the field readonly, and instead of a HashSet I'd use a FrozenSet<T>.
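For reference, a minimal sketch of the suggested shape, assuming .NET 8 or later (where `System.Collections.Frozen` is available). The class name is hypothetical and the single GUID is a placeholder, not the real canonical list.

```csharp
using System;
using System.Collections.Frozen;

public static class CanonicalGuidsSketch
{
    // readonly field + FrozenSet: neither the reference nor the contents can change.
    public static readonly FrozenSet<Guid> CanonicalPosGuids = new[]
    {
        new Guid("30d07580-5052-4d91-bc24-469b8b2d7df9"), // placeholder entry
    }.ToFrozenSet();
}
```

Callers keep using `Contains` exactly as with a `HashSet<Guid>`; a frozen set costs more to build once and is optimized for lookups afterwards.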

hahn-kev · 2025-01-08T03:07:31Z

backend/FwLite/MiniLcm/Validators/CanonicalGuidsSemanticDomain.cs

+    // GUID list taken from src/SIL.LCModel/Templates/SemDom.xml in liblcm
+    // TODO: Consider loading SemDom.xml into app as a resource and add singleton providing access to it, then look up GUIDs there rather than using this hardcoded list
+    public static HashSet<Guid> CanonicalSemDomGuids = [
+        new Guid("63403699-07C1-43F3-A47C-069D6E4316E5"),


I don't want to spend a bunch of time on it if it's not easy, but looking at this code, at runtime it's going to have to parse all 1,000 strings. An alternative would be to use the constructor which takes a ReadOnlySpan<byte>. This would require creating the guids via parsing then calling guid.ToByteArray() and converting that into source like this:

new Guid([128, 117, 208, 48, 82, 80, 145, 77, 188, 36, 70, 155, 139, 45, 125, 249]), //30d07580-5052-4d91-bc24-469b8b2d7df9

again if that turns out to be too hard don't worry about it, we can always improve it later

Will do later. It's probably easy to automate this with a quick script, so I'll try that.
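One possible shape for that quick script, as a sketch only: parse each GUID string once, then emit a source line in the byte-constructor format shown above. `GuidSourceGenSketch` and `ToByteCtorSource` are hypothetical names.

```csharp
using System;
using System.Linq;

public static class GuidSourceGenSketch
{
    // Turns a GUID string into a source line like:
    //   new Guid([..16 bytes..]), //xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
    public static string ToByteCtorSource(string guidText)
    {
        var guid = Guid.Parse(guidText);
        // ToByteArray() uses the same byte order the byte-based Guid constructor expects.
        var bytes = string.Join(", ", guid.ToByteArray().Select(b => b.ToString()));
        return $"new Guid([{bytes}]), //{guid:D}";
    }
}
```

Passing `"30d07580-5052-4d91-bc24-469b8b2d7df9"` reproduces the sample line quoted above, so the generated list can be spot-checked against it.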

hahn-kev · 2025-01-08T03:30:10Z

backend/FwLite/FwDataMiniLcmBridge/Api/FwDataMiniLcmApi.cs

@@ -162,6 +162,7 @@

    public Task<WritingSystem> CreateWritingSystem(WritingSystemType type, WritingSystem writingSystem)
    {
+        validators.ValidateAndThrow(writingSystem);


there's a number of these warnings, I see you've fixed a few of them but there's still some more

Done. This makes a change I made, splitting ValidateAndThrowAsync out from ValidateAndThrow, irrelevant, and I'll move the names back to ValidateAndThrow everywhere. (I did this precisely because CreateWritingSystem was synchronous, and I thought it should therefore use sync validation — and also because of the "You should not use asynchronous rules when using automatic validation with ASP.NET as ASP.NET’s validation pipeline is not asynchronous" warning in the FluentValidation docs).

hahn-kev · 2025-01-08T03:32:00Z

backend/FwLite/FwDataMiniLcmBridge/Api/FwDataMiniLcmApi.cs

@@ -301,7 +305,7 @@
            Id = semanticDomain.Guid,
            Name = FromLcmMultiString(semanticDomain.Name),
            Code = semanticDomain.Abbreviation.UiString ?? "",
-            Predefined = true, // TODO: Look up in a GUID list of predefined data
+            Predefined = CanonicalGuidsSemanticDomain.CanonicalSemDomGuids.Contains(semanticDomain.Guid),


there should be similar code for PartOfSpeech, we should do the same thing there

hahn-kev · 2025-01-08T03:35:48Z

backend/FwLite/FwLiteProjectSync.Tests/Sena3SyncTests.cs

@@ -75,6 +75,7 @@ private async Task WorkaroundMissingWritingSystems()
    }

    [Fact]
+    [Trait("Category", "Integration")]


rather than adding this trait to each test you could just add it to the class

Did not realize that was a possibility. Done.
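For anyone else who didn't know: xUnit's `Trait` attribute can be applied to a class, and every test in the class then carries the trait. A small sketch, assuming the xunit package is referenced; the class and test names are made up.

```csharp
using Xunit;

// The trait on the class applies to every test method inside it.
[Trait("Category", "Integration")]
public class SlowSyncTestsSketch
{
    [Fact]
    public void FirstSlowTest() { }

    [Fact]
    public void SecondSlowTest() { }
}
```

These tests can then be skipped in a quick local run with something like `dotnet test --filter "Category!=Integration"`.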

hahn-kev · 2025-01-08T03:38:10Z

backend/FwLite/MiniLcm.Tests/Validators/EntryValidatorTests.cs

+        var entryId = Guid.NewGuid();
+        var entry = new Entry() { Id = entryId, LexemeForm = new MultiString(){{"en", "lexeme"}}, ComplexForms = [new ComplexFormComponent(){ ComplexFormEntryId = entryId, ComponentEntryId = Guid.Empty }] };
+        _validator.TestValidate(entry).ShouldHaveValidationErrorFor("ComplexForms[0]");
+        // _validator.TestValidate(entry).ShouldNotHaveAnyValidationErrors();


you can probably remove this commented code

hahn-kev · 2025-01-08T03:58:16Z

backend/FwLite/MiniLcm.Tests/Validators/EntryValidatorTests.cs

+    public void Succeeds_WhenComponentsContainEmptyGuid()
+    {
+        var entryId = Guid.NewGuid();
+        var entry = new Entry() { Id = entryId, LexemeForm = new MultiString(){{"en", "lexeme"}}, Components = [new ComplexFormComponent(){ ComplexFormEntryId = entryId, ComponentEntryId = Guid.Empty }] };


I made another comment about this elsewhere, but the ComponentEntryId must be defined here; otherwise we don't know what's being referenced. The ComplexFormEntryId can be empty because it can be inferred from the parent.

This is the second time that I've struggled to understand something you said about complex form components, because my intuition isn't tracking with what you're saying. I may be misunderstanding what ComponentEntryId and ComplexFormEntryId mean in a ComplexFormComponent object. Here's what I think it means:

Entry "hatstand", ID 1. Is a complex form.
Entry "hat", ID 2. Is a component of ID 1.
Entry "stand", ID 3. Is a component of ID 1.

"hatstand" will have a Components property with two ComplexFormComponent objects in it:

ComplexFormEntryId = 1, ComponentEntryId = 2

ComplexFormEntryId = 1, ComponentEntryId = 3

"hat" will have a ComplexForms property with one ComplexFormComponent object in it:

ComplexFormEntryId = 1, ComponentEntryId = 2

"stand" will have a ComplexForms property with one ComplexFormComponent object in it:

ComplexFormEntryId = 1, ComponentEntryId = 3

Is that correct? Or do I have it backwards somehow?

Yes that's all correct.

It is really easy to mess up when looking at the code, especially because each entry can have both Components and ComplexForms. I've messed stuff up a number of times too. If you have ideas for how to improve it, I'm open to suggestions.
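The "hatstand" example can also be written out as data, which may make the direction of each ID easier to keep straight. This is only an illustrative sketch: `EntrySketch` and `ComplexFormComponentSketch` are simplified stand-ins for the MiniLcm types, and real GUIDs replace the IDs 1, 2, and 3.

```csharp
using System;
using System.Collections.Generic;

public record ComplexFormComponentSketch(Guid ComplexFormEntryId, Guid ComponentEntryId);

public class EntrySketch
{
    public Guid Id { get; init; } = Guid.NewGuid();
    public string Headword { get; init; } = "";
    // Links where this entry is the complex form.
    public List<ComplexFormComponentSketch> Components { get; } = new();
    // Links where this entry is the component.
    public List<ComplexFormComponentSketch> ComplexForms { get; } = new();
}

public static class HatstandExample
{
    public static (EntrySketch hatstand, EntrySketch hat, EntrySketch stand) Build()
    {
        var hatstand = new EntrySketch { Headword = "hatstand" };
        var hat = new EntrySketch { Headword = "hat" };
        var stand = new EntrySketch { Headword = "stand" };
        foreach (var part in new[] { hat, stand })
        {
            // The same link appears on both sides of the relationship.
            var link = new ComplexFormComponentSketch(hatstand.Id, part.Id);
            hatstand.Components.Add(link);
            part.ComplexForms.Add(link);
        }
        return (hatstand, hat, stand);
    }
}
```

In every link the ComplexFormEntryId is "hatstand" and the ComponentEntryId is the part, regardless of which entry's list the link sits in.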

hahn-kev · 2025-01-08T03:59:07Z

backend/FwLite/MiniLcm.Tests/Validators/ExampleSentenceValidatorTests.cs

+
+    [Theory]
+    [InlineData("Sentence")]
+    [InlineData("Translation")]


should use the nameof syntax here as well
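A sketch of what that looks like, assuming the xunit package is referenced. `nameof` is a compile-time constant, so it is allowed inside attribute arguments, and renaming a property then breaks the build instead of silently breaking the test. `ExampleSentenceSketch` is a stand-in for the real type, and the test body is made up.

```csharp
using Xunit;

// Stand-in for MiniLcm's ExampleSentence (assumption: property names only).
public class ExampleSentenceSketch
{
    public string Sentence { get; set; } = "";
    public string Translation { get; set; } = "";
}

public class ExampleSentenceValidatorTestsSketch
{
    [Theory]
    [InlineData(nameof(ExampleSentenceSketch.Sentence))]
    [InlineData(nameof(ExampleSentenceSketch.Translation))]
    public void FieldNameRefersToRealProperty(string fieldName)
    {
        // The string passed in always matches an existing property name.
        Assert.NotNull(typeof(ExampleSentenceSketch).GetProperty(fieldName));
    }
}
```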

hahn-kev · 2025-01-08T04:02:14Z

backend/FwLite/MiniLcm/Validators/MiniLcmValidators.cs

+        await SemanticDomainValidator.ValidateAndThrowAsync(value);
+    }
+
+    public void ValidateAndThrow(ComplexFormType value)


I'd like to not expose these if we can avoid it, when using the non async version you can't run any async validation rules. We don't have any now but I don't want to block us from using them in the future.

Removed the sync versions as they were no longer called once I fixed the sync blocking warnings from #1344 (review)

hahn-kev · 2025-01-08T04:03:38Z

backend/FwLite/MiniLcm/Validators/PartOfSpeechIdValidator.cs

+
+namespace MiniLcm.Validators;
+
+public class PartOfSpeechIdValidator : AbstractValidator<Guid?>


If we do end up keeping this, I might change the name to IsCanonicalPartOfSpeechIdValidator, as it doesn't just validate any Id; it's validating that it's canonical.

Will do later.

Done in commit 0a39681. We might remove it later but the rename is simple so it's worth doing now even if we do remove it later.

hahn-kev

left some feedback, it looks good so far.

It looks like that issue with HumanNoOpinionNumber is not consistent. Let me know if you see it again, I've sent a slack message to the flex team about it.

rmunn · 2025-01-08T04:32:19Z

@hahn-kev wrote:

left some feedback, it looks good so far.

Addressed most review comments in commit c3f0908. The GUID constructor that doesn't parse strings is one I'll tackle tomorrow morning. Everything else is either done or waiting until later (such as renaming PartOfSpeechIdValidator, which I'm waiting on because I think I'll end up removing it) except for #1344 (review). There, I need a bit of help because I think I might be misunderstanding what the properties of ComplexFormComponent mean. (Either that, or I'm misunderstanding what you mean in that comment and I need you to expand on it a little).

It looks like that issue with HumanNoOpinionNumber is not consistent. Let me know if you see it again, I've sent a slack message to the flex team about it.

Yeah, that might have been a one-off caused by something entirely different. Commit 922f0a0, which only changed the validation rule for example sentences, also made that error go away. So I'll chalk it up to weirdness and ignore it unless it comes back.

rmunn added 5 commits January 6, 2025 13:44

Add validation for entries and most entry fields

01a619e

Contains validators for: - Entry - Sense - Example Sentence - Part of Speech - Semantic Domain

Update entry validation tests to pass again

8499ff5

Entry validation now includes "lexeme must not be empty", so we add a non-empty lexeme to the existing entry validation tests.

Add entry validator tests for lexeme, citation form

dcf2aef

These tests are quite similar to each other; a test helper method is probably needed here.

Add entry validation tests for literal meaning, note

4dd4032

Refactored citation form tests to be more generic so they can be reused for other similar fields.

Add validation tests for senses, example sentences

04b8b26

rmunn self-assigned this Jan 6, 2025

hahn-kev reviewed Jan 7, 2025

backend/FwLite/MiniLcm.Tests/Validators/EntryValidatorTests.cs (outdated, resolved)

hahn-kev reviewed Jan 7, 2025

backend/FwLite/MiniLcm.Tests/Validators/EntryValidatorTests.cs (outdated, resolved)

hahn-kev reviewed Jan 7, 2025

backend/FwLite/MiniLcm/Validators/EntryValidator.cs (outdated, resolved)

hahn-kev reviewed Jan 7, 2025

backend/FwLite/MiniLcm/Validators/PartOfSpeechIdValidator.cs (outdated, resolved)

rmunn added 7 commits January 7, 2025 11:31

Address review comments so far

87b2f8b

Add list of canonical GUIDs for parts of speech

7577e83

Add list of canonical GUIDs for semantic domains

98fd92b

Mark FwLiteProjectSync tests as integration tests

68fe485

These tests run a Send/Receive as part of the test and are too slow to be considered unit tests.

Add more entry validation tests

b8e6b30

Test that circular references are detected

Also validate complex form types on updates

379d4d4

Actually use validators in MiniLCM API

3bcd237

rmunn force-pushed the feat/validate-minilcm-types branch from 9b3d43e to 3bcd237 Compare January 7, 2025 20:11

rmunn added 6 commits January 7, 2025 15:19

Adjust some test data to make it valid

1cedbec

Fix test failures around semantic domain IDs

e5b02b7

Now, instead of semantic domains always being considered predefined when they come from fwdata, we can now look up their GUIDs in the canonical list and set Predefined correctly. This also makes two failing tests pass.

Push two missing files

d046bdc

Make EntryReadyForCreation create valid data

8d5a2fe

rmunn marked this pull request as ready for review January 7, 2025 21:01

rmunn requested a review from hahn-kev January 7, 2025 21:01

rmunn mentioned this pull request Jan 7, 2025

refactor Sense.PartOfSpeech to use an object #1232

Open

Better comment

9082936

Example sentences may have empty Sentence fields

922f0a0

A Sentence property that has no content at all should be allowed. (Still should not have any empty writing systems in a MultiString, of course).

hahn-kev reviewed Jan 8, 2025

hahn-kev requested changes Jan 8, 2025

Address most review comments

c3f0908

Rename PartOfSpeechIdValidator

0a39681
