-
-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Validate MiniLcm types #1344
base: develop
Are you sure you want to change the base?
Validate MiniLcm types #1344
Conversation
Contains validators for: - Entry - Sense - Example Sentence - Part of Speech - Semantic Domain
Entry validation now includes "lexeme must not be empty", so we add a non-empty lexeme to the existing entry validation tests.
These tests are quite similar to each other; a test helper method is probably needed here.
Refactored citation form tests to be more generic so they can be resued for other similar fields.
My thoughts on SemDom.xml and GOLDEtic.xml - we load these XML files as resources into the app (TODO: determine which DLL the resources should live in) and create a singleton service (or just a static class) that parses those XML files at system startup and provides an IDictionary interface for looking up POS / semdom data. (Or a hash set if all we need is to validate GUIDs). Then validation code can say "Is this a predefined / canonical item?" And if it's supposed to be canonical, ensure the GUID is correct. And optionally, verify that the name and description of the canonical items hasn't been modified. |
GUIDs are all that needs to be used to identify if a POS or SemDom is predefined. Unless you want to support versioning, which you probably don't. |
|
||
private bool NotBeComponentSelfReference(Entry entry, ComplexFormComponent component) | ||
{ | ||
return component.ComponentEntryId != entry.Id; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should not be Guid.Empty
otherwise it won't make sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried adding a .When()
condition to this check, but FluentValidation's .When
takes a Func<Entry, bool>
and there isn't a WhenForEach
. So I'd either have to use a .ForEach
with a .When
, or I could just make these predicates pass when the GUID in question is empty. I chose the latter approach in commit 87b2f8b.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like this may have gotten mixed up with the HaveCorrectComponentEntryReference
change I missed it on my first pass too. This check needs to prevent the Id from being empty, as it can't be inferred from the parent. It needs to be not the parent and and not empty, otherwise we don't know what you're referencing.
These tests run a Send/Receive as part of the test and are too slow to be considered unit tests.
Test that circular references are detected
9b3d43e
to
3bcd237
Compare
Commit 3bcd237, which actually turns on validation, makes lots of unit tests fail, because their test data is now considered invalid. I'll check through and see if my validation rules are too strict or if test data needs to be updated. I expect it to be a little of both. |
As long as senses only have a PartOfSpeechId in them, it's hard to check that property statelessly because we need to look up the PartOfSpeech in order to determine whether it's predefined (and thus whether its GUID needs to match one of the canonical GUIDs). For now, we'll skip checking part of speech GUIDs until senses have an actual PartOfSpeech reference.
Now, instead of semantic domains always being considered predefined when they come from fwdata, we can now look up their GUIDs in the canonical list and set Predefined correctly. This also makes two failing tests pass.
The Sena3SyncTests are failing because some parts of speech are being created with non-canonical GUIDs. Let's comment this out for now to make the tests pass, then uncomment it once we've investigated where the non-canonical PoS GUIDs are coming from.
Many tests are now failing with the error "Fieldname 'HumanNoOpinionNumber' does not exist." I don't get those failures when running tests locally, and I have no idea where that "HumanNoOpinionNumber" name is coming from. @hahn-kev - Any ideas on this one? |
#1350 will be useful here; I had to comment out some of the GUID validation for parts of speech because when all you have is the GUID (in Sense.PartOfSpeechId), it's actually impossible to verify it statelessly. You need to get the PartOfSpeech object in order to see if its Predefined property is true, and only check canonical GUIDs for predefined parts of speech. But FluentAssertion wants validation to be stateless, so that's impossible. But once #1350 is merged, it will become possible to validate parts of speech as objects, and the PartOfSpeechId validation will just need to be "Should be the same GUID as the PartOfSpeech". The PartOfSpeech validation can then, statelessly, check the Predefined property and look up the GUID in the canonical list only if needed. |
Change decided in meeting with Kevin: example sentences are allowed to have an empty .Sentence property. |
A Sentence property that has no content at all should be allowed. (Still should not have any empty writing systems in a MultiString, of course).
{ | ||
// GUID list taken from src/SIL.LCModel/Templates/GOLDEtic.xml in liblcm | ||
// TODO: Consider loading GOLDEtic.xml into app as a resource and add singleton providing access to it, then look up GUIDs there rather than using this hardcoded list | ||
public static HashSet<Guid> CanonicalPosGuids = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be readonly, right now the field is mutable, and the HashSet is mutable. Obviously make the field readonly, instead I'd use a FrozenSet<T>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
// GUID list taken from src/SIL.LCModel/Templates/SemDom.xml in liblcm | ||
// TODO: Consider loading SemDom.xml into app as a resource and add singleton providing access to it, then look up GUIDs there rather than using this hardcoded list | ||
public static HashSet<Guid> CanonicalSemDomGuids = [ | ||
new Guid("63403699-07C1-43F3-A47C-069D6E4316E5"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to spend a bunch of time on it if it's not easy, but looking at this code, at runtime it's going to have to parse all 1,000 strings. An alternative would be to use the constructor which takes a ReadOnlySpan<byte>
. This would require creating the guids via parsing then calling guid.ToByteArray()
and converting that into source like this:
new Guid([128, 117, 208, 48, 82, 80, 145, 77, 188, 36, 70, 155, 139, 45, 125, 249]), //30d07580-5052-4d91-bc24-469b8b2d7df9
again if that turns out to be too hard don't worry about it, we can always improve it later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do later. It's probably easy to automate this with a quick script, so I'll try that.
@@ -162,6 +162,7 @@ | |||
|
|||
public Task<WritingSystem> CreateWritingSystem(WritingSystemType type, WritingSystem writingSystem) | |||
{ | |||
validators.ValidateAndThrow(writingSystem); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's a number of these warnings, I see you've fixed a few of them but there's still some more
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. This makes a change I made, splitting ValidateAndThrowAsync out from ValidateAndThrow, irrelevant, and I'll move the names back to ValidateAndThrow everywhere. (I did this precisely because CreateWritingSystem was synchronous, and I thought it should therefore use sync validation — and also because of the "You should not use asynchronous rules when using automatic validation with ASP.NET as ASP.NET’s validation pipeline is not asynchronous" warning in the FluentValidation docs).
@@ -301,7 +305,7 @@ | |||
Id = semanticDomain.Guid, | |||
Name = FromLcmMultiString(semanticDomain.Name), | |||
Code = semanticDomain.Abbreviation.UiString ?? "", | |||
Predefined = true, // TODO: Look up in a GUID list of predefined data | |||
Predefined = CanonicalGuidsSemanticDomain.CanonicalSemDomGuids.Contains(semanticDomain.Guid), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there should be similar code for PartOfSpeech, we should do the same thing there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -75,6 +75,7 @@ private async Task WorkaroundMissingWritingSystems() | |||
} | |||
|
|||
[Fact] | |||
[Trait("Category", "Integration")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rather than adding this trait to each test you could just add it to the class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did not realize that was a possibility. Done.
var entryId = Guid.NewGuid(); | ||
var entry = new Entry() { Id = entryId, LexemeForm = new MultiString(){{"en", "lexeme"}}, ComplexForms = [new ComplexFormComponent(){ ComplexFormEntryId = entryId, ComponentEntryId = Guid.Empty }] }; | ||
_validator.TestValidate(entry).ShouldHaveValidationErrorFor("ComplexForms[0]"); | ||
// _validator.TestValidate(entry).ShouldNotHaveAnyValidationErrors(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can probably remove this commented code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
public void Succeeds_WhenComponentsContainEmptyGuid() | ||
{ | ||
var entryId = Guid.NewGuid(); | ||
var entry = new Entry() { Id = entryId, LexemeForm = new MultiString(){{"en", "lexeme"}}, Components = [new ComplexFormComponent(){ ComplexFormEntryId = entryId, ComponentEntryId = Guid.Empty }] }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made another comment about this elsewhere, but the ComponentEntryId must be defined here, otherwise we don't know what's being referenced, the ComplexFormEntryId can be empty because it can be inferred from the parent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the second time that I've struggled to understand something you said about complex form components, because my intuition isn't tracking with what you're saying. I may be misunderstanding what ComponentEntryId and ComplexFormEntryId mean in a ComplexFormComponent object. Here's what I think it means:
Entry "hatstand", ID 1. Is a complex form.
Entry "hat", ID 2. Is a component of ID 1.
Entry "stand", ID 3. Is a component of ID 1.
"hatstand" will have a Components property with two ComplexFormComponent objects in it:
- ComplexFormEntryId = 1, ComponentEntryId = 2
- ComplexFormEntryId = 1, ComponentEntryId = 3
"hat" will have a ComplexForm property with one ComplexFormComponent object in it:
- ComplexFormEntryId = 1, ComponentEntryId = 2
"stand" will have a ComplexForm property with one ComplexFormComponent object in it:
- ComplexFormEntryId = 1, ComponentEntryId = 3
Is that correct? Or do I have it backwards somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that's all correct.
It is really easy to mess up when looking at the code, especially because each entry can have Components and ComplexForms. I've messed stuff up a number of times too. If you have a suggestion of how to improve it I'm open to suggestions.
|
||
[Theory] | ||
[InlineData("Sentence")] | ||
[InlineData("Translation")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should use the nameof
syntax here as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
await SemanticDomainValidator.ValidateAndThrowAsync(value); | ||
} | ||
|
||
public void ValidateAndThrow(ComplexFormType value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to not expose these if we can avoid it, when using the non async version you can't run any async validation rules. We don't have any now but I don't want to block us from using them in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the sync versions as they were no longer called once I fixed the sync blocking warnings from #1344 (review)
|
||
namespace MiniLcm.Validators; | ||
|
||
public class PartOfSpeechIdValidator : AbstractValidator<Guid?> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we do end up keeping this then I might change the name to IsCanonicalPartOfSpeechIdValidator
as it doesn't just validate any Id, it's validating that's it's canonical.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in commit 0a39681. We might remove it later but the rename is simple so it's worth doing now even if we do remove it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left some feedback, it looks good so far.
It looks like that issue with HumanNoOpinionNumber
is not consistent. Let me know if you see it again, I've sent a slack message to the flex team about it.
@hahn-kev wrote:
Addressed most review comments in commit c3f0908. The GUID constructor that doesn't parse strings is one I'll tackle tomorrow morning. Everything else is either done or waiting until later (such as renaming PartOfSpeechIdValidator, which I'm waiting on because I think I'll end up removing it) except for #1344 (review). There, I need a bit of help because I think I might be misunderstanding what the properties of ComplexFormComponent mean. (Either that, or I'm misunderstanding what you mean in that comment and I need you to expand on it a little).
Yeah, that might have been a one-off caused by something entirely different. Commit 922f0a0, which only changed the validation rule for example sentences, also made that error go away. So I'll chalk it up to weirdness and ignore it unless it comes back. |
Fixes #1275.
Fixes #1276.
Fixes #1277.
Fixes #1278.
Fixes #1279.
Fixes #1280.
Work completed: