Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Data Contract-Based Noun Suggestion with Plural and Gender Handling #300

Conversation

angrezichatterbox
Copy link
Member

Contributor checklist


Description

The PR introduces a Data Contract-based implementation for Noun Suggestion, designed to enhance modularity and flexibility in handling language-specific noun forms. This implementation leverages JSON-defined Data Contracts that are processed using a DataContractModel, located in the app's models module. The model enforces the presence of three essential keys within each contract: numbers, genders, and conjugation. Additionally, it assumes six forms of verbs for each conjugation, initialized as empty maps by default if not explicitly defined.

Plural Suggestion

The plural suggestion logic extracts plural forms from the numbers column in the Data Contract. These plural words are gathered into a list, which is then used to evaluate user input. If the last typed word matches any word in this list, it is identified as plural and marked with the label "PL," regardless of the word's gender.

Noun Gender Suggestion

For handling gender information, the implementation employs the GenderDataManager class, which interacts with a SQLite database. This class operates within the constraints of a specific language and utilizes the associated DataContract to organize and interpret data effectively. The GenderDataManager performs checks to ensure the database file's existence and processes gender data based on the structure:

  1. If a main gender column is present, it processes data directly from this column.
  2. If separate columns for masculine and feminine genders are available, it accommodates this format and processes accordingly.

Key Assumptions

This implementation relies on several foundational assumptions:

  • Every Data Contract must include the numbers, genders, and conjugation keys.
  • Six verb forms must be defined under each conjugation. In cases where these forms are missing, they default to empty maps.

Related issue

Copy link

github-actions bot commented Jan 18, 2025

Thank you for the pull request!

The Scribe team will do our best to address your contribution as soon as we can. The following is a checklist for maintainers to make sure this process goes as well as possible. Feel free to address the points below yourself in further commits if you realize that actions are needed :)

If you're not already a member of our public Matrix community, please consider joining! We'd suggest using Element as your Matrix client, and definitely join the General and Android rooms once you're in. Also consider joining our bi-weekly Saturday dev syncs. It'd be great to have you!

Maintainer checklist

  • The linting and formatting workflows within the PR checks do not indicate new errors in the files changed

  • The CHANGELOG has been updated with a description of the changes for the upcoming release and the corresponding issue (if necessary)

@angrezichatterbox
Copy link
Member Author

Should I add the license to all the files that I have newly created ?

@andrewtavis
Copy link
Member

We'll have the new license notice, so let's not worry about it for now and it can be fixed in the PR that changes all the files 😊

@andrewtavis
Copy link
Member

This is some amazing work here, @angrezichatterbox! So great to see this taking shape 😊 Will be so helpful moving forward 🥳

Wanted to check on the functionality for nouns that have two genders. I'm seeing that Schild isn't leading to a gender on German keyboards. That's should be a blue M next to a green N given it's forms. The German word See should be a red F next to a blue M as it's lake and ocean and has different genders based on each. I also always check Leben, which should be a green N next to an orange PL. We'd talked not showing PL with anything else, but I think I wasn't clear there. For Spanish this isn't needed as we can just say they're plural and don't need to say that they're also feminine or masculine, but then that's also because that language is giving us a non-standard "feminine plural" rather than "plural for a feminine noun" based on the data. For German Leben is both neutral and is it's own plural, which is important to indicate to the user as if they try to use the plural command they'll be confused by the response if we don't make it clear that it's both.

How do we want to proceed here? Do you want to expand the noun-gender annotations so that we have the split annotations? This is how it looks on iOS, btw:

Simulator Screenshot - iPhone 16 Pro - 2025-01-18 at 19 59 48

Simulator Screenshot - iPhone 16 Pro - 2025-01-18 at 19 59 10

Let me know, and thanks again for the amazing implementation so far!

@angrezichatterbox
Copy link
Member Author

So in German only we have these cases. Like in case of German or any db that has a gender column I could make the Type into a List so that if its already added to a specific gender it would also have other types of the same word to be present as possible noun type for the word.

As for the Plural, How should it be handled. Should I make all Plural word shows their gender as well.

@andrewtavis
Copy link
Member

I'd say that we shouldn't do this. The case for Leben is that it's singular and plural, but if we do that for all plurals then people would think that Hermanas (sisters in Spanish) would also be a singular word. So if it's in the plural column show plural, if it's in a singular column and a plural show both the gender and plural, and if it's just in a singular column or has a canonical gender and isn't in plural then it's just the gender :)

How does this sound? 😊

@angrezichatterbox
Copy link
Member Author

I'd say that we shouldn't do this. The case for Leben is that it's singular and plural, but if we do that for all plurals then people would think that Hermanas (sisters in Spanish) would also be a singular word. So if it's in the plural column show plural, if it's in a singular column and a plural show both the gender and plural, and if it's just in a singular column or has a canonical gender and isn't in plural then it's just the gender :)

How does this sound? 😊

Ok I think I have understood it. I will make the changes and sent in the updated PR by Tuesday or Wednesday.

@angrezichatterbox angrezichatterbox force-pushed the feature/data-contract-based-suggestion branch from d87e566 to 99a598e Compare January 21, 2025 17:05
@angrezichatterbox
Copy link
Member Author

I will squash the commit that got duplicated. I had some issues since my git repository got corrupted. Could I get a review on the current design for the dual suggestion @andrewtavis . I am happy to make any cosmetic changes or functional changes to make this PR perfect. It is not yet ready for merge since I have to make some changes to make it work for English keyboard.

@andrewtavis
Copy link
Member

I brought it down and tried to play around with it on German as that's where a lot of double gender strings are, but noun annotations aren't showing up for the German keyboard at all 🤔

@angrezichatterbox
Copy link
Member Author

angrezichatterbox commented Jan 22, 2025

I brought it down and tried to play around with it on German as that's where a lot of double gender strings are, but noun annotations aren't showing up for the German keyboard at all 🤔

Does it not come up even for words like Schild or Leben

Screenshot 2025-01-22 at 9 25 10 AM

This is what comes for me for Leben now.

@angrezichatterbox angrezichatterbox force-pushed the feature/data-contract-based-suggestion branch from dab898b to a92a2fa Compare January 22, 2025 03:20
@angrezichatterbox
Copy link
Member Author

Just wanted to confirm.

        "1": {
            "title": "",
            "1": "",
            "2": "",
            "3": "",
            "4": "",
            "5": "",
            "6": ""
        }

This individual text would be replaces with a structure similar to

  "1": {
            "title": "Presente",
            "1": { "yo": "indicativePresentFirstPersonSingular" },
            "2": { "tú": "indicativePresentFirstPersonPlural" },
            "3": { "él/ella/Ud.": "indicativePresentSecondPersonSingular" },
            "4": { "nosotros": "indicativePresentSecondPersonPlural" },
            "5": { "vosotros": "indicativePresentThirdPersonSingular" },
            "6": { "ellos/ellas/Uds.": "indicativePresentThirdPersonPlural" }
        },

where we would have Map in the individual keys other than title.

@angrezichatterbox angrezichatterbox force-pushed the feature/data-contract-based-suggestion branch from a92a2fa to 4173a03 Compare January 22, 2025 04:17
@angrezichatterbox angrezichatterbox force-pushed the feature/data-contract-based-suggestion branch from 4173a03 to 75e37e6 Compare January 22, 2025 05:05
@andrewtavis
Copy link
Member

Just wanted to confirm.

        "1": {
            "title": "",
            "1": "",
            "2": "",
            "3": "",
            "4": "",
            "5": "",
            "6": ""
        }

This individual text would be replaces with a structure similar to

  "1": {
            "title": "Presente",
            "1": { "yo": "indicativePresentFirstPersonSingular" },
            "2": { "tú": "indicativePresentFirstPersonPlural" },
            "3": { "él/ella/Ud.": "indicativePresentSecondPersonSingular" },
            "4": { "nosotros": "indicativePresentSecondPersonPlural" },
            "5": { "vosotros": "indicativePresentThirdPersonSingular" },
            "6": { "ellos/ellas/Uds.": "indicativePresentThirdPersonPlural" }
        },

where we would have Map in the individual keys other than title.

Yes exactly, @angrezichatterbox :) I can't really think of a better way of making sure that things are readable for the developer to know what goes where and also make sure that we have the label for the conjugation field.

@andrewtavis
Copy link
Member

As discussed in #161, I invalidated my cache and remade my emulator and all's working better now, @angrezichatterbox 😊 Sorry for the wait here! Glad that it's at least in the docs now :)

Designs are looking really good! ✨ Do you want to add anything else to this PR, or should I do a final check of the code and merge it in? :)

@angrezichatterbox
Copy link
Member Author

As discussed in #161, I invalidated my cache and remade my emulator and all's working better now, @angrezichatterbox 😊 Sorry for the wait here! Glad that it's at least in the docs now :)

Designs are looking really good! ✨ Do you want to add anything else to this PR, or should I do a final check of the code and merge it in? :)

I'll have to make some changes to make it work for English Keyboard. Since it's current empty strings. The contract couldn't be extracted and the keyboard would crash. Could we have it made temporary to empty map with empty string mapping to empty string. Or else I could make it temporary work for empty strings.

@andrewtavis
Copy link
Member

Feel free to add in placeholder strings and we can merge :) The contract won't have empty strings, and maybe if the functionality isn't needed for a language we can do "NOT_INCLUDED" for the value - like for English noun genders.

@andrewtavis
Copy link
Member

See cc48f5f for the most recent contracts, @angrezichatterbox :) I think that this can work. We can just check for "NOT_INCLUDED" and cancel operations if that's the case.

@andrewtavis
Copy link
Member

I guess it'd make sense for you to do any changes needed to get the English keyboard working and then we can merge? Feel free to edit "NOT_INCLUDED" to a different placeholder or a placeholder object or something like that if it makes using the contract easier! 😊

@angrezichatterbox
Copy link
Member Author

I have made a slight change to the contract to be like this.

 "genders":  {
        "canonical": ["NOT_INCLUDED"],
        "feminines": ["NOT_INCLUDED"],
        "masculines": ["NOT_INCLUDED"],
        "commons": ["NOT_INCLUDED"],
        "neuters": ["NOT_INCLUDED"]
    },

This was because I assumed that gender would have 5 columns by default where each would be of type List. I hope it is fine.

@angrezichatterbox
Copy link
Member Author

Would it make sense to make a workflow that would run whenever the data contract are changed so that we would get to know whether we have to make changes to the contract loader ?

@andrewtavis
Copy link
Member

Ideally the loader would never change, but then the chances of that are quite low, so yes some kind of testing would be needed. That's a different issue though for sure.

@andrewtavis
Copy link
Member

With the above changes and us making a new issue for the loader check, we're good for a final check here and merging it in?

@angrezichatterbox
Copy link
Member Author

With the above changes and us making a new issue for the loader check, we're good for a final check here and merging it in?

Yes, It's good for a final check.

Copy link
Member

@andrewtavis andrewtavis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All's looking great here, @angrezichatterbox! Thanks for the first implementation of the data contracts! 🚀 I'll make the issue that we discussed :)

@andrewtavis andrewtavis merged commit eb7240c into scribe-org:main Jan 24, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants