Conversion of existing text to use split words #4

Open
idow09 opened this issue Jan 19, 2021 · 3 comments

Comments

@idow09

idow09 commented Jan 19, 2021

Hi! Love the project!! 🤩
Are there any plans or thoughts about implementing a mechanism to handle (convert?) existing, old, non-neutral text?
I'm a software engineer with a background in ML, and I'm curious whether you think this is worth a shot. If so, I would love to learn from your experience and expertise in the field so I don't waste my time 😅
Thanks

@avrahamcornfeld
Member

avrahamcornfeld commented Jan 26, 2021

Hey Ido,
Thanks for the input. This would be an amazing feature, but I don't think it is possible, since the code doesn't understand the context of the words. For example, the word פתח can have many meanings, some of which need to be gendered while others don't. Here are some random examples:

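  • אמרתי לאחי ״פתח את הדלת״ (I told my brother, "Open the door": פתח is a verb)
  • אורח/ת יקר/ה, הכנס/י דרך פתח הבניין, עלה/י במדרגות ופתח/י את הדלת (Dear guest, enter through the building's entrance, go up the stairs and open the door: פתח is first a noun, then a verb)
  • את האות אל״ף יש לנקד בסימן פתח (The letter aleph should be vocalized with the patach sign: פתח is the name of a vowel mark)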

The only way this might work is with a very powerful AI script that knows how to analyze the text...
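
To make the ambiguity concrete, here is a minimal Python sketch of a context-free converter. It is an illustration only: the rule table and the `naive_convert` function are hypothetical and are not taken from Ivrita's code. It assumes a single whole-word rule that rewrites פתח as פתח/י.

```python
import re

# Hypothetical rule table for illustration; not Ivrita's actual rules.
NAIVE_RULES = {"פתח": "פתח/י"}


def naive_convert(text: str) -> str:
    """Rewrite every whole-word match of a rule, with no notion of context."""
    for word, split_form in NAIVE_RULES.items():
        # (?<!\w) and (?![\w/]) approximate word boundaries for Hebrew and
        # skip words that already carry a split suffix such as "/י".
        pattern = rf"(?<!\w){re.escape(word)}(?![\w/])"
        text = re.sub(pattern, split_form, text)
    return text


# The noun "פתח הבניין" (the building's entrance) is rewritten by mistake,
# while the verb "ופתח" (and open) is missed because of its prefix letter.
print(naive_convert("הכנס דרך פתח הבניין, עלה במדרגות ופתח את הדלת"))

# The vowel-mark name is rewritten by mistake as well.
print(naive_convert("יש לנקד בסימן פתח"))
```

Under these assumptions the same rule produces both false positives and a false negative within one sentence, which is the context problem described above.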

@idow09
Author

idow09 commented Jan 26, 2021

I realize the challenge, of course. If it were an easy task, I'm sure you would have done it already. But I think some research should be done on current SOTA Hebrew NLP models before concluding that it's not possible, don't you think?
Think about it: even if the success rate of such a model is not 100% (it never is...), one could use such a tool to get an excellent head start, and then fix the errors manually.
Existing NLP models are very powerful at understanding grammar and extracting meaning from context...
Let me know what you think, thanks!
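
The "head start, then fix manually" workflow could look roughly like the Python sketch below. Everything in it is an assumption for illustration: the `Suggestion` type, the `apply_with_review` function, the confidence threshold, and the `fake_model` stub, which stands in for a real Hebrew NLP model that is not specified here.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Suggestion:
    start: int          # offset in the original text
    end: int            # exclusive end offset in the original text
    replacement: str    # proposed gender-neutral form
    confidence: float   # model's confidence, between 0 and 1


def apply_with_review(
    text: str,
    suggest: Callable[[str], List[Suggestion]],
    threshold: float = 0.9,
) -> Tuple[str, List[Suggestion]]:
    """Apply confident suggestions; return the rest for manual review."""
    needs_review: List[Suggestion] = []
    # Work from right to left so earlier offsets stay valid while editing.
    for s in sorted(suggest(text), key=lambda s: s.start, reverse=True):
        if s.confidence >= threshold:
            text = text[: s.start] + s.replacement + text[s.end :]
        else:
            needs_review.append(s)  # offsets still refer to the original text
    return text, needs_review


def fake_model(text: str) -> List[Suggestion]:
    """Hypothetical stand-in for a real model; returns canned suggestions."""
    return [
        Suggestion(0, 3, "ABC", 0.95),  # confident: applied automatically
        Suggestion(4, 7, "DEF", 0.50),  # uncertain: left for a human
    ]


converted, queue = apply_with_review("abc def", fake_model)
print(converted, queue)
```

The design choice here is that low-confidence suggestions are never applied silently; they are collected so that a person can accept or reject each one.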

@kinging123
Member

Hi @idow09, your idea sounds pretty cool and challenging.

Currently, Ivrita runs on the client side of the website, so I assume NLP/machine learning is not relevant to the current implementation.
As I understand it, in order to use the models you're referring to, Ivrita would need to run on a centralized server, to which all the texts would be sent over an API. This means that (to save loading time on each page of the website) all of the strings would need to be parsed once when a page is published, and not at runtime as they are parsed now.
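
A rough Python sketch of the "parse once at publish time" idea follows. It is not an existing Ivrita component: the `PublishTimeConverter` class and its method names are hypothetical, and the `convert` callable stands in for whatever server-side API call would actually be made.

```python
import hashlib
from typing import Callable, Dict, Iterable


class PublishTimeConverter:
    """Convert each string once when a page is published, then serve from cache."""

    def __init__(self, convert: Callable[[str], str]) -> None:
        # `convert` is assumed to wrap a (slow) call to a central NLP service.
        self._convert = convert
        self._cache: Dict[str, str] = {}

    @staticmethod
    def _key(text: str) -> str:
        # Hash the text so the cache key has a fixed size.
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def on_publish(self, strings: Iterable[str]) -> None:
        """Run the expensive conversion once per distinct string."""
        for text in strings:
            key = self._key(text)
            if key not in self._cache:
                self._cache[key] = self._convert(text)

    def render(self, text: str) -> str:
        """Page-load path: a cache lookup only, falling back to the original."""
        return self._cache.get(self._key(text), text)


# Usage with a trivial stand-in for the remote service.
converter = PublishTimeConverter(convert=str.upper)
converter.on_publish(["hello", "world", "hello"])
print(converter.render("hello"), converter.render("not published"))
```

With this split, page loads never wait on the model; the cost is that any text not seen at publish time is served unconverted.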

Although this requires a lot of work and would be used slightly differently than the current product we offer, it does sound very exciting, and we know a lot of people who really need it! If you have the passion to make this (or even just begin the work on it) - we will be glad to see and support it!
