Conversion of existing text to use split words #4

idow09 · 2021-01-19T19:29:00Z

Hi! Love the project!! 🤩
Are there any plans or thoughts about implementing a mechanism to handle (convert?) existing, old, non-neutral text?
I'm a software engineer with a background in ML, and I'm curious whether you think this is worth a shot. If so, I would love to learn from your experience and expertise in the field so I don't waste my time 😅
Thanks

avrahamcornfeld · 2021-01-26T07:31:58Z

Hey Ido,
Thanks for the input. This would be an amazing feature, but I don't think it is possible, since the code doesn't understand the context of the words. For example, the word פתח can have many meanings, some of which need to be gendered while others don't. Here are some random examples:

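אמרתי לאחי ״פתח את הדלת״
(I told my brother, "Open the door": here פתח is an imperative verb.)
אורח/ת יקר/ה, הכנס/י דרך פתח הבניין, עלה/י במדרגות ופתח/י את הדלת
(Dear guest, enter through the building's entrance, go up the stairs and open the door: the first פתח is the noun "entrance", the second is an imperative verb written in split form.)
את האות אל״ף יש לנקד בסימן פתח
(The letter alef should be vocalized with the patach sign: here פתח is the name of a vowel mark.)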

The only way this might work is with a very powerful AI script that can analyze the text...
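To make the ambiguity above concrete, here is a minimal Python sketch of a context-free rule. It is not Ivrita's implementation; the function name `naive_split` and the single hard-coded rule are assumptions made purely for illustration. It shows the same replacement being right in one sentence and wrong in the other two.

```python
import re

# Hypothetical context-free rule: every standalone "פתח" becomes "פתח/י".
# Illustration only; this is NOT how Ivrita itself works.
PATTERN = re.compile(r"\bפתח\b")


def naive_split(text: str) -> str:
    """Apply the rule to every match, with no knowledge of the sentence."""
    return PATTERN.sub("פתח/י", text)


# (sentence, is the split form actually wanted here?)
EXAMPLES = [
    ("פתח את הדלת", True),      # imperative verb addressed to an unknown reader
    ("דרך פתח הבניין", False),  # noun: "the building's entrance"
    ("בסימן פתח", False),       # name of a vowel mark
]

for sentence, wanted in EXAMPLES:
    converted = naive_split(sentence)
    changed = converted != sentence
    verdict = "ok" if changed == wanted else "WRONG"
    print(f"{verdict}: {sentence} -> {converted}")
```

The rule cannot tell the three uses apart because it only sees the letters. This particular pattern also skips prefixed forms such as ופתח, since there is no word boundary between the prefix and the stem.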

idow09 · 2021-01-26T08:45:47Z

I realize the challenge, of course. If it were an easy task, I'm sure you would have done it already. But I think some research should be done on current SOTA Hebrew NLP models before concluding that it's not possible, don't you think?
Think about it: even if the success rate of such a model is not 100% (it never is...), one could use such a tool to get an excellent head start, and then fix the errors manually.
Existing NLP models are very powerful at understanding grammar and extracting meaning from context...
Let me know what you think, thanks!
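The "head start, then fix manually" workflow described above could be sketched roughly like this. Everything here is hypothetical: the `Suggestion` type, the `confidence` score, and the `0.9` threshold are assumptions for illustration, and no real Hebrew NLP model is called.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Suggestion:
    """One proposed rewrite; `confidence` is a hypothetical model score in [0, 1]."""
    original: str
    proposed: str
    confidence: float


def triage(
    suggestions: List[Suggestion], threshold: float = 0.9
) -> Tuple[List[Suggestion], List[Suggestion]]:
    """Split suggestions into auto-accepted ones and ones that need a human."""
    accepted: List[Suggestion] = []
    review: List[Suggestion] = []
    for s in suggestions:
        (accepted if s.confidence >= threshold else review).append(s)
    return accepted, review


# Made-up scores, only to show the flow.
batch = [
    Suggestion("פתח את הדלת", "פתח/י את הדלת", 0.97),
    Suggestion("בסימן פתח", "בסימן פתח/י", 0.40),
]
accepted, review = triage(batch)
print(len(accepted), "accepted,", len(review), "sent to manual review")
```

The threshold is the knob that trades manual effort against the risk of wrong conversions slipping through; where to set it would depend on how well a given model actually performs.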

kinging123 · 2021-01-29T15:59:52Z

Hi @idow09, your idea sounds pretty cool and challenging.

Currently, Ivrita runs on the client side of the website, so I assume NLP/machine learning is not relevant to the current implementation.
In my understanding, in order to use the models you're referring to, Ivrita would need to run on a centralized server, to which all the texts would be sent over an API. This means that (to save loading time on each page of the website) all of the strings would need to be parsed once when a page is published, and not at runtime as they are currently parsed.
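A rough sketch of that "parse once at publish time" idea, under stated assumptions: `publish_page`, `render`, and the in-memory `cache` dict are hypothetical names, and `fake_convert` is a stand-in for whatever server-side model call would really be used.

```python
import hashlib
from typing import Callable, Dict, Iterable, List


def cache_key(text: str) -> str:
    """Stable key for a string, so identical strings are converted only once."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def publish_page(
    strings: Iterable[str], convert: Callable[[str], str], cache: Dict[str, str]
) -> None:
    """At publish time: run the slow converter once per distinct string."""
    for text in strings:
        key = cache_key(text)
        if key not in cache:
            cache[key] = convert(text)


def render(text: str, cache: Dict[str, str]) -> str:
    """At page-load time: look up only, falling back to the original text."""
    return cache.get(cache_key(text), text)


calls: List[str] = []


def fake_convert(text: str) -> str:
    """Stand-in for a remote model call; records each invocation."""
    calls.append(text)
    return text.upper()


cache: Dict[str, str] = {}
publish_page(["hello", "hello", "world"], fake_convert, cache)
print(render("hello", cache), len(calls))
```

With this split, page loads never wait on the model: the expensive call happens once per distinct string when the page is published, and strings that were never converted are shown unchanged.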

Although this requires a lot of work and would be used slightly differently than the current product we offer, it does sound very exciting, and we know a lot of people who really need it! If you have the passion to make this (or even just begin the work on it) - we will be glad to see and support it!
