Added disambiguated text to texts folder along with source text #2
base: master
Formatted the disambiguated text a bit.
Hi @pranavad, git shows these files as binary files rather than text files (see the 'Files changed' tab, for example). How did you create them? Is there an encoding issue or something?
I saved them with the "Unicode" encoding in Notepad. Maybe I should export them again in UTF-8?
Yup, do that.
I think it's fine now. I re-encoded them in UTF-8, and git no longer shows them as binary files.
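For reference, Notepad's "Unicode" option writes UTF-16 (little-endian, with a byte-order mark). In UTF-16 every ASCII character, such as a space or a Latin letter, is stored with a zero byte, and git's diff code treats a file with a NUL byte near its start as binary. The Python sketch below approximates that check and re-encodes such a file as UTF-8. The 8000-byte window mirrors git's heuristic; the function names are hypothetical helpers, not part of git or Apertium.

```python
import tempfile
from pathlib import Path

# git's binary-file heuristic looks for a NUL byte in roughly the first
# 8000 bytes of a file; this constant mirrors that window.
FIRST_FEW_BYTES = 8000


def looks_binary_to_git(data: bytes) -> bool:
    """Approximate git's check: a NUL byte near the start means 'binary'."""
    return b"\x00" in data[:FIRST_FEW_BYTES]


def reencode_utf16_to_utf8(path: Path) -> None:
    """Rewrite a UTF-16 file (e.g. Notepad's "Unicode") as UTF-8 without a BOM."""
    # The "utf-16" codec reads the byte-order mark to pick the endianness
    # and strips it from the decoded text.
    text = path.read_bytes().decode("utf-16")
    path.write_bytes(text.encode("utf-8"))


if __name__ == "__main__":
    sample = "हिन्दुस्तानी Soviet Union"
    utf16 = sample.encode("utf-16")    # BOM followed by UTF-16 code units
    utf8 = sample.encode("utf-8")
    print(looks_binary_to_git(utf16))  # True: ASCII characters carry zero bytes
    print(looks_binary_to_git(utf8))   # False: this text has no U+0000

    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "text.txt"
        path.write_bytes(utf16)
        reencode_utf16_to_utf8(path)
        print(path.read_bytes() == utf8)  # True
```

Devanagari code points themselves contain no zero byte in UTF-16, so it is the spaces and Latin letters in the texts that trip the check.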
Hey, for something like "Bay of Bengal" or "Indian Ocean", should "Bengal"/"Indian" be disambiguated as an adjective, a hydronym or a toponym?
Similarly, is "Bay"/"Ocean" here a common noun or a proper noun?
For names such as "Soviet Union" or "World Cup", where each word can also be read as a common noun, how do I disambiguate them? Are languages and sports "altres" proper nouns? And lastly, in phrases like "Hindustani Music", is "Hindustani" a proper noun or an adjective?
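@shardulc What about these words: प्रभावित ("affected"), इत्यादि ("etc."), हालाँकि ("although"), उभरी ("emerged"), सोवियत ("Soviet"), पुरजोर ("forceful"), अलावा ("besides"), वाली (feminine of वाला), हिन्दुस्तानी ("Hindustani")?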
@pranavad Sorry for the late response. For all the examples like "Bay of Bengal", "Soviet Union", etc., the short answer is to do it the way the English module analyzes them, since that module is mature now. The long answer, if you want to know how and why things are done that way, is that I'm not sure about the technicalities, so you'll have to ask on the channel. From your list, प्रभावित and हिन्दुस्तानी seem like normal adjectives to me; the comment above applies to the rest.
Hey @shardulc, I've completed the disambiguation for all the lemmatized words and fixed a few others.
Is there anything else required for merging or completing the task?
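One way to double-check that every word has been disambiguated is sketched below. It assumes the disambiguated text uses Apertium's stream format, where each lexical unit looks like ^surface/analysis1/analysis2$ and a fully disambiguated unit keeps exactly one analysis. The tags in the example (<adj>, <np><top>, <np><al>) are simplified placeholders, not the actual apertium-hin tagset, and the parser deliberately ignores escaped characters and superblanks.

```python
import re

# A lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# This simplified pattern does not handle backslash-escaped ^, / or $.
LEXICAL_UNIT = re.compile(r"\^([^$]*)\$")


def parse_units(stream: str) -> list[tuple[str, list[str]]]:
    """Split a stream into (surface form, [remaining analyses]) pairs."""
    units = []
    for match in LEXICAL_UNIT.finditer(stream):
        surface, *analyses = match.group(1).split("/")
        units.append((surface, analyses))
    return units


def ambiguous_units(stream: str) -> list[str]:
    """Return surface forms that do not have exactly one analysis left."""
    return [surface for surface, analyses in parse_units(stream)
            if len(analyses) != 1]


if __name__ == "__main__":
    # Hypothetical, simplified tags for illustration only.
    stream = "^प्रभावित/प्रभावित<adj>$ ^बंगाल/बंगाल<np><top>/बंगाल<np><al>$"
    print(ambiguous_units(stream))  # ['बंगाल']
```

An empty result means every unit in the file has been reduced to a single reading.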
Approved your task! I'll look into merging it shortly, perhaps with some minor changes.
Hey @Kainatic, I had trouble understanding this at first as well. Join the Apertium channel and I'll help you out on IRC.
On IRC.
@Kainatic From memory: first install lttoolbox, then download apertium-hin.hin.dix to your home folder in the VirtualBox VM (you'll find the steps for this in the doc). Then run the command that compiles the dictionary (hin.analyser.bin is just the output file name, which you can change). After that, you can analyse any text by echoing it into the analyser. Hope this helps.
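The exact commands from that comment were not preserved. The sketch below assumes they were the standard lttoolbox ones: lt-comp with the lr (left-to-right, analysis) direction to compile the .dix, and lt-proc to run the compiled transducer over text on standard input. It wraps them in Python's subprocess module; the helper names and the sample phrase are illustrative assumptions, and the file names follow the comment.

```python
import shutil
import subprocess
from pathlib import Path

DIX = "apertium-hin.hin.dix"
BIN = "hin.analyser.bin"  # output name is arbitrary, as the comment says


def compile_command(dix: str = DIX, binary: str = BIN) -> list[str]:
    """Command that compiles the dictionary into an analyser (lr direction)."""
    return ["lt-comp", "lr", dix, binary]


def analyse_command(binary: str = BIN) -> list[str]:
    """Command that runs the compiled analyser over stdin."""
    return ["lt-proc", binary]


def analyse(text: str, binary: str = BIN) -> str:
    """Equivalent of piping echoed text into lt-proc; returns its output."""
    result = subprocess.run(analyse_command(binary), input=text,
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    have_tools = shutil.which("lt-comp") and shutil.which("lt-proc")
    if have_tools and Path(DIX).exists():
        subprocess.run(compile_command(), check=True)
        print(analyse("हिन्दुस्तानी संगीत\n"))
    else:
        print("Install lttoolbox and download", DIX, "first.")
```

Run from the home folder, this compiles once and then prints the analyses in Apertium stream format; on the command line the same two steps are just the lt-comp call followed by an echo piped into lt-proc.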
Thanks
@pranavad How did you do it?
The best way to compile apertium-hin is to install Apertium, clone apertium-hin, go into the cloned directory, and run the build commands there.
@jonorthwash I went into the cloned directory and ran ./configure, but it said 'bash: ./configure: No such file or directory'.
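That error is consistent with how autotools-based projects are usually kept in git: the configure script is generated, not committed. Apertium language modules ship an autogen.sh script for this, and the usual instructions are ./autogen.sh followed by make (autogen.sh normally runs configure itself). The sketch below picks the build steps from what is present in the checkout; it assumes apertium-hin follows this layout, and build_steps/build are hypothetical helper names.

```python
import subprocess
from pathlib import Path


def build_steps(checkout: Path) -> list[list[str]]:
    """Choose build commands for an autotools checkout."""
    if (checkout / "configure").exists():
        # Release tarball or an already-bootstrapped tree.
        return [["./configure"], ["make"]]
    if (checkout / "autogen.sh").exists():
        # Fresh git clone: autogen.sh generates configure and, in Apertium
        # modules, normally runs it as well.
        return [["./autogen.sh"], ["make"]]
    raise FileNotFoundError(f"no configure or autogen.sh in {checkout}")


def build(checkout: Path) -> None:
    """Run each step inside the checkout, stopping at the first failure."""
    for step in build_steps(checkout):
        subprocess.run(step, cwd=checkout, check=True)
```

For a fresh clone, build(Path("apertium-hin")) would run ./autogen.sh and then make, which avoids calling a configure script that does not exist yet.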
I'm adding the tests to a PDF and will upload it to the GCI task page.
Hey @shardulc, should I make another PR for the constraint grammar task, or is this fine for now?