bug: keyboard search includes sil_euro_latin when searching for "han script" #3221

Open
darcywong00 opened this issue Dec 2, 2024 · 5 comments

@darcywong00
Contributor

I was testing the localization of keyboard searches and didn't understand why sil_euro_latin appeared in the results when searching for s:han (Han script).

Repros on the live site:
https://keyman.com/keyboards?q=s%3Ahan

I'm not finding "han script" in the api.keyman.com blob:
https://api.keyman.com/keyboard/sil_euro_latin

darcywong00 added the bug label Dec 2, 2024

@mcdurdin
Member

mcdurdin commented Dec 2, 2024

This happens because sil_euro_latin includes zhd.

See the api search result -- the match object for sil_euro_latin:

 "match": {
        "name": "Han",
        "type": "script",
        "weight": 10,
        "downloads": 1041,
        "totalDownloads": 47470,
        "finalWeight": 79.4889722231331,
        "tag": "zhd"
      }

And looking at sil_euro_latin.kps:

        <Language ID="zhd">Dai Zhuang</Language>

So, this is a bug in sil_euro_latin -- because langtags.json shows that the default script for zhd is in fact Hani (Han script):

    {
        "full": "zhd-Hani-CN",
        "iana": [ "Dai Zhuang" ],
        "iso639_3": "zhd",
        "macrolang": "za",
        "name": "Zhuang, Dai",
        "names": [ "Bu Dai", "Dai Zhuang", "Kau Ndae", "Khaau Daai", "Thu Lao", "Tu", "Tuliao", "Tuzu", "Wen-Ma Southern Zhuang", "Zhuangyu Nanbu Fangyan Wen-Ma Tuyu", "Zhuangyu Nanbu fangyan Dejing tuyu" ],
        "region": "CN",
        "regionname": "China",
        "regions": [ "VN" ],
        "script": "Hani",
        "sldr": false,
        "tag": "zhd",
        "tags": [ "zhd-CN", "zhd-Hani" ],
        "windows": "zhd-Hani"
    },
    {
        "full": "zhd-Latn-VN",
        "iana": [ "Dai Zhuang" ],
        "iso639_3": "zhd",
        "macrolang": "za",
        "name": "Zhuang, Dai",
        "names": [ "Bu Dai", "Dai Zhuang", "Kau Ndae", "Khaau Daai", "Thu Lao", "Tu", "Tuliao", "Tuzu", "Wen-Ma Southern Zhuang", "Zhuangyu Nanbu Fangyan Wen-Ma Tuyu", "Zhuangyu Nanbu fangyan Dejing tuyu" ],
        "obsolete": true,
        "region": "VN",
        "regionname": "Viet Nam",
        "regions": [ "CN" ],
        "script": "Latn",
        "sldr": false,
        "tag": "zhd-Latn",
        "tags": [ "zhd-VN" ],
        "windows": "zhd-Latn"
    },
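The lookup described above can be sketched in a few lines of Python. This is only an illustration, not the code api.keyman.com runs: the function name `script_for_tag`, the abbreviated `LANGTAGS` list, and the rule of matching a tag against an entry's "tag", "full", and "tags" fields are all assumptions made for the example.

```python
# Two abbreviated entries modelled on the langtags.json excerpt above.
LANGTAGS = [
    {"full": "zhd-Hani-CN", "script": "Hani", "tag": "zhd",
     "tags": ["zhd-CN", "zhd-Hani"]},
    {"full": "zhd-Latn-VN", "script": "Latn", "tag": "zhd-Latn",
     "tags": ["zhd-VN"], "obsolete": True},
]


def script_for_tag(tag, entries):
    """Return the script of the first entry that lists `tag`.

    Assumption: a tag belongs to an entry when it equals the entry's
    "tag" or "full" value, or appears in its "tags" list
    (case-insensitive). Returns None when no entry matches.
    """
    wanted = tag.lower()
    for entry in entries:
        candidates = [entry.get("tag", ""), entry.get("full", "")]
        candidates.extend(entry.get("tags", []))
        if wanted in (c.lower() for c in candidates):
            return entry.get("script")
    return None


print(script_for_tag("zhd", LANGTAGS))       # Hani
print(script_for_tag("zhd-Latn", LANGTAGS))  # Latn
```

Under these assumptions the bare tag zhd resolves to Hani, which is consistent with the keyboard showing up in a Han script search, while the explicit zhd-Latn resolves to Latn.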

mcdurdin transferred this issue from keymanapp/api.keyman.com Dec 2, 2024

@mcdurdin
Member

mcdurdin commented Dec 2, 2024

Note: I haven't verified the other tags. It'd be good for a package build to verify that the script is the same for all BCP 47 language tags referenced in the package (and emit a hint on mismatches).
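A rough sketch of such a check, in Python for brevity. It is not the Keyman Developer implementation: `find_script_mismatches` and the `script_by_tag` mapping are hypothetical names, the mapping stands in for whatever source the build would consult (for example langtags.json), and the example tag list is invented for illustration.

```python
def find_script_mismatches(language_tags, script_by_tag):
    """Compare every tag's script with the script of the first tag.

    `script_by_tag` is a hypothetical mapping from language tag to
    script code. Returns a list of (tag, script, expected_script)
    tuples, one per tag whose script differs from the first tag's.
    """
    if not language_tags:
        return []
    expected = script_by_tag.get(language_tags[0])
    mismatches = []
    for tag in language_tags[1:]:
        script = script_by_tag.get(tag)
        if script != expected:
            mismatches.append((tag, script, expected))
    return mismatches


# Illustrative data only: 'zhd' -> 'Hani' follows the langtags.json
# excerpt earlier in this thread; the tag list itself is made up.
SCRIPT_BY_TAG = {"en": "Latn", "fr": "Latn", "zhd": "Hani"}

for tag, script, expected in find_script_mismatches(
        ["en", "fr", "zhd"], SCRIPT_BY_TAG):
    print(f"hint: script '{script}' for '{tag}' differs from '{expected}'")
```

Reporting the result as a hint rather than failing the build fits the suggestion above, since a mismatch may be intentional.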

@mcdurdin
Member

mcdurdin commented Dec 2, 2024

After implementing a test in kmc-keyboard-info against keymanapp/keyman#12752, I got the following report:

$ ./build.sh | grep KM09011
aramaic_hebrew.kpj - hint KM09011: The script 'Syrj' associated with language tag 'amw-Syrj' does not match the script 'Syrc' for the first language in the package.
arbore.kpj - hint KM09011: The script 'Latn' associated with language tag 'amf' does not match the script 'Zyyy' for the first language in the package.
batak.kpj - hint KM09011: The script '<param>' associated with language tag 'btk-Batk' does not match the script 'Batk' for the first language in the package.
baybayin.kpj - hint KM09011: The script 'Latn' associated with language tag 'tbw-Tglg' does not match the script 'Tglg' for the first language in the package.
baybayin.kpj - hint KM09011: The script 'Latn' associated with language tag 'hnn-Tglg' does not match the script 'Tglg' for the first language in the package.
baybayin.kpj - hint KM09011: The script 'Latn' associated with language tag 'bku-Tglg' does not match the script 'Tglg' for the first language in the package.
bu_phonetic.kpj - hint KM09011: The script '<param>' associated with language tag 'und-Latn' does not match the script 'Latn' for the first language in the package.
basic_kbdinuk2.kpj - hint KM09011: The script 'Cans' associated with language tag 'iu' does not match the script 'Latn' for the first language in the package.
basic_kbdiulat.kpj - hint KM09011: The script 'Cans' associated with language tag 'iu' does not match the script 'Latn' for the first language in the package.
basic_kbdlisub.kpj - hint KM09011: The script 'Zyyy' associated with language tag 'lbc-Lisu' does not match the script 'Lisu' for the first language in the package.
basic_kbdlisus.kpj - hint KM09011: The script 'Zyyy' associated with language tag 'lbc-Lisu' does not match the script 'Lisu' for the first language in the package.
basic_kbdtifi2.kpj - hint KM09011: The script 'Latn' associated with language tag 'shi-Latn' does not match the script 'Tfng' for the first language in the package.
deseret.kpj - hint KM09011: The script 'Latn' associated with language tag 'hop-Dsrt' does not match the script 'Dsrt' for the first language in the package.
gandhari.kpj - hint KM09011: The script 'Khar' associated with language tag 'pgd-Latn' does not match the script 'Deva' for the first language in the package.
geezbrhan.kpj - hint KM09011: The script 'Latn' associated with language tag 'suq' does not match the script 'Ethi' for the first language in the package.
geezbrhan.kpj - hint KM09011: The script 'Latn' associated with language tag 'guk' does not match the script 'Ethi' for the first language in the package.
geezbrhan.kpj - hint KM09011: The script 'Latn' associated with language tag 'dwr' does not match the script 'Ethi' for the first language in the package.
gff_ethiopic.kpj - hint KM09011: The script 'Latn' associated with language tag 'amf' does not match the script 'Ethi' for the first language in the package.
gff_ethiopic.kpj - hint KM09011: The script 'Latn' associated with language tag 'kxc' does not match the script 'Ethi' for the first language in the package.
gff_ethiopic.kpj - hint KM09011: The script 'Latn' associated with language tag 'sid' does not match the script 'Ethi' for the first language in the package.
gff_ethiopic.kpj - hint KM09011: The script 'Latn' associated with language tag 'jnj' does not match the script 'Ethi' for the first language in the package.
gff_harege_fidelat.kpj - hint KM09011: The script 'Latn' associated with language tag 'amf' does not match the script 'Ethi' for the first language in the package.
gff_harege_fidelat.kpj - hint KM09011: The script 'Latn' associated with language tag 'kxc' does not match the script 'Ethi' for the first language in the package.
gff_harege_fidelat.kpj - hint KM09011: The script 'Latn' associated with language tag 'sid' does not match the script 'Ethi' for the first language in the package.
gff_harege_fidelat.kpj - hint KM09011: The script 'Latn' associated with language tag 'jnj' does not match the script 'Ethi' for the first language in the package.
gff_mesobe_fidelat.kpj - hint KM09011: The script 'Latn' associated with language tag 'amf' does not match the script 'Ethi' for the first language in the package.
gff_mesobe_fidelat.kpj - hint KM09011: The script 'Latn' associated with language tag 'kxc' does not match the script 'Ethi' for the first language in the package.
gff_mesobe_fidelat.kpj - hint KM09011: The script 'Latn' associated with language tag 'sid' does not match the script 'Ethi' for the first language in the package.
gff_mesobe_fidelat.kpj - hint KM09011: The script 'Latn' associated with language tag 'jnj' does not match the script 'Ethi' for the first language in the package.
idc_deseret.kpj - hint KM09011: The script 'Latn' associated with language tag 'hop-Dsrt' does not match the script 'Dsrt' for the first language in the package.
indonesian_suku.kpj - hint KM09011: The script 'Batk' associated with language tag 'btd-Latn' does not match the script 'Latn' for the first language in the package.
indonesian_suku.kpj - hint KM09011: The script 'Arab' associated with language tag 'mfa-Latn' does not match the script 'Latn' for the first language in the package.        
itrans_odia.kpj - hint KM09011: The script 'Latn' associated with language tag 'kxv' does not match the script 'Orya' for the first language in the package.
itrans_roman.kpj - hint KM09011: The script 'Latn' associated with language tag 'hi-Latn' does not match the script 'Deva' for the first language in the package.
itrans_roman.kpj - hint KM09011: The script 'Taml' associated with language tag 'ta-Latn' does not match the script 'Deva' for the first language in the package.
jawa.kpj - hint KM09011: The script 'Java' associated with language tag 'jv-Java' does not match the script 'Latn' for the first language in the package.
jawa.kpj - hint KM09011: The script 'Java' associated with language tag 'kaw-Java' does not match the script 'Latn' for the first language in the package.
jawa.kpj - hint KM09011: The script 'Java' associated with language tag 'su-Java' does not match the script 'Latn' for the first language in the package.
jawa.kpj - hint KM09011: The script 'Java' associated with language tag 'osi-Java' does not match the script 'Latn' for the first language in the package.
jawa.kpj - hint KM09011: The script 'Java' associated with language tag 'tes-Java' does not match the script 'Latn' for the first language in the package.
kawi_inscript.kpj - hint KM09011: The script 'Zyyy' associated with language tag 'omy-Kawi' does not match the script 'Kawi' for the first language in the package.
kawi_inscript.kpj - hint KM09011: The script 'Sund' associated with language tag 'osn-Kawi' does not match the script 'Kawi' for the first language in the package.
kreative_superlatin.kpj - hint KM09011: The script 'Cyrl' associated with language tag 'sr' does not match the script 'Latn' for the first language in the package.
kreative_supersymbol.kpj - hint KM09011: The script 'Zsym' associated with language tag 'zxx-Zsym' does not match the script 'Zmth' for the first language in the package.
mahajani_inscript.kpj - hint KM09011: The script 'Arab' associated with language tag 'lah-Mahj' does not match the script 'Mahj' for the first language in the package.
nailangs.kpj - hint KM09011: The script 'Arab' associated with language tag 'kby' does not match the script 'Latn' for the first language in the package.
soyombo.kpj - hint KM09011: The script 'Deva' associated with language tag 'sa-Soyo' does not match the script 'Soyo' for the first language in the package.
soyombo.kpj - hint KM09011: The script 'Tibt' associated with language tag 'bo-Soyo' does not match the script 'Soyo' for the first language in the package.
syriac_arabic.kpj - hint KM09011: The script 'Syrj' associated with language tag 'amw-Syrj' does not match the script 'Syrc' for the first language in the package.
syriac_phonetic.kpj - hint KM09011: The script 'Syrj' associated with language tag 'amw-Syrj' does not match the script 'Syrc' for the first language in the package.
sil_cameroon_azerty.kpj - hint KM09011: The script 'Bamu' associated with language tag 'bax' does not match the script 'Latn' for the first language in the package.
sil_cameroon_azerty.kpj - hint KM09011: The script 'Arab' associated with language tag 'mfi' does not match the script 'Latn' for the first language in the package.
sil_cameroon_qwerty.kpj - hint KM09011: The script 'Arab' associated with language tag 'mfi' does not match the script 'Latn' for the first language in the package.
sil_el_ethiopian_latin.kpj - hint KM09011: The script 'Latn' associated with language tag 'amf-Latn' does not match the script 'Zyyy' for the first language in the package.
sil_el_ethiopian_latin.kpj - hint KM09011: The script 'Latn' associated with language tag 'anu-Latn' does not match the script 'Zyyy' for the first language in the package. 
sil_el_ethiopian_latin.kpj - hint KM09011: The script 'Latn' associated with language tag 'bcq-Latn' does not match the script 'Zyyy' for the first language in the package. 
sil_el_ethiopian_latin.kpj - hint KM09011: The script 'Latn' associated with language tag 'bst-Latn' does not match the script 'Zyyy' for the first language in the package. 
sil_el_ethiopian_latin.kpj - hint KM09011: The script 'Latn' associated with language tag 'bwo-Latn' does not match the script 'Zyyy' for the first language in the package. 
sil_el_ethiopian_latin.kpj - hint KM09011: The script 'Latn' associated with language tag 'gmv-Latn' does not match the script 'Zyyy' for the first language in the package. 
sil_el_ethiopian_latin.kpj - hint KM09011: The script 'Latn' associated with language tag 'gof-Latn' does not match the script 'Zyyy' for the first language in the package. 
sil_el_ethiopian_latin.kpj - hint KM09011: The script 'Latn' associated with language tag 'gru-Latn' does not match the script 'Zyyy' for the first language in the package. 
sil_el_ethiopian_latin.kpj - hint KM09011: The script 'Latn' associated with language tag 'guk-Latn' does not match the script 'Zyyy' for the first language in the package. 
sil_el_ethiopian_latin.kpj - hint KM09011: The script 'Latn' associated with language tag 'gwd-Latn' does not match the script 'Zyyy' for the first language in the package. 
sil_el_ethiopian_latin.kpj - hint KM09011: The script 'Latn' associated with language tag 'hdy-Latn' does not match the script 'Zyyy' for the first language in the package. 
sil_el_ethiopian_latin.kpj - hint KM09011: The script 'Latn' associated with language tag 'kbr-Latn' does not match the script 'Zyyy' for the first language in the package. 
sil_el_ethiopian_latin.kpj - hint KM09011: The script 'Latn' associated with language tag 'kmq-Latn' does not match the script 'Zyyy' for the first language in the package. 
sil_el_ethiopian_latin.kpj - hint KM09011: The script 'Latn' associated with language tag 'kqy-Latn' does not match the script 'Zyyy' for the first language in the package. 
sil_el_ethiopian_latin.kpj - hint KM09011: The script 'Latn' associated with language tag 'ktb-Latn' does not match the script 'Zyyy' for the first language in the package. 
sil_el_ethiopian_latin.kpj - hint KM09011: The script 'Latn' associated with language tag 'kxc-Latn' does not match the script 'Zyyy' for the first language in the package. 
sil_el_ethiopian_latin.kpj - hint KM09011: The script 'Latn' associated with language tag 'lgn-Latn' does not match the script 'Zyyy' for the first language in the package. 
sil_el_ethiopian_latin.kpj - hint KM09011: The script 'Latn' associated with language tag 'mdx-Latn' does not match the script 'Zyyy' for the first language in the package. 
sil_el_ethiopian_latin.kpj - hint KM09011: The script 'Latn' associated with language tag 'mdy-Latn' does not match the script 'Zyyy' for the first language in the package. 
sil_el_ethiopian_latin.kpj - hint KM09011: The script 'Latn' associated with language tag 'moy-Latn' does not match the script 'Zyyy' for the first language in the package. 
sil_el_ethiopian_latin.kpj - hint KM09011: The script 'Latn' associated with language tag 'mpe-Latn' does not match the script 'Zyyy' for the first language in the package. 
sil_el_ethiopian_latin.kpj - hint KM09011: The script 'Latn' associated with language tag 'muz-Latn' does not match the script 'Zyyy' for the first language in the package. 
sil_el_ethiopian_latin.kpj - hint KM09011: The script 'Latn' associated with language tag 'myf-Latn' does not match the script 'Zyyy' for the first language in the package. 
sil_el_ethiopian_latin.kpj - hint KM09011: The script 'Latn' associated with language tag 'mym-Latn' does not match the script 'Zyyy' for the first language in the package. 
sil_el_ethiopian_latin.kpj - hint KM09011: The script 'Latn' associated with language tag 'nnj-Latn' does not match the script 'Zyyy' for the first language in the package. 
sil_el_ethiopian_latin.kpj - hint KM09011: The script 'Latn' associated with language tag 'nus' does not match the script 'Zyyy' for the first language in the package.      
sil_el_ethiopian_latin.kpj - hint KM09011: The script 'Latn' associated with language tag 'she-Latn' does not match the script 'Zyyy' for the first language in the package. 
sil_el_ethiopian_latin.kpj - hint KM09011: The script 'Latn' associated with language tag 'sid' does not match the script 'Zyyy' for the first language in the package.      
sil_el_ethiopian_latin.kpj - hint KM09011: The script 'Latn' associated with language tag 'so' does not match the script 'Zyyy' for the first language in the package.       
sil_el_ethiopian_latin.kpj - hint KM09011: The script 'Latn' associated with language tag 'ssy' does not match the script 'Zyyy' for the first language in the package.      
sil_el_ethiopian_latin.kpj - hint KM09011: The script 'Latn' associated with language tag 'suq-Latn' does not match the script 'Zyyy' for the first language in the package. 
sil_el_ethiopian_latin.kpj - hint KM09011: The script 'Latn' associated with language tag 'tsb-Latn' does not match the script 'Zyyy' for the first language in the package. 
sil_el_ethiopian_latin.kpj - hint KM09011: The script 'Latn' associated with language tag 'wal-Latn' does not match the script 'Zyyy' for the first language in the package. 
sil_el_ethiopian_latin.kpj - hint KM09011: The script 'Latn' associated with language tag 'wti-Latn' does not match the script 'Zyyy' for the first language in the package. 
sil_el_ethiopian_latin.kpj - hint KM09011: The script 'Latn' associated with language tag 'xom-Latn' does not match the script 'Zyyy' for the first language in the package. 
sil_el_ethiopian_latin.kpj - hint KM09011: The script 'Latn' associated with language tag 'zay-Latn' does not match the script 'Zyyy' for the first language in the package.
sil_ethiopic.kpj - hint KM09011: The script 'Latn' associated with language tag 'kxc' does not match the script 'Ethi' for the first language in the package.
sil_ethiopic.kpj - hint KM09011: The script 'Zyyy' associated with language tag 'mul-Ethi' does not match the script 'Ethi' for the first language in the package.
sil_ethiopic_power_g.kpj - hint KM09011: The script 'Latn' associated with language tag 'kxc' does not match the script 'Ethi' for the first language in the package.
sil_ethiopic_power_g.kpj - hint KM09011: The script 'Zyyy' associated with language tag 'mul-Ethi' does not match the script 'Ethi' for the first language in the package.
sil_euro_latin.kpj - hint KM09011: The script 'Arab' associated with language tag 'fub' does not match the script 'Latn' for the first language in the package.
sil_euro_latin.kpj - hint KM09011: The script 'Hebr' associated with language tag 'lad' does not match the script 'Latn' for the first language in the package.
sil_euro_latin.kpj - hint KM09011: The script 'Cyrl' associated with language tag 'sr' does not match the script 'Latn' for the first language in the package.
sil_euro_latin.kpj - hint KM09011: The script 'Runr' associated with language tag 'sxu' does not match the script 'Latn' for the first language in the package.
sil_euro_latin.kpj - hint KM09011: The script 'Hani' associated with language tag 'zhd' does not match the script 'Latn' for the first language in the package.
sil_euro_latin.kpj - hint KM09011: The script 'Hani' associated with language tag 'zlj' does not match the script 'Latn' for the first language in the package.
sil_hebr_grek_trans.kpj - hint KM09011: The script 'Cprt' associated with language tag 'grc-Latn' does not match the script 'Grek' for the first language in the package.
sil_hebr_grek_trans.kpj - hint KM09011: The script 'Hebr' associated with language tag 'he-Latn' does not match the script 'Grek' for the first language in the package.     
sil_hebr_grek_trans.kpj - hint KM09011: The script 'Hebr' associated with language tag 'hbo-Latn' does not match the script 'Grek' for the first language in the package.    
sil_indic_roman.kpj - hint KM09011: The script 'Beng' associated with language tag 'as-Latn' does not match the script 'Latn' for the first language in the package.
sil_indic_roman.kpj - hint KM09011: The script 'Beng' associated with language tag 'bn-Latn' does not match the script 'Latn' for the first language in the package.
sil_lisu_basic.kpj - hint KM09011: The script 'Zyyy' associated with language tag 'lbc' does not match the script 'Lisu' for the first language in the package.
sil_lisu_standard.kpj - hint KM09011: The script 'Zyyy' associated with language tag 'lbc' does not match the script 'Lisu' for the first language in the package.
sil_nigeria_odd_vowels.kpj - hint KM09011: The script 'Arab' associated with language tag 'kby' does not match the script 'Latn' for the first language in the package.
sil_pan_africa_mnemonic.kpj - hint KM09011: The script 'Arab' associated with language tag 'fub' does not match the script 'Latn' for the first language in the package.
sil_pan_africa_positional.kpj - hint KM09011: The script 'Arab' associated with language tag 'fub' does not match the script 'Latn' for the first language in the package.

A mismatch indicates that there may be an error in the language tag, but it is not conclusive, which is why it is reported only as a hint.

@LornaSIL
Contributor

LornaSIL commented Dec 2, 2024

Arbore keyboard - needed to update langtags to support arv-Latn-ET
gandhari keyboard - needed to update langtags to support Latn for 2 languages
deseret - needed to update langtags to support Deseret for Hopi

bu_phonetic - there is no und-Latn in langtags. We have und-Zyyy-001. Should we make an entry for und-Latn? @mcdurdin

2 of the basic keyboards (basic_kbdinuk2, basic_kbdiulat) support both Latin and Cans. The language tags are iu and iu-Latn. Do you have a recommendation how to handle those?

@mcdurdin
Member

mcdurdin commented Dec 3, 2024

mul has no real script associated with it, so it gets the Zyyy "Code for undetermined script" tag by default. We should change mul to mul-Latn (or whichever script is appropriate) in our packages.

> bu_phonetic - there is no und-Latn in langtags. We have und-Zyyy-001. Should we make an entry for und-Latn? @mcdurdin

Given und and mul can be associated with any script, I think we should not be adding all of those permutations to langtags.json. Instead, if we need to, we should probably have special handling in Keyman Developer for mul and und -- probably to say 'these should always have a script set'. (We wouldn't want to do this for all Zyyy langtags because there are hundreds of them where we have no known writing system.)
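One possible shape for that special handling, sketched in Python. Nothing here is existing Keyman Developer behaviour: `needs_explicit_script` is a hypothetical helper, and it assumes the script subtag, when present, is the four-letter subtag immediately after the language subtag.

```python
import re

# Primary language subtags that carry no script information of their own.
GENERIC_LANGUAGES = {"und", "mul"}

# A BCP 47 script subtag is exactly four ASCII letters, e.g. 'Latn'.
SCRIPT_SUBTAG = re.compile(r"^[A-Za-z]{4}$")


def needs_explicit_script(tag):
    """Return True when `tag` is und/mul without a script subtag."""
    subtags = tag.split("-")
    if subtags[0].lower() not in GENERIC_LANGUAGES:
        return False
    # Simplification: only the subtag directly after the language
    # is considered as a possible script subtag.
    has_script = len(subtags) > 1 and bool(SCRIPT_SUBTAG.match(subtags[1]))
    return not has_script


for tag in ["mul", "mul-Ethi", "und-Latn", "und-001", "iu"]:
    print(tag, needs_explicit_script(tag))
```

With this rule, mul and und-001 would be flagged, while mul-Ethi, und-Latn, and ordinary tags such as iu would pass, and no extra permutations would need to be added to langtags.json.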

There is a separate problem with und-Latn -- it crashes any .NET app on Windows, but ideally we should patch that with a workaround in Keyman for Windows. (keymanapp/keyman#10727)

> 2 of the basic keyboards (basic_kbdinuk2, basic_kbdiulat) support both Latin and Cans. The language tags are iu and iu-Latn. Do you have a recommendation how to handle those?

I think in these cases we just ignore the hint. That's why I made it a hint rather than a warning or error -- it may not be something we can fix. (Once we support keymanapp/keyman#10397, then we can disable the hint at a project level, which would be great.)
