Radical bug fixes and improvements #1169

faketanaka · 2025-01-31T03:45:56Z

Description

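Bug fixes
- In the radical conversion candidates, replace kanji with radical characters (see "Regarding the comment for 「鬥」, and the comments shown when radicals appear among conversion candidates" #653)
- Fix several distinct radicals that had been conflated (e.g. つき (moon) and にくづき (meat), よこめ (sideways eye) and あみめ (net))
Improvements
- Reorder the entries based on Unicode code points
  - The code points of the Unicode radicals basically follow the Kangxi Dictionary
- Improve the comments shown for candidates
- Add readings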
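
As a rough illustration of the reordering described above, the following sketch sorts a few radicals by code point and derives the Kangxi radical number from the offset into the Kangxi Radicals block (U+2F00 to U+2FD5). The helper name `kangxi_radical_number` and the sample list are hypothetical and are not taken from the PR's data files.

```python
import unicodedata

KANGXI_START = 0x2F00  # U+2F00, radical 1
KANGXI_END = 0x2FD5    # U+2FD5, radical 214


def kangxi_radical_number(ch: str) -> int:
    """Return the 1-based radical number of a Kangxi Radicals block character."""
    cp = ord(ch)
    if not KANGXI_START <= cp <= KANGXI_END:
        raise ValueError(f"U+{cp:04X} is not in the Kangxi Radicals block")
    return cp - KANGXI_START + 1


# Hypothetical, deliberately unsorted sample (red, one, moon, meat).
sample = ["\u2f9a", "\u2f00", "\u2f49", "\u2f81"]

# Sorting by code point yields Kangxi Dictionary order within this block.
ordered = sorted(sample, key=ord)

for ch in ordered:
    print(kangxi_radical_number(ch), f"U+{ord(ch):04X}", unicodedata.name(ch))
```

This only holds inside the Kangxi Radicals block; characters from the CJK Radicals Supplement block (U+2E80 to U+2EFF) would need a separate mapping.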

Issue IDs

google-cla · 2025-01-31T03:46:00Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

faketanaka · 2025-01-31T03:48:19Z

Regarding the CLA: I do not have a Google account. Is there anything I can do?

hiroyuki-komatsu · 2025-02-03T07:30:53Z

Thank you for sending the PR.

Signing the CLA is required before we can accept PRs, due to the repository policy.
I'm not sure whether you can sign the CLA without a Google account.

Thank you for your cooperation.

faketanaka · 2025-02-03T10:01:57Z

Understood. I will create a Google account, so I would appreciate your patience.

This is a repost, because I accidentally replied from my organization's GitHub account.

faketanaka · 2025-02-07T15:43:55Z

I have logged in with a Google account and agreed to the CLA.

hiroyuki-komatsu · 2025-02-12T07:09:08Z

Hi faketanaka,
Thank you for your PR.

I agree that there is room to improve radical-related conversions, as discussed in #653. On the other hand, a number of radical characters look very similar to normal characters. For example, ⾚ (U+2F9A), a Kangxi radical (康煕部首), looks identical to 赤 (U+8D64), a normal kanji.

Therefore, we would like to avoid confusing users with two similar-looking characters, even though the radical one comes with a description. Some environments may not show those descriptions.

So we would like to avoid overly generic readings for radical characters (e.g. あか for ⾚ [U+2F9A]). Please remove those readings from the radical characters and keep only readings that let users type radical characters intentionally.
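
The ⾚ / 赤 example above can be checked with Python's standard `unicodedata` module. This is a small sketch, not part of Mozc: it shows that the two characters are distinct code points, and that NFKC compatibility normalization folds the Kangxi radical into the ordinary kanji.

```python
import unicodedata

radical = "\u2f9a"  # Kangxi radical, looks like the kanji for "red"
kanji = "\u8d64"    # ordinary CJK unified ideograph for "red"


def describe(ch: str) -> str:
    """Format a character as 'U+XXXX NAME'."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


print(describe(radical))
print(describe(kanji))

# The two characters are different code points...
same_code_point = radical == kanji

# ...but compatibility normalization maps the radical onto the kanji.
folds_to_kanji = unicodedata.normalize("NFKC", radical) == kanji

print(same_code_point, folds_to_kanji)
```

A string comparison or search that does not normalize first treats the two as unrelated characters, even when a font draws them identically.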

faketanaka · 2025-02-12T09:41:55Z

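So we would like to avoid overly generic readings for radical characters (e.g. あか for ⾚ [U+2F9A]). Please remove those readings from the radical characters and keep only readings that let users type radical characters intentionally.

Do you mean not the comment shown to the right of a conversion candidate, but rather that showing the radical ⾚ when the user types 「あか」 is confusing, so the keywords used to bring up the candidates need to be improved?

I had thought the handling in #1165 was sufficient for this problem. Do you have any suggestions for what input should bring up these candidates? For example, they could be offered for 「ぶしゅのあか」, but I cannot judge on my own whether 「ぶしゅのあか」 is an appropriate keyword.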

hiroyuki-komatsu · 2025-02-13T05:30:08Z

"ぶしゅのあか" sounds good to me too. Please note that I may ask you for a different approach again during the internal review process.

I had thought the handling in #1165 was sufficient for this problem.

Indeed, I should have noticed this confusion beforehand while reviewing in #1165.

tats-u · 2025-03-20T14:33:47Z

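I do not believe the Kangxi Radicals and CJK Radicals Supplement blocks originate from a Japanese standard.
Since they are based on a Chinese standard, it is reasonable for a Japanese IME to offer the radical kanji that live in the CJK ideograph blocks.
Also, because the Kangxi Radicals and CJK Radicals Supplement blocks are a source of confusion, there is a case for hiding them from ordinary users.
I also think Japanese font support for these blocks is not that good.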

faketanaka · 2025-03-22T10:44:53Z

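Hmm. I do not know which character set standard each character in Unicode originates from, but the OSes that Mozc officially supports basically use UTF-8 locales, so I think it is appropriate to assume Unicode and think Unicode-first. (Taking Japanese character encodings other than Unicode into account is a good thing, but I think they should be treated strictly as legacy character sets and accommodated only to the extent that this does not conflict with the Unicode-first approach.)

If we stopped offering characters that do not originate from Japanese character set standards, we could end up with arguments such as "¥ (the half-width yen sign) should be hidden from the candidates and only the backslash should be offered" or "Unicode's source separation rule for character encodings from outside Japan should be ignored". Now that UTF-8 has become the standard, I have to say that would be an anachronism.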

faketanaka · 2025-03-22T10:46:14Z

For my part, I will ultimately leave the decision to the maintainer, komatsu-san.

tats-u · 2025-03-22T11:48:42Z

I see there are in fact several glyph forms for the Japanese shinjitai (new character forms) as well. My apologies.

https://techracho.bpsinc.jp/hachi8833/2020_10_07/95257
https://tama-san.com/resolve-kanji/

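Searchability is terrible: searching with the ordinary kanji does not find them, which I think is a bit of a problem for newly welcoming them as Japanese characters.

I looked into font support and confirmed that the following fonts do not support them. I personally use the BIZ UD family heavily, so, also given that the traditional font families lack support, I would rather not use these characters much.

MS Gothic / MS Mincho family (the Meiryo and Yu families do support them)
BIZ UD Gothic / Mincho family (UD Digi Kyokasho does support them)
HG fonts in general (bundled with Office)
IPAmj Mincho (other IPA fonts: unknown)

Among the Japanese fonts on Google Fonts, about half support them.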

https://fonts.google.com/?preview.text=%E2%BB%AF%20%E7%AB%9C&lang=ja_Jpan
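
For readers who want to see which characters the Google Fonts preview URL above is testing, here is a short sketch that percent-decodes the `preview.text` value using only the standard library. The helper `in_cjk_radicals_supplement` is a hypothetical name introduced for this illustration.

```python
from urllib.parse import unquote

# The preview.text value copied from the URL above.
encoded = "%E2%BB%AF%20%E7%AB%9C"

# unquote() decodes percent-escapes as UTF-8 by default.
decoded = unquote(encoded)
code_points = [f"U+{ord(ch):04X}" for ch in decoded]
print(decoded, code_points)


def in_cjk_radicals_supplement(ch: str) -> bool:
    """True if ch is in the CJK Radicals Supplement block (U+2E80 to U+2EFF)."""
    return 0x2E80 <= ord(ch) <= 0x2EFF


# The first character is a radical-only code point; the last is the ordinary kanji.
print(in_cjk_radicals_supplement(decoded[0]), in_cjk_radicals_supplement(decoded[-1]))
```

The preview text therefore pairs one CJK Radicals Supplement character with its ordinary kanji counterpart, separated by a space, so a font's coverage of the two can be compared side by side.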

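In fonts that do support them, the glyphs are more than 99% identical to the original kanji, which makes them even harder to notice, so I feel there may be no need to go out of our way to distinguish them.
Whether to break the seal, or rather open Pandora's box, is up to Komatsu-san, though.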

tats-u · 2025-03-22T12:30:31Z

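My thinking is that it may be fine to keep using kanji in place of radicals as before: they are easy to search for, basically every font supports them, and the glyphs are the same anyway.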

hiroyuki-komatsu · 2025-03-24T06:25:18Z

Here are my thoughts.

Characters introduced from non-Japanese standards (e.g. the Kangxi Radicals and CJK Radicals Supplement blocks)

It's fine to use them if they are widely available across platforms. So font coverage is one of the major factors (although it's not a gating factor).

Visual ambiguity between normal Kanji characters and radicals.

As I commented earlier in this thread, I'm also concerned about this ambiguity. So the readings of radicals should not be generic.
#1169 (comment)

If we introduce radicals, we should make sure users clearly understand that they are typing radicals and not normal kanji characters. That is why I suggested adding a specific prefix (ぶしゅの) to each reading.
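
The prefix idea above can be sketched as a simple transformation over a reading table. The table contents and the function name `radical_readings` are hypothetical; Mozc's actual dictionary sources are not shown in this thread.

```python
PREFIX = "ぶしゅの"  # the prefix proposed in this thread

# Hypothetical sample: generic readings that currently map to radical characters.
generic = {
    "あか": "\u2f9a",  # Kangxi radical 155
    "つき": "\u2f49",  # Kangxi radical 74
}


def radical_readings(generic_readings: dict) -> dict:
    """Return a new table in which every reading carries the radical prefix."""
    return {PREFIX + reading: ch for reading, ch in generic_readings.items()}


prefixed = radical_readings(generic)
for reading, ch in prefixed.items():
    print(reading, f"U+{ord(ch):04X}")
```

With such a table, typing あか alone would no longer offer the radical, while ぶしゅのあか would.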

I will discuss this with other people on the team, and I may revise my position later.
Thank you for your understanding.

tats-u · 2025-03-27T03:33:15Z

Should we keep the current radical kanji candidates, with "部首" (radical) in their candidate descriptions replaced by "部首漢字" (radical kanji), and use "部首専用" (radical-only) instead of "部首" in the descriptions of the new dedicated radical characters?

faketanaka · 2025-03-27T08:57:48Z

Should we keep the current radical kanji candidates, with "部首" (radical) in their candidate descriptions replaced by "部首漢字" (radical kanji), and use "部首専用" (radical-only) instead of "部首" in the descriptions of the new dedicated radical characters?

Displaying both 「部首漢字」 and 「部首専用」 would surely fail to convey the intent to end users. It is too difficult.

faketanaka · 2025-03-27T08:59:07Z

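I see there are in fact several glyph forms for the Japanese shinjitai (new character forms) as well. My apologies.

To begin with, you need to recognize that the concept of "the radicals of the Kangxi Dictionary" is shared across the entire Chinese-character cultural sphere.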

faketanaka · 2025-03-27T09:04:15Z

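I see there are in fact several glyph forms for the Japanese shinjitai (new character forms) as well. My apologies.

To begin with, you need to recognize that the concept of "the radicals of the Kangxi Dictionary" is shared across the entire Chinese-character cultural sphere.

In present-day Japan, shinjitai is the common glyph form for kanji, but it is not the common form in the radical indexes of kanji dictionaries (漢和辞典). In radical indexes, the Kangxi Dictionary forms of the radicals are still the de facto standard, and the shinjitai radical is often given in parentheses (the Kangxi form is primary and the shinjitai-shaped radical is secondary). It would be a problem if Kangxi Dictionary characters were regarded as belonging to a foreign language.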

tats-u · 2025-03-27T09:56:08Z

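I had already checked the simplified Chinese and kyūjitai (old character form) glyphs before; I simply went back and checked the Japanese shinjitai.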

hiroyuki-komatsu · 2025-04-07T08:47:04Z

I have discussed this topic with the team. Thank you for waiting for the update.

Two concerns were raised, as discussed here.

- Radical characters are easily confused with normal characters.
- It is not clear how many users would use this feature, while the concern about confusion remains.

Since the user dictionary is the feature meant to cover this kind of requirement, here are our conclusions.

At this moment, we'd suggest using the user dictionary feature to input radical characters.
- If the user dictionary does not work, please let us know.
Please feel free to create a new entry in the Discussions to see feedback from other users.
- If you create user dictionary entries, sharing them in the discussion would be helpful for users who need them.
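
As one possible way to prepare shareable entries for the user dictionary route suggested above, the sketch below writes a few rows as tab-separated text. The column order (reading, word, part of speech, comment), the part-of-speech label 名詞, and the sample entries are all assumptions for illustration; please check the import format that your Mozc build's user dictionary tool actually accepts.

```python
# Assumed column order for import: reading, word, part of speech, comment.
entries = [
    ("ぶしゅのあか", "\u2f9a", "名詞", "Kangxi radical 155 (U+2F9A)"),
    ("ぶしゅのつき", "\u2f49", "名詞", "Kangxi radical 74 (U+2F49)"),
]


def to_tsv(rows) -> str:
    """Serialize rows as tab-separated lines, rejecting embedded tabs and newlines."""
    lines = []
    for row in rows:
        if any("\t" in field or "\n" in field for field in row):
            raise ValueError(f"field contains a tab or newline: {row!r}")
        lines.append("\t".join(row))
    return "\n".join(lines) + "\n"


tsv_text = to_tsv(entries)
print(tsv_text, end="")
```

The resulting text can be saved as a UTF-8 file and attached to a Discussions post so that other users can try the same entries.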

Although we will not merge this PR at this time, I really appreciate your contribution.
Thank you,

faketanaka added 4 commits January 31, 2025 02:42

Replace kanji with radicals

5df75f6

Fix the ordering based on Unicode

68cd1d6

Improve readings

7d1b96f

Improve readings and descriptions

d630ee2
