Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support correction for shuangpin. #90

Merged
merged 14 commits into from
Dec 22, 2024
Merged

Support correction for shuangpin. #90

merged 14 commits into from
Dec 22, 2024

Conversation

bennyyip
Copy link
Contributor

测试过纠错是能工作的。
但是目前的实现有问题,无脑在buildShuangpinTable的时候指定了qwerty的纠错。
无奈不熟悉libime,不知道该怎么改了

image

@bennyyip
Copy link
Contributor Author

加一个 ShuangpinProfile::setCorrectionProfile(),在 PinyinEncoder::parseUserShuangpin() 里面调用。
如果 correctionProfile 和之前 ShuangpinProfile里面的不一样,就重新 buildShuangpinTable()

这样实现可以吗?

@wengxt
Copy link
Member

wengxt commented Dec 13, 2024

你的实现只实现了需要做的很小的一部分

首先,双拼用的 pinyin map 用默认的即可。你要做的双拼的 correction,不是一个用了全拼 correction 的 pinyin map 再映射,这是不对的,而且只会让 correction 只支持单字母的声母或者韵母组合。

应该在 buildShuangpinTable 的时候,利用 CorrectionProfile 的内容增加对应的条目。

1、你把 PinyinCorrectionProfile 的最原始的 mapping 保存成为一个属性暴露出来可以让别的类读取
2、ShuangpinProfile 的 constructor 增加一个 optional 的 PinyinCorrectionProfile*
3、在 buildShuangpinTable 里面,做类似

std::vector<PinyinEntry> newEntries;
的事情,构建一个带 correction 的 shuangpin table

@bennyyip
Copy link
Contributor Author

改 ShuangpinProfile 的 ctor 的话,如果 Chinese addon 里面改了correctionProfile,还得去重新构造 ShuangpinProfile。
所以我一开始想的是像全拼那样在 parseUserShuangpin 处理

你说的第二步改成这样行不行:
增加 ShuangpinProfile::setCorrectionProfile(PinyinCorrectionProfile*),在 parseUserShuangpin 的时候调用,在里面更新 shuangpinTable

@wengxt
Copy link
Member

wengxt commented Dec 13, 2024

构造这个又不费什么cpu,不用省。

@bennyyip
Copy link
Contributor Author

其实不是考虑性能,两种实现都是纠错改了才要构建。只是想要和全拼的实现更一致,不用再去改Chinese addon。
那我就按你说的做了,明天看看能不能提一个PR

@wengxt
Copy link
Member

wengxt commented Dec 13, 2024

其实不是考虑性能,两种实现都是纠错改了才要构建。只是想要和全拼的实现更一致,不用再去改Chinese addon。 那我就按你说的做了,明天看看能不能提一个PR

检测使用的 CorrectionProfile 是否和上次一样,是更加复杂的逻辑,逻辑越复杂就越容易出 bug
如果要检测到时候你打算怎么检测是否一样呢?比较内容本身那就只能比较 spTable ,如果比较指针那有可能出现同一个地址被 reuse 成不同的内容,要么就得把它套成 shared_ptr ,但无论哪种都让逻辑更加复杂

这几个 profile 从一开始就设计为 immutable ,所以逻辑本身很简单,有任何需要生成的数据只生成一次。

@bennyyip
Copy link
Contributor Author

在shuangpinprofile里加一个BuiltinPinyinCorrectionProfile类型的成员,比较这个enum,如果变了根据它就重新buildShuangpinTable。
这样不可以吗?是我没弄清楚?

@wengxt
Copy link
Member

wengxt commented Dec 13, 2024

在shuangpinprofile里加一个BuiltinPinyinCorrectionProfile类型的成员,比较这个enum,如果变了根据它就重新buildShuangpinTable。 这样不可以吗?是我没弄清楚?

enum 是内置的几个方便的值,简单来说只是一个 shortcut,PinyinCorrectionProfile 本身的定义是可以任意写的

参见这个构造函数。

explicit PinyinCorrectionProfile(

@bennyyip
Copy link
Contributor Author

明白了,谢谢

@bennyyip
Copy link
Contributor Author

写好了,麻烦看一下 @wengxt
这个PR也要合:fcitx/fcitx5-chinese-addons#204
image

wengxt
wengxt previously requested changes Dec 16, 2024
@@ -47,6 +47,7 @@ getProfileMapping(BuiltinPinyinCorrectionProfile profile) {
class PinyinCorrectionProfilePrivate {
public:
PinyinMap pinyinMap_;
std::unordered_map<char, std::vector<char>> correctionMap;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name member with underscore

@@ -183,6 +184,12 @@ void PinyinIME::setCorrectionProfile(
FCITX_D();
if (d->correctionProfile_ != profile) {
d->correctionProfile_ = std::move(profile);
auto sp = shuangpinProfile();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this part to chinese-addons, so you don't need to change here and keep a copy of the original version.

src/libime/pinyin/pinyincorrectionprofile.h Show resolved Hide resolved
private:
void buildShuangpinTable();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though the symbol is private, I'd prefer not to change signature.

Can you just don't touch it (you can delete everything in it's implementation), and create a new function as ShuangpinProfilePrivate::buildShuangpinTable?

This is mainly an issue on ABI compatiblity.

test/testpinyinencoder.cpp Show resolved Hide resolved
@bennyyip
Copy link
Contributor Author

testpinyinime_unit.cpp 运行的日志

10: Test command: /home/benyip/ghq/github.com/fcitx/libime/build/test/testpinyinime_unit
10: Working Directory: /home/benyip/ghq/github.com/fcitx/libime/build/test
10: Test timeout computed to be: 10000000
10: D2024-12-17 09:11:28.530681 decoder.cpp:396] Build Lattice: 1
10: D2024-12-17 09:11:28.533264 decoder.cpp:398] Forward Search: 3
10: D2024-12-17 09:11:28.593725 decoder.cpp:400] Backward Search: 64
10: D2024-12-17 09:11:28.596226 decoder.cpp:396] Build Lattice: 1
10: D2024-12-17 09:11:28.597910 decoder.cpp:398] Forward Search: 3
10: D2024-12-17 09:11:28.604679 decoder.cpp:400] Backward Search: 10
10: D2024-12-17 09:11:28.606485 decoder.cpp:396] Build Lattice: 1
10: D2024-12-17 09:11:28.608280 decoder.cpp:398] Forward Search: 3
10: D2024-12-17 09:11:28.615547 decoder.cpp:400] Backward Search: 10
10: D2024-12-17 09:11:28.626062 decoder.cpp:396] Build Lattice: 1
10: D2024-12-17 09:11:28.627819 decoder.cpp:398] Forward Search: 3
10: D2024-12-17 09:11:28.634972 decoder.cpp:400] Backward Search: 10
10: 痭 琕 棅 庰 屛 癛 傡 偋 併 䴵 䗒 䔊 䓑 䋑 䈂 㨀 餠 餅 鞆 陃 怲 鋲 鈵 誁 苪 栤 眪 窉 靐 絣 稟 倂 抦 並 昺 栟 氷 摒 寎 竝 鮩 邴 枋 炳 檳 幷 梹 槟 屏 禀 柄 秉 昞 鞞 饼 仒 冫 丙 冰 兵 仌 病 冰碛 屏弃 兵棋 㓈 掤 并起 蛃 冰期 并 病气 鞸 綆 屏气 垪 摒弃 兵器 屏弃了 鉼 摒弃了
10: F2024-12-17 09:11:28.635211 testpinyinime_unit.cpp:81] c.candidateSet().count("冰淇淋") failed.
1/1 Test #10: testpinyinime_unit ...............Subprocess aborted***Exception: 1.05 sec
D2024-12-17 09:11:28.530681 decoder.cpp:396] Build Lattice: 1
D2024-12-17 09:11:28.533264 decoder.cpp:398] Forward Search: 3
D2024-12-17 09:11:28.593725 decoder.cpp:400] Backward Search: 64
D2024-12-17 09:11:28.596226 decoder.cpp:396] Build Lattice: 1
D2024-12-17 09:11:28.597910 decoder.cpp:398] Forward Search: 3
D2024-12-17 09:11:28.604679 decoder.cpp:400] Backward Search: 10
D2024-12-17 09:11:28.606485 decoder.cpp:396] Build Lattice: 1
D2024-12-17 09:11:28.608280 decoder.cpp:398] Forward Search: 3
D2024-12-17 09:11:28.615547 decoder.cpp:400] Backward Search: 10
D2024-12-17 09:11:28.626062 decoder.cpp:396] Build Lattice: 1
D2024-12-17 09:11:28.627819 decoder.cpp:398] Forward Search: 3
D2024-12-17 09:11:28.634972 decoder.cpp:400] Backward Search: 10
痭 琕 棅 庰 屛 癛 傡 偋 併 䴵 䗒 䔊 䓑 䋑 䈂 㨀 餠 餅 鞆 陃 怲 鋲 鈵 誁 苪 栤 眪 窉 靐 絣 稟 倂 抦 並 昺 栟 氷 摒 寎 竝 鮩 邴 枋 炳 檳 幷 梹 槟 屏 禀 柄 秉 昞 鞞 饼 仒 冫 丙 冰 兵 仌 病 冰碛 屏弃 兵棋 㓈 掤 并起 蛃 冰期 并 病气 鞸 綆 屏
气 垪 摒弃 兵器 屏弃了 鉼 摒弃了
F2024-12-17 09:11:28.635211 testpinyinime_unit.cpp:81] c.candidateSet().count("冰淇淋") failed.

@bennyyip
Copy link
Contributor Author

测试也修好了,麻烦看一下能合进去了吗 @wengxt

@wengxt wengxt merged commit df00a0b into fcitx:master Dec 22, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants