First steps towards a CG-based UD parser; point to the lexicon-proofreading-effort in the docs; some corrections in puupankki #16

Open
wants to merge 151 commits into base: main
bb8e40f
add kaz-tagger mode to the Racket interface
IlnarSelimcan Feb 10, 2020
8b855f7
add <num><ord> reading to digits; select that reading if a proper nou…
IlnarSelimcan Feb 10, 2020
1b8267a
test kaz-morph's output too
IlnarSelimcan Feb 10, 2020
5b06b86
add some shitty dependency rule (for parsing the first sentence of kk…
IlnarSelimcan Feb 10, 2020
d2269e8
few more mapping and attachement rules (sents 1-3 of ud treebank cove…
IlnarSelimcan Feb 17, 2020
bd5a51a
remove ерекше<adv> since it doesn't make sense to add someting as bot…
IlnarSelimcan Feb 17, 2020
c4f90dd
add few more dependency parsing rules
IlnarSelimcan Feb 18, 2020
67b53b7
don't output in the kaz-tagger-deterministic (Racket) mode
IlnarSelimcan Feb 18, 2020
d24c55c
remove entries marked as Use/MT from the LR transducer (morphological…
IlnarSelimcan Feb 18, 2020
be5f9f8
add few more disambiguation rules
IlnarSelimcan Feb 18, 2020
13efd51
minor update in the docs: pipeline for getting UD output
IlnarSelimcan Mar 4, 2020
cf2679d
minor
IlnarSelimcan Mar 5, 2020
bf27ca6
comment out kaz-morph tests for now
IlnarSelimcan Mar 5, 2020
e493c86
update documentation a bit
IlnarSelimcan Mar 8, 2020
638c796
Merge remote-tracking branch 'upstream/master'
IlnarSelimcan Mar 8, 2020
b0e3d6e
revert the change in which I was deleting lines marked as Use/MT from…
IlnarSelimcan Mar 8, 2020
b3d5b30
minor fix in the docs
IlnarSelimcan Mar 8, 2020
4abb038
get rid of kaz-morph tests as they are somewhat volatile atm
IlnarSelimcan Mar 8, 2020
f60e4fe
add commands for converting apertium-kaz's output into CoNLLU format
IlnarSelimcan Mar 9, 2020
1e8dd15
untabify the conllu example output in the docs
IlnarSelimcan Mar 9, 2020
0b4e427
untabify
IlnarSelimcan Mar 9, 2020
ea50d10
add two more disambiguation rules
IlnarSelimcan Mar 9, 2020
43c346a
add examples and clarifications on how we evaluate CG parser's output
IlnarSelimcan Mar 9, 2020
2852e54
fix a typo in the docs
IlnarSelimcan Mar 12, 2020
8a23e58
set the lemma of маңызды to маңызды (its POS tag is adj)
IlnarSelimcan Mar 12, 2020
9ce7f2f
set the lemma of маңызды to маңызды (its POS tag is adj)
IlnarSelimcan Mar 12, 2020
50582f6
treat санал as passive form of сана
IlnarSelimcan Mar 12, 2020
39ba759
add a todo note about spaceafter=no
IlnarSelimcan Mar 12, 2020
42a2b47
add few more CG rules
IlnarSelimcan Mar 12, 2020
a2d9b0a
change lemma of 'өсті (grow)' from өст to өс
IlnarSelimcan Mar 12, 2020
3d38bdc
s/ 'өс (grow)' transitive/to intransitive
IlnarSelimcan Mar 12, 2020
8717126
add hyperlinks to the GPLv3 and CC-BY 4.0
IlnarSelimcan Mar 13, 2020
c6c8a1f
change formatting of a table in the docs in minor ways
IlnarSelimcan Mar 13, 2020
a3bc3bd
add a note on whole treebank eval
IlnarSelimcan Mar 13, 2020
f2110ea
add few more cg rules
IlnarSelimcan Mar 13, 2020
326c9aa
minor
IlnarSelimcan Mar 14, 2020
492e74b
minor
IlnarSelimcan Mar 14, 2020
159c150
add a mixing dependency for the noun of a phrasal verb; s/amod/nummod…
IlnarSelimcan Mar 14, 2020
94685ca
add a note on how to add Zhenis' disambiguator into the pipeline
IlnarSelimcan Mar 14, 2020
c7d4a89
add few more CG rules
IlnarSelimcan Apr 2, 2020
b74f3af
add a few words about the UD treebanki into the docs
IlnarSelimcan Apr 2, 2020
3242f2d
fix a typo in the documentation
IlnarSelimcan Apr 5, 2020
518a3c4
Merge remote-tracking branch 'upstream/master'
IlnarSelimcan Apr 5, 2020
71981c9
change lemma of саналады from санал to сана in puupankki
IlnarSelimcan Apr 5, 2020
c877060
add ud-scripts/conllu-nospaceafter.py into the pipeline and update stats
IlnarSelimcan Apr 6, 2020
cac8cfa
start keeping track of open questions about categorization
IlnarSelimcan Apr 7, 2020
806dfff
add few doubts
IlnarSelimcan Apr 7, 2020
e1a8dc8
add few doubts in the documentation
IlnarSelimcan Apr 7, 2020
f6c2bdb
add few more annotation doubts
IlnarSelimcan Apr 7, 2020
749b284
add a note on how to parse hand-tagged texts with udpipe
IlnarSelimcan Apr 7, 2020
4c45a9c
[annotation] minor fixes in puupankki
IlnarSelimcan Apr 28, 2020
ff2110c
[annotation] minor fixes in puupankki; add labels
IlnarSelimcan Apr 28, 2020
ced5b1d
[annotation] minor fixes in the UD treebank; mark sentences I've read…
IlnarSelimcan Apr 29, 2020
2b43e41
[documentation] keep track of open questions/doubts/смутные сомнения
IlnarSelimcan Apr 29, 2020
b37cce8
[annotation] minor fixes in the UD treebank; mark sentences I've read…
IlnarSelimcan Apr 30, 2020
438b92b
[annotation] minor fixes (well, я так думаю \!) in the UD treebank; m…
IlnarSelimcan May 4, 2020
346c598
[docs]
IlnarSelimcan May 8, 2020
fb38e49
[annotation] minor fixes in the UD treebank; converting into udv2 etc
IlnarSelimcan May 9, 2020
b876220
[docs]
IlnarSelimcan May 9, 2020
8d8a1d7
[docs] minor
IlnarSelimcan May 13, 2020
80854ce
[docs] minor: add a css style
IlnarSelimcan May 13, 2020
ffbef26
[documentation] build documentation on Github actions and thus stop t…
IlnarSelimcan May 13, 2020
7bf67b2
[documentation] run ./autogen.sh beforehand so that I can run 'make d…
IlnarSelimcan May 13, 2020
19b6694
[documentation] install Apertium core so that I can run ./autogen.sh …
IlnarSelimcan May 13, 2020
5ab0a71
[documentation] install Apertium core so that I can run ./autogen.sh …
IlnarSelimcan May 13, 2020
0d8d685
[documentation] minor, still fiddling with Github Actions config file
IlnarSelimcan May 13, 2020
5f0d88c
Deploying to master from @ 0d8d6859fb58a5c9a4f0f9e7ad9d785344be9821 🚀
IlnarSelimcan May 13, 2020
1eff62b
[documentation] abandon the idea of building the documentation on git…
IlnarSelimcan May 13, 2020
1ac86d2
Merge branch 'master' of github.com:taruen/apertium-kaz
IlnarSelimcan May 13, 2020
1b9de16
[annotation] minor fixes to make ud-tools/validation.py happy
IlnarSelimcan May 17, 2020
37f6c75
[documentation] re-organise; add a note on how to train a udpipe model
IlnarSelimcan May 18, 2020
d5a6372
[ud annotation] minor fixes to make the treebank ud version 2 conform…
IlnarSelimcan May 19, 2020
7095ccc
[ud annotation] minor fixes to make the treebank ud version 2 conform…
IlnarSelimcan May 20, 2020
7079f89
Merge remote-tracking branch 'upstream/master'
IlnarSelimcan May 23, 2020
6eaa074
[ud annotation] minor fixes to make the treebank ud version 2 conform…
IlnarSelimcan May 23, 2020
8d567fd
[ud annotation] minor fixes to make the treebank ud version 2 conformant
IlnarSelimcan May 26, 2020
e0e3ac5
validate with ud-tools/validate.py
IlnarSelimcan May 26, 2020
28b5154
[ud annotation] minor fixes
IlnarSelimcan May 28, 2020
098c7e8
[ud-annotation] [docs] validate with ud-tools/validate.py
IlnarSelimcan May 28, 2020
7df24e6
Merge remote-tracking branch 'upstream/master'
IlnarSelimcan May 30, 2020
d0e26a7
[ud annotation] minor fixes based on validate.py's output and also to…
IlnarSelimcan May 31, 2020
ab5d141
[docs] minor tweak so that it appears as a package
IlnarSelimcan Jun 1, 2020
77349a7
Revert "[ud annotation] minor fixes based on validate.py's output and…
IlnarSelimcan Jun 2, 2020
0c758e6
rewind to d0e26a77318aaf93dd93c165c1216f71796d2c41
IlnarSelimcan Jun 2, 2020
0570b4a
correcciones; <s>afegitons</s>
IlnarSelimcan Jun 4, 2020
be100e9
<s>docs</s> scribblings
IlnarSelimcan Jun 4, 2020
d0d8f8c
[ud annotation] correcciones
IlnarSelimcan Jun 6, 2020
8b7021a
[UD annotacion] add checked_IFS labels to sentences which IFS has che…
IlnarSelimcan Jun 6, 2020
4593da1
[UD annotacion] validacion partially
IlnarSelimcan Jun 6, 2020
40935a7
[ud annotation] correcciones
IlnarSelimcan Jun 20, 2020
303889b
[ud annotation] correcciones (I'd like to think so)
IlnarSelimcan Jun 21, 2020
2c8c9fc
cleanup the consequences of GA misfortune
IlnarSelimcan Jun 21, 2020
e586de9
cleanup the consequences of GA misfortune
IlnarSelimcan Jun 21, 2020
912086e
[docs] one more doubt about ud
IlnarSelimcan Jun 26, 2020
87c7d67
[docs] clarification to a doubt
IlnarSelimcan Jun 26, 2020
201a601
[docs] minor note
IlnarSelimcan Jul 3, 2020
a324938
[docs] minor fixes regarding ud annotation
IlnarSelimcan Jul 11, 2020
6673112
[docs] minor note
IlnarSelimcan Jul 18, 2020
b74a216
[ud] merge puupankki.kaz.conllu file from the UD_Kazakh-KTB-v2.7 branch
IlnarSelimcan Jul 20, 2020
40b1ee2
rename Лейнг_Рональд_Дэвис.txt to Лейнг_Рональд_Дэвис.tagged.txt sinc…
IlnarSelimcan Sep 15, 2020
c8ee0b2
point to pr17 instead pr16 for the revised puupankki-treebank
IlnarSelimcan Sep 15, 2020
c7d88e7
update puupankki.conllu to the version in the UD_v2.7 branch
IlnarSelimcan Sep 16, 2020
188586a
s/lt-proc/hfst-proc because of the following issue
IlnarSelimcan Sep 17, 2020
9903cbe
[docs] minor updates & clarifications
IlnarSelimcan Sep 17, 2020
d781a05
[docs] minor additions
IlnarSelimcan Sep 17, 2020
f09c3c3
Merge branch 'issue#20' of github.com:taruen/apertium-kaz
IlnarSelimcan Sep 18, 2020
a7aba36
switch back to lt-proc in kaz-morph and kaz-tagger, and now also in k…
IlnarSelimcan Sep 18, 2020
99108bc
[docs] update parsing eval
IlnarSelimcan Sep 18, 2020
cf63639
[docs] add stats for the case when using the extended lexicon
IlnarSelimcan Sep 18, 2020
8b2f6d5
Merge remote-tracking branch 'upstream/master'
IlnarSelimcan Sep 19, 2020
26abf80
[data] put +e on a separate line as a subreading, as it shold've been…
IlnarSelimcan Sep 19, 2020
9aad41d
[data] s/+е/"е"/g
IlnarSelimcan Sep 19, 2020
47a9d86
s/мен post/мен cnjcoo/g here too (i.e. so that it matches puupankki),…
IlnarSelimcan Sep 25, 2020
5773c33
eval scripts
IlnarSelimcan Sep 25, 2020
8fe0578
revert
IlnarSelimcan Sep 25, 2020
879f3cf
[ud] add two more lines for converting the хрглораца adv attr into ADJ
IlnarSelimcan Sep 26, 2020
f88c232
[ud] make the lemmas of қазіргі consistent (although I'm not too happ…
IlnarSelimcan Sep 26, 2020
7586867
[ud] minor fix
IlnarSelimcan Sep 26, 2020
3f9bea4
[ud] жүзеге асыр -> жүзе N Case=Dat obl. Can be changed easily into s…
IlnarSelimcan Sep 26, 2020
e2e1820
fix my own typo жүзеге s/N/n/ all over the place
IlnarSelimcan Sep 26, 2020
8b5ccd2
[ud] in атап өтті, make атап the root and өтті aux of it, consistently
IlnarSelimcan Sep 26, 2020
6b08c41
[ud] s/VerbForm=Cov/VerbForm=Conv/g
IlnarSelimcan Sep 26, 2020
dcc0e9e
[ud] make барлық consistently DET QNT instead of arbitrarily DET.QNT …
IlnarSelimcan Sep 26, 2020
f9bfda7
[ud puupankki] minor fixes
IlnarSelimcan Sep 26, 2020
2bf5d7b
[docs] complete installation and usage notes
IlnarSelimcan Sep 27, 2020
2dc4b95
[docs] when compiling for taruen.com (with DOCSFOR=TARUEN make docs),…
IlnarSelimcan Sep 27, 2020
c9f4e2e
[ud puupankki] жаттығу n -> v ger
IlnarSelimcan Sep 27, 2020
19469e7
[ud puupankki] s/жаттық v.get/жаттығу n/; орталықтың езгісіне s/Case=…
IlnarSelimcan Sep 27, 2020
051cd83
[ud puupankki] 2-жартысында s/adj/n/
IlnarSelimcan Sep 27, 2020
a67b942
[ud puupankki] s/қат v.coop/қатыс v not coop/ = to participate
IlnarSelimcan Sep 27, 2020
3fb8b59
[ud] convert сондықтан cnjadv not as SCONJ but as ADV, because in puu…
IlnarSelimcan Sep 27, 2020
933a21e
add some work-in-progress cg3 files for ~retokenizaion (with ud in mi…
IlnarSelimcan Sep 30, 2020
816f565
[docs] fix a typo
IlnarSelimcan Oct 1, 2020
e47149d
[ud puupankki] fix a type in feature name
IlnarSelimcan Oct 14, 2020
47f538a
[ud puupankki] fix https://github.com/apertium/apertium-kaz/issues/25
IlnarSelimcan Oct 14, 2020
ad8e63f
[ud puupankki] s/жеңістерге жетіңізn<num>/жеңістерге жетіңіз<v><imp>
IlnarSelimcan Oct 18, 2020
b92b526
[ud puupankki] бүгінгі is ADJ adv attr; оңтүстік-шығысы is a NOUN n a…
IlnarSelimcan Oct 19, 2020
965aa41
[ud puupankki] миллион s/NOUN/NUM/g'
IlnarSelimcan Oct 20, 2020
11ccc05
[ud puupankki] handle 4.3 мың as ->compound, just like any other comp…
IlnarSelimcan Oct 23, 2020
1326f10
[ud puupankki] тығыз топтасқан. тығыз is adj.advl here
IlnarSelimcan Oct 23, 2020
53f8d1c
[ud puupankki] тарихымызға -> келуге
IlnarSelimcan Oct 23, 2020
3cfbded
[ud puupankki] s/М. NOUN abbr/М. PROPN np Case=Nom/ etc
IlnarSelimcan Oct 23, 2020
16a803e
[ud] convert adj advl as ADV; s/VerbForm=Cov/VerbForm=Conv
IlnarSelimcan Oct 23, 2020
891aced
[ud puupankki] minor fixes
IlnarSelimcan Oct 23, 2020
c45a9ce
[ud puupankki] minor fixes
IlnarSelimcan Oct 23, 2020
27464fe
[ud puupankki] minor fixes
IlnarSelimcan Oct 23, 2020
7bcfff7
[ud puupankki] minor fix: tag -GAн vadjes as acl:relcl consistently, …
IlnarSelimcan Oct 30, 2020
f2b0f4a
remove the empty file puupankki.conllx
IlnarSelimcan Nov 1, 2020
a19bb8a
label X айы as nmod:poss consistently, and not compound in one place …
IlnarSelimcan Nov 1, 2020
b0c4e0e
Neues aus der Werkstatt
IlnarSelimcan Nov 2, 2020
ebc1ad7
Merge remote-tracking branch 'upstream/master'
IlnarSelimcan Apr 8, 2021
2 changes: 1 addition & 1 deletion apertium-kaz.kaz.lexc
@@ -363,6 +363,7 @@ LEXICON NUM-DIGIT
%<num%>%<subst%>:%- SUBST-NONOM ;
%<num%>%<subst%>: SUBST-NONOM ; ! Dir/LR

+%<num%>%<ord%>: # ;
Member Author

2020 in the phrase "Еуровидение 2020" receives <num><ord> in the UD treebank, hence this change.

Member


Hmm, @ftyers, what do you think about this?

Member


This looks ok.

%<num%>%<ord%>:%-%{I%}нш%{I%} # ;
%<num%>%<ord%>%<subst%>%<nom%>:%-%{I%}нш%{I%} # ;
%<num%>%<ord%>%<subst%>:%-%{I%}нш%{I%} SUBST-NONOM ;
@@ -9994,7 +9995,6 @@ retroactive:retroactive A1 ; !"Use/MT"
ере% жүр:ере% жүр V-IV ; ! "Use/MT"
ерек:ерек N1 ; !
ерекше:ерекше A1 ; ! "certain, special"
-ерекше:ерекше ADV ; ! ""
ерекшеле:ерекшеле V-TV ; ! ""
ерекшелен:ерекшелен V-IV ; ! ""
ерекшелік:ерекшелік N1 ; ! ""
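
To illustrate what the NUM-DIGIT change is meant to achieve, here is a minimal Python sketch that checks whether a lexical unit in Apertium stream format (`^surface/reading/reading$`) carries a `<num><ord>` reading. The function names `parse_lexical_unit` and `has_ordinal_reading` are hypothetical, and the sample string is made up for illustration; it is not output copied from apertium-kaz.

```python
import re


def parse_lexical_unit(unit):
    """Split '^surface/reading1/reading2$' into (surface, [(lemma, [tags]), ...])."""
    if not (unit.startswith("^") and unit.endswith("$")):
        raise ValueError("not a lexical unit: %r" % unit)
    # The first field is the surface form, the rest are candidate readings.
    surface, *readings = unit[1:-1].split("/")
    parsed = []
    for reading in readings:
        lemma = reading.split("<", 1)[0]
        tags = re.findall(r"<([^<>]+)>", reading)
        parsed.append((lemma, tags))
    return surface, parsed


def has_ordinal_reading(unit):
    """Return True if any reading starts with the tags <num><ord>."""
    _, readings = parse_lexical_unit(unit)
    return any(tags[:2] == ["num", "ord"] for _, tags in readings)


# Assumed, illustrative analysis of the digit string "2020".
example = "^2020/2020<num>/2020<num><ord>$"
print(parse_lexical_unit(example))
print(has_ordinal_reading(example))
```

A check like this could be run over the analyser's output for a phrase such as "Еуровидение 2020" to confirm that the reading used in the UD treebank is available before disambiguation.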