(#53) bootstrap scripts
- add splitting of `W31` in delaf
- add script for freeling
- update upstream-problems.org
odanoburu committed May 10, 2018
1 parent 40f992c commit 8a48cbe
Showing 3 changed files with 34 additions and 7 deletions.
21 changes: 15 additions & 6 deletions tools/prepare-delaf.sh
@@ -1,7 +1,13 @@
#!/bin/bash
# first argument is path to delaf dictionary

# don't forget to change CRLF to LF!
# tr -d '\r' < Delaf2015v04.dic > delaf.dic

function splitW31 {
sed "s/\(.*W\)31$/\13s\n\11s/"
}

# adjectives
grep -F ".A:" $1 > delaf.adj

@@ -13,13 +19,16 @@ grep -F ".N:" $1 | # select nouns
#remove uppercase lemmas (Gloria)
grep -v ",[A-Z]" > delaf.nouns

-# simple verbs
-grep -F ".V:" $1 | # select simple verbs
-sed "s/:,/,/" | # rm entries like mantinhas:,manter.V:I2s
-# sorri,sorrir.V:Y2S -> sorri,sorrir.V:Y2s
-sed "s/2S$/2s/" > delaf.verbs
+# # simple verbs
+grep -F ".V:" $1 | # select simple verbs
+sed "s/:,/,/" | # rm entries like mantinhas:,manter.V:I2s
+# # sorri,sorrir.V:Y2S -> sorri,sorrir.V:Y2s
+sed "s/2S$/2s/" | # split entries like abstrair,abstrair.V:W31
+splitW31 > delaf.verbs

# verbs with clitics
grep -F ".V+PRO:" $1 | # select verbs with clitics
# rm spurious colon like in abstinhas:-lhe,abster.V+PRO:I2s
-sed "s/:-/-/" > delaf.clitics
+sed "s/:-/-/" |
+splitW31 > delaf.clitics
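
To make the effect of `splitW31` concrete, here is a minimal sketch that runs the same substitution on two sample lines. It assumes GNU sed (`\n` in the replacement is a GNU extension); the two input lines are the example entries quoted in this commit, with `Y2S` already lowercased.

```shell
#!/bin/bash
# Same substitution as the splitW31 function in tools/prepare-delaf.sh.
# Assumption: GNU sed, where "\n" in the replacement inserts a newline.
splitW31() {
    # \1 is everything up to and including "W"; the entry is emitted twice,
    # once tagged 3s and once tagged 1s.
    sed 's/\(.*W\)31$/\13s\n\11s/'
}

# An entry ending in W31 is split in two; any other entry passes through.
printf '%s\n' 'abstrair,abstrair.V:W31' 'sorri,sorrir.V:Y2s' | splitW31
# abstrair,abstrair.V:W3s
# abstrair,abstrair.V:W1s
# sorri,sorrir.V:Y2s
```

Because the pattern is anchored with `$`, only entries whose tag ends in `W31` are affected.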

17 changes: 17 additions & 0 deletions tools/prepare-freeling.sh
@@ -0,0 +1,17 @@
#!/bin/bash
# first argument is path to directory where freeling dictionary files
# are, with their original names

# adjectives
# change C as diminutive tag to D (as is in nouns)
sed "s/ AQC/ AQD/" adjs > fl.adjectives

# adverbs
mv adv fl.adverbs

# nouns
# correct nouns with wrong C tag such as habeas-corpus
sed "s/ NCMC/ NCMN/" nouns > fl.nouns

# verbs
mv verbs fl.verb
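
The two tag rewrites in this script can be tried in isolation. The sketch below wraps each `sed` call in a function and feeds it sample lines. The function names and the sample entries are hypothetical, and the `form lemma tag` layout is an assumption about the dictionary files, not something this commit shows.

```shell
#!/bin/bash
# Hypothetical wrappers around the two substitutions in prepare-freeling.sh.
fix_adj_tags() {
    sed 's/ AQC/ AQD/'    # diminutive adjectives: C -> D, as in nouns
}
fix_noun_tags() {
    sed 's/ NCMC/ NCMN/'  # nouns wrongly tagged C, such as habeas-corpus
}

# Made-up sample entries, assumed to be "form lemma tag".
printf '%s\n' 'habeas-corpus habeas-corpus NCMC000' 'casa casa NCFS000' | fix_noun_tags
printf '%s\n' 'bonitinho bonito AQC0MS0' | fix_adj_tags
```

Both patterns include the leading space, so they only match at the start of a tag field and not inside a form or lemma.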
3 changes: 2 additions & 1 deletion tools/upstream-problems.org
@@ -25,7 +25,8 @@
subjunctive instead of singular number:
: sorri,sorrir.V:Y2S
- several hundred forms of infinitive had been marked as having two
-persons, where these should have been in two lines instead:
+persons, where these should have been in two lines instead, and
+should have a number tag:
: abstrair,abstrair.V:W31
- several hundred forms missing hyphen:
: protrairnos protrair+V.None+SBJF+1+SG
