
A list of ~ 90,000 German nouns and their grammatical properties compiled from WiktionaryDE as CSV file. Plus some methods to query the data.


wchoston/german_nouns

 
 


German nouns

A comma-separated list of ~ 90,000 German nouns and their grammatical properties (case, number, gender) as a CSV file, plus some methods to query the data.

The list is located here: german_nouns/nouns.csv
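
To get a first look at the file without the query module, the CSV can be opened with Python's standard library. The following is a minimal sketch: the helper name preview_csv is hypothetical, and it assumes the file is UTF-8 encoded, comma-delimited, and starts with a header row. It does not assume any particular column names.

```python
import csv
from itertools import islice
from pathlib import Path


def preview_csv(path, limit=3):
    """Return (header, rows) for the first `limit` data rows of a CSV file.

    Assumption: the file is UTF-8, comma-delimited, and has a header row.
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # Each row is a dict keyed by whatever column names the header defines.
        rows = list(islice(reader, limit))
        return reader.fieldnames, rows
```

Calling preview_csv('german_nouns/nouns.csv') from the repository root returns the column names and the first few rows, which is a quick way to check which columns the file actually contains.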

Usage

Clone repository and install requirements

git clone https://github.com/gambolputty/german_nouns
bash setup.sh

Create CSV file from Wiktionary dump

Data is saved in german_nouns/nouns.csv.

cd german_nouns
python -m create_csv /path-to-dump-file/dewiktionary-latest-pages-articles-multistream.xml.bz2

Query the CSV file

The examples below can be run with the command cd german_nouns && python -m query:

from pprint import pprint

from query.NounDictionary import NounDictionary

nouns = NounDictionary('../nouns.csv')

# Look up a word
word_entry = nouns['Fahrrad']
pprint(word_entry)
# Output:
[{'flexion': {'akkusativ plural': 'Fahrräder',
              'akkusativ singular': 'Fahrrad',
              'dativ plural': 'Fahrrädern',
              'dativ singular': 'Fahrrad',
              'dativ singular*': 'Fahrrade',
              'genitiv plural': 'Fahrräder',
              'genitiv singular': 'Fahrrades',
              'genitiv singular*': 'Fahrrads',
              'nominativ plural': 'Fahrräder',
              'nominativ singular': 'Fahrrad'},
  'genus': 'n',
  'lemma': 'Fahrrad',
  'pos': ['Substantiv']}]

# Get the last word of a compound
last_word = nouns.last_word('Falkenstein')
pprint(last_word)
# Output:
[{'flexion': {'akkusativ plural': 'Steine',
              'akkusativ singular': 'Stein',
              'dativ plural': 'Steinen',
              'dativ singular': 'Stein',
              'dativ singular*': 'Steine',
              'genitiv plural': 'Steine',
              'genitiv singular': 'Steins',
              'genitiv singular*': 'Steines',
              'nominativ plural': 'Steine',
              'nominativ singular': 'Stein'},
  'genus': 'm',
  'lemma': 'Stein',
  'pos': ['Substantiv']}]

# Split a compound word into its parts
compound_words = nouns.parse_compound('Vermögensbildung')
print(compound_words)
# Output:
['vermögen', 'bildung']  # Look up the parts: nouns['vermögen'] etc.
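
Each entry returned by a lookup is a plain dictionary, so individual forms can be read from it directly. The sketch below works on a sample entry copied from the 'Stein' output above rather than on the real data. The helpers inflected_form and article_for are hypothetical and not part of the query module, and the genus code 'f' is an assumption (only 'm' and 'n' appear in the examples above).

```python
# Definite articles (nominative singular) per genus code.
# Assumption: feminine nouns use the code 'f'; only 'm' and 'n' are shown above.
ARTICLES = {"m": "der", "f": "die", "n": "das"}


def inflected_form(entry, case, number):
    """Return the form stored under a key such as 'dativ plural', or None."""
    return entry.get("flexion", {}).get(f"{case} {number}")


def article_for(entry):
    """Return the definite article matching the entry's genus, or None."""
    return ARTICLES.get(entry.get("genus"))


# Sample entry, shortened from the 'Stein' output shown above.
stein = {
    "flexion": {
        "dativ plural": "Steinen",
        "genitiv singular": "Steins",
        "nominativ plural": "Steine",
        "nominativ singular": "Stein",
    },
    "genus": "m",
    "lemma": "Stein",
    "pos": ["Substantiv"],
}

print(article_for(stein), stein["lemma"])
print(inflected_form(stein, "dativ", "plural"))
```

Because a lookup returns a list of entries, these helpers would be applied to one element at a time, for example to the first item of the list returned by nouns.last_word('Falkenstein').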

List compiled from WiktionaryDE with wiktionary_de_parser.

License: Creative Commons Attribution-ShareAlike 3.0 Unported.
