English Dictionary

This is a minimally tested and incomplete parser of Webster's Unabridged English Dictionary. It works from a modified GCIDE XML that categorizes content to make it easy to find and parse. I did a lot of research looking for a machine-readable English dictionary for a project where I didn't want to rely on a third-party API (e.g. Wordnik).
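The following minimal Ruby sketch illustrates what "categorizing" dictionary content can look like. It uses the stdlib REXML parser to group records by headword. It is not the project's `parse.rb`. The `<p>` record wrapper, the element names (`ent`, `pos`, `fld`, `sn`, `def`), the lowercased keys, and the output key names are all assumptions loosely modeled on GCIDE-style tags, and the real modified XML may differ.

```ruby
require "rexml/document"
require "json"

# Hypothetical sample: the <p> wrapper and the <ent>, <pos>, <fld>, <sn>, <def>
# element names are assumptions loosely modeled on GCIDE markup.
SAMPLE = <<~XML
  <entries>
    <p><ent>Abandon</ent><pos>v. t.</pos><sn>1.</sn><def>To give up wholly.</def></p>
    <p><ent>Abandon</ent><pos>n.</pos><fld>(Law)</fld><def>Abandonment.</def></p>
  </entries>
XML

# Returns the stripped text of the first matching child element, or nil if absent.
def child_text(node, name)
  node.elements[name]&.text&.strip
end

# Groups records by lowercased headword (key names are assumptions):
#   { "abandon" => { "definitions" => [ { "definition" => ..., ... }, ... ] } }
def parse_entries(xml)
  doc = REXML::Document.new(xml)
  result = {}
  REXML::XPath.each(doc, "//p") do |node|
    word = child_text(node, "ent")
    definition = child_text(node, "def")
    next if word.nil? || definition.nil? # skip records we cannot categorize

    entry = (result[word.downcase] ||= { "definitions" => [] })
    entry["definitions"] << {
      "definition" => definition,
      "part_of_speech" => child_text(node, "pos"),
      "field" => child_text(node, "fld"),
      "sequence" => child_text(node, "sn")
    }
  end
  result
end

puts JSON.pretty_generate(parse_entries(SAMPLE)) if __FILE__ == $PROGRAM_NAME
```

Missing optional tags (here, the second record's sense number) come out as `nil`. This keeps every definition object the same shape, so consumers do not need to check whether a key exists.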

Generate Simple JSON

From the project directory, run the following:

ruby parse.rb

This will generate a JSON file for each GCIDE XML file. In each file, every key is a unique word, and its value is an object containing that word's definitions: an array of objects, each with a definition, part of speech, field, and sequence. Excluding obsolete content, the files contain ~99k unique words and ~160k definitions.
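Below is a minimal sketch of consuming the generated files. It assumes each file has the shape `{ "word" => { "definitions" => [ ... ] } }`. That `definitions` key name is hypothetical and not confirmed by this README, so adjust it to match what `parse.rb` actually writes.

```ruby
require "json"

# ASSUMPTION: each generated file looks like
#   { "word" => { "definitions" => [ { "definition" => "...", ... }, ... ] } }
# The "definitions" key name is hypothetical.

# Merges several JSON documents (given as strings) into one word => entry hash.
def merge_json(documents)
  documents.each_with_object({}) do |doc, merged|
    JSON.parse(doc).each do |word, entry|
      bucket = (merged[word] ||= { "definitions" => [] })
      bucket["definitions"].concat(entry.fetch("definitions", []))
    end
  end
end

# Returns the definitions for a word, or an empty array if the word is missing.
def definitions_for(merged, word)
  merged.dig(word, "definitions") || []
end

# Counts unique words and total definitions across the merged hash.
def stats(merged)
  {
    words: merged.size,
    definitions: merged.sum { |_word, entry| entry["definitions"].size }
  }
end
```

For example, with a hypothetical output directory you could run `stats(merge_json(Dir.glob("output/*.json").map { |path| File.read(path) }))` to get word and definition totals. This is a way to check counts yourself, not a guarantee of the figures quoted above.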

Resources

GCIDE

After reviewing all of these resources, I went with parsing this GCIDE XML first. The next best option seems to be the Wiktionary TSV.

Wiktionary TSV

Webster's Unabridged Dictionary (1913 - public domain)

Moby Word Lists