forked from nsantacruz/PSHAT

(Pronounced "P'Shot") Part of Speech Handling for Aramaic Talmud


erelsgl-nlp/PSHAT

 
 


PSHAT

(Pronounced "P'Shot") Part of Speech Handling for Aramaic Talmud

This is the official repo for Noah's Master's thesis.

This project aims to fill the gaping hole in ancient Aramaic POS tagging. Astonishingly, research in this field is scant. My work begins to show that modern machine learning techniques can learn syntactic patterns in Talmud, despite two major issues:

  1. Talmud has no punctuation. Because of this, it can be very difficult to break up sentences and ideas, even if one is familiar with the Aramaic and the structure of the text.

  2. Talmud is actually a mix of two languages, Mishnaic Hebrew and Talmudic Aramaic. While in some places the distinction between these languages is clearly marked, the majority of Talmud is a mixture of the two.

Despite these issues, LSTMs were able to achieve above 90% POS tagging accuracy on a validation set.
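As a rough illustration of how a figure like this could be computed, here is a minimal sketch that scores predicted tags against gold tags token by token. It assumes the metric is plain token-level accuracy, which this README does not state. The function name `token_accuracy` and the tag labels are hypothetical and are not taken from this repo or from the CAL dataset.

```python
from typing import Sequence


def token_accuracy(
    gold: Sequence[Sequence[str]],
    pred: Sequence[Sequence[str]],
) -> float:
    """Return the fraction of tokens whose predicted tag equals the gold tag.

    Assumption: each inner sequence holds the tags for one segment of text,
    and gold and pred are aligned segment by segment and token by token.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of segments")

    correct = 0
    total = 0
    for gold_tags, pred_tags in zip(gold, pred):
        if len(gold_tags) != len(pred_tags):
            raise ValueError("each segment must have one predicted tag per token")
        for gold_tag, pred_tag in zip(gold_tags, pred_tags):
            if gold_tag == pred_tag:
                correct += 1
            total += 1

    if total == 0:
        raise ValueError("cannot compute accuracy over zero tokens")
    return correct / total


# Hypothetical tags for two short segments (illustrative only).
gold = [["N", "V", "PREP", "N"], ["PRON", "V"]]
pred = [["N", "V", "PREP", "ADJ"], ["PRON", "V"]]

accuracy = token_accuracy(gold, pred)
print(f"token-level accuracy: {accuracy:.3f}")
```

Here five of the six tokens match, so the score is 5/6. Because Talmud has no punctuation, the segments in a real evaluation would have to come from some other splitting scheme, which this sketch leaves open.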

I gratefully thank [CAL](http://cal1.cn.huc.edu/) and especially Steve Kaufman for working with me on this project. The use of his dataset was crucial and his help working with the dataset was just as important.
