corpusDenominator

corpusDenominator is a small program that brings corpora written in the same language but in different orthographies to a common orthographic denominator. It produces a “deformed” but orthographically uniform corpus suitable for stylometric analysis in R with the stylo package (https://github.com/computationalstylistics/stylo).

Parameters in `corpusDenominator.py`

# Define parameters here

separator = "\t" # you can change the separator here ("\t" for TAB, "," for COMMA, etc.)
schemeFile = "conversionList.txt" # you can change the file name of the scheme
key = "RE" # use "RE" for regular expressions, "PLAIN" for simple find/replace

# Folder variables

folderOld = "./textsOld/" # folder for texts in old orthography
folderNew = "./textsNew/" # folder for texts in new orthography
folderMod = "./textsMod/" # folder for texts in mod orthography
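
To illustrate how these parameters might fit together, here is a minimal sketch. It is not the code of `corpusDenominator.py`; it only assumes that each line of the scheme file holds a pattern and its replacement divided by `separator`, that `key` chooses between `re.sub` and plain string replacement, and that converted texts are written to the `folderMod` folder. The function names `load_scheme`, `apply_scheme`, and `convert_folder` are hypothetical.

```python
import re
from pathlib import Path


def load_scheme(path, separator="\t"):
    """Read (pattern, replacement) pairs from a scheme file.

    Assumed layout: one rule per line, pattern and replacement
    divided by the separator. Blank lines and lines without the
    separator are skipped.
    """
    rules = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line.strip() or separator not in line:
            continue
        pattern, replacement = line.split(separator, 1)
        rules.append((pattern, replacement))
    return rules


def apply_scheme(text, rules, key="RE"):
    """Apply every rule in order: regex substitution for "RE",
    literal find/replace for "PLAIN"."""
    for pattern, replacement in rules:
        if key == "RE":
            text = re.sub(pattern, replacement, text)
        elif key == "PLAIN":
            text = text.replace(pattern, replacement)
        else:
            raise ValueError('key must be "RE" or "PLAIN"')
    return text


def convert_folder(source, target, rules, key="RE"):
    """Convert every .txt file in source and write the result,
    under the same file name, to target (hypothetical workflow)."""
    target = Path(target)
    target.mkdir(parents=True, exist_ok=True)
    for file in sorted(Path(source).glob("*.txt")):
        text = file.read_text(encoding="utf-8")
        converted = apply_scheme(text, rules, key)
        (target / file.name).write_text(converted, encoding="utf-8")
```

Under these assumptions, a run over the old-orthography texts could look like `convert_folder(folderOld, folderMod, load_scheme(schemeFile, separator), key)`. Rules are applied in file order, so an earlier rule can change what a later one matches.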

maximromanov/corpusDenominator
