Alternative readability metrics #3

Open
thisisnic opened this issue Jul 20, 2021 · 0 comments
thisisnic commented Jul 20, 2021

Currently docreview uses Flesch-Kincaid to analyse the vignettes. However, other metrics, used alongside it or instead of it, may provide more useful analyses.

Factors identified by Pitler and Nenkova (2008):

LogL of Discourse Relations (r = .4835) - nope!
LogL, NEWS (r = .4497) - log likelihood of an article given a source corpus; the more likely the article, the better
Average Verb Phrases (r = .4213) - number of verb phrases (udpipe might help here? https://cran.r-project.org/web/packages/udpipe/vignettes/udpipe-usecase-postagging-lemmatisation.html)
LogL, WSJ (r = .3723) - log likelihood of an article given a source corpus; the more likely the article, the better
Number of words (r = -.3713) - longer articles are considered less readable

https://aclanthology.org/D08-1020.pdf
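
To make the LogL idea concrete, here is a rough sketch of scoring a document against a background corpus with a unigram model. The function names, the tokenizer, and the add-one smoothing are all assumptions for illustration; this is not docreview code and not necessarily how the paper estimates its probabilities.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def unigram_log_likelihood(document, background, alpha=1.0):
    """Sum of log P(word) over the document's tokens.

    Probabilities come from word counts in `background`, with additive
    smoothing (alpha) so unseen words get a small non-zero probability.
    The smoothing choice is an assumption made for this sketch.
    """
    bg_counts = Counter(tokenize(background))
    total = sum(bg_counts.values())
    vocab_size = len(bg_counts) + 1  # one extra slot for unseen words
    log_likelihood = 0.0
    for word in tokenize(document):
        prob = (bg_counts[word] + alpha) / (total + alpha * vocab_size)
        log_likelihood += math.log(prob)
    return log_likelihood
```

A document built from words that are common in the background corpus scores higher (closer to zero) than one built from unseen words. The raw sum also falls as the document gets longer, so dividing by the token count may be worth trying if length is tracked as a separate metric.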

  1. general word familiarity (use a commonly used corpus): how probable an article is based on its vocabulary (article likelihood)
  2. technical word familiarity (scrape r4ds and some other good sources)
  3. document length
  4. verb phrases
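
One possible reading of items 2 and 3, sketched under assumptions: treat technical word familiarity as the share of a vignette's tokens that appear in a reference vocabulary (for example, words collected from r4ds), and treat document length as a plain token count. The helper names and the coverage definition are hypothetical, not something docreview provides.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def vocabulary_coverage(document, reference_vocab):
    """Fraction of document tokens found in `reference_vocab` (a set of words)."""
    tokens = tokenize(document)
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in reference_vocab)
    return known / len(tokens)


def document_length(document):
    """Number of word tokens in the document."""
    return len(tokenize(document))


# Hypothetical vocabulary standing in for words scraped from a good source.
vocab = {"filter", "the", "data", "frame"}
print(vocabulary_coverage("Filter the data frame with dplyr", vocab))
print(document_length("Filter the data frame with dplyr"))
```

Verb phrases (item 4) would need a parser or POS tagger such as udpipe, so they are left out of this sketch.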
thisisnic added the enhancement (New feature or request) label on Jul 20, 2021
thisisnic changed the title from "More complex vignette metrics" to "Alternative readability metrics" on Aug 7, 2021
thisisnic added this to the 0.1.1 milestone on Aug 7, 2021
thisisnic modified the milestones: 0.1, 0.2 on Aug 22, 2021