-
Notifications
You must be signed in to change notification settings - Fork 0
samulic/tm
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
Text Mining and Search project ### FOLDER'S STRUCTURE ### Project_root --Plots # Folder for wordclouds and models' performance plots --Data # Data folder ----preprocessed.RData # Output df from file '2_preprocessing.R' with unique combination of matrix type and sparsity # to be fed into '3_models.R' ----20news-bydate-test/ # Newsarticles from the test set ----20news-bydate-train/# Newsarticles from the train set ------alt.atheism/ # Topic folder --------doc_id # Single document file referenced by id ------rec.autos/ # Topic folder (one of the 20) ------sci.crypt/ # Topic folder --0_helperFunctions.R # Script with various functions to preprocess and to build different text representation matrixes --helperFunctions.RData # Rdata that contains all the functions above used further on (just to simplify code estetic) --1_exploration.R # Explore different text representations and sparsity threshold through word clouds --2_preprocessing.R # Build dataframe with target/topic variable included, ready to be used for model training --3_models.R # Models bulding with caret ### LIBRARIES ### The following is a list of commands to execute in R to install all the needed packages. install.packages( c("tm", "caret", "dplyr", "StandardizeText", "SnowballC", "wordcloud2", "tidytext", "webshot", "kernlab", "splitstackshape", "e1071", "textclean", "mgsub", "tictoc", "klaR", "promises", "C50", "inum", "NLP", "stringi", "RNewsflow", "doParallel") webshot::install_phantomjs()
About
Text Mining and Search project
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published