Group 14 - BrokkR #22
Package Review
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide.
Documentation
The package includes all the following forms of documentation:
Functionality
Estimated hours spent reviewing:
Review Comments
The package was easy to understand and install, and it closely mirrors the Python package. Overall, the team did a good job with the quality of the code and the documentation. I have a few minor points that the team can consider in the future.
Package Review
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide.
Documentation
The package includes all the following forms of documentation:
Functionality
Estimated hours spent reviewing: 1.5 hours
Review Comments
I would be cautious of including these outputs directly with [...]
Overall, well done. I was not aware of a web scraping package within R, so this seems like it could have cool applications combining NLP and statistical analysis. I'm interested in seeing how your package evolves from our feedback!
Package Review
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide.
Documentation
The package includes all the following forms of documentation:
Functionality
Estimated hours spent reviewing:
Review Comments
Package Review
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide.
Documentation
The package includes all the following forms of documentation:
Functionality
Estimated hours spent reviewing: 1 hr
Review Comments
Overall, excellent work! It was a pleasure to review and learn from you!
name: BrokkR
about: This package allows users to provide a list of URLs for webpages of interest and creates a dataframe with a bag-of-words representation that can later be fed into a machine learning model of their choice. Users also have the option to produce a dataframe with just the raw text of their target webpages, so they can apply a text representation of their own choice instead.
Submitting Author Name: Elena Ganacheva, Mike Guron, Daniel Merigo, Mehdi Naji
Submitting Author Github Handle: @elenagan
Other Package Authors Github handles: (comma separated, delete if none) @mikeguron, @DMerigo, @mehdi-naji
Repository: https://github.com/UBC-MDS/BrokkR
Version submitted: 0.2.0
Submission type: Standard
Editor: @flor14
Reviewers: TBD
Archive: TBD
Version accepted: TBD
Language: en
Scope
Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
Explain how and why the package falls under these categories (briefly, 1-2 sentences):
The package retrieves data from URLs provided by the user, extracts the text from each webpage, cleans it, and formats it as a dataframe, with an option to produce a bag-of-words representation.
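For readers unfamiliar with the bag-of-words step described above, here is a minimal sketch of the idea. It is written in Python with the standard library only and is not BrokkR's API: the function names `clean_text` and `bag_of_words`, the cleaning rule (lowercase, alphabetic tokens only), and the example URLs and texts are all assumptions made for illustration.

```python
import re
from collections import Counter


def clean_text(text):
    """Lowercase the text and return its alphabetic tokens (an assumed cleaning rule)."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(documents):
    """Return (vocabulary, rows); rows[i][j] counts vocabulary[j] in documents[i]."""
    counts = [Counter(clean_text(doc)) for doc in documents]
    # The vocabulary is the sorted union of all tokens seen in any document.
    vocabulary = sorted(set().union(*counts))
    # A Counter returns 0 for missing keys, so absent words get a zero count.
    rows = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, rows


# Hypothetical raw text, standing in for text already scraped from two webpages.
pages = {
    "https://example.com/a": "Cats chase mice. Mice hide!",
    "https://example.com/b": "Dogs chase cats.",
}
vocabulary, rows = bag_of_words(list(pages.values()))

for url, row in zip(pages, rows):
    print(url, dict(zip(vocabulary, row)))
```

Each row corresponds to one webpage and each column to one word, which is the shape a dataframe-based bag-of-words output would typically take.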
Who is the target audience and what are scientific applications of this package?
Those who are new to web scraping and want a simple tool to collect text data from the internet for data analysis or machine learning purposes.
Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?
There are some libraries and packages that can facilitate this job, from scraping text from a URL to converting it to a bag of words (BOW). However, to the best of our knowledge, there is no sufficiently handy and straightforward package for this purpose. This package is a tailored combination of rvest and CountVectorizer. rvest is widely used to pull different sources of data from HTML and XML pages, and CountVectorizer is a well-known package to convert a collection of texts to a matrix of token counts.
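To make the scraping half of that combination concrete, the sketch below shows one way to pull visible text out of an HTML page. It uses Python's standard `html.parser` module rather than rvest, so it only illustrates the general technique and does not show how BrokkR is implemented. The class `TextExtractor`, the helper `html_to_text`, and the inline sample page are hypothetical, and the HTML is supplied as a string so that no network access is needed.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML string as one space-joined line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page content; a real workflow would first download this from a URL.
sample_html = (
    "<html><head><style>p {color: red;}</style></head>"
    "<body><h1>Title</h1><p>Hello <b>world</b></p>"
    "<script>var x = 1;</script></body></html>"
)
text = html_to_text(sample_html)
print(text)
```

The extracted string can then be passed to a token-counting step to obtain the matrix of token counts mentioned above.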
(If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research?
If you made a pre-submission inquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.
Explain reasons for any pkgcheck items which your package is unable to pass.
Technical checks
Confirm each of the following by checking the box.
This package:
Publication options
Do you intend for this package to go on CRAN?
Do you intend for this package to go on Bioconductor?
Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
MEE Options
Code of conduct