Group 1 - R_Prelim_Eda_Helper #25

morrismanfung · 2023-01-31T22:40:15Z

name: R_Prelim_Eda_Helper
about: This package provides a streamlined, easy-to-use solution for basic EDA tasks that would otherwise require a significant amount of code.

Submitting Author Name: Morris Chan
Submitting Author Github Handle: @morrismanfung
Other Package Authors Github handles: (comma separated, delete if none) @MNBhat, @Lorraine97, @austin-shih
Repository: https://github.com/UBC-MDS/R_Prelim_Eda_Helper
Version submitted:
Submission type: Standard
Editor: Morris Chan
Reviewers: Zilong Yi, Peng Zhang, Mohammad Reza Nabizadeh Shahrbabak

Archive: TBD
Version accepted: TBD
Language: en

Paste the full DESCRIPTION file inside a code block below:

Package: RPrelimEdaHelper
Title: This package is a preliminary exploratory data analysis tool to make useful feature comparison plots and provide relevant information to simplify an otherwise tedious EDA step of any data science project
Version: 0.0.0.9000
Authors@R: 
    person("Austin", "Shih", , "[email protected]", role = c("aut", "cre"))
    person("Mehwish", "Nabi", , "[email protected]", role = c("aut", "cre"))
    person("Morris", "Chan", , "[email protected]", role = c("aut", "cre"))
    person("Xinru ", "Lu", , "[email protected]", role = c("aut", "cre"))
Description: This package is a preliminary exploratory data analysis tool to make useful feature 
    comparison plots and provide relevant information to simplify an otherwise tedious EDA step of any data science project.
License: MIT + file LICENSE
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.3
Suggests: 
    car,
    testthat (>= 3.0.0),
    tidyverse
Config/testthat/edition: 3
Imports: 
    cowplot,
    dplyr,
    ggplot2,
    mice,
    palmerpenguins,
    stats,
    tibble
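
Editorial note: as pasted, the Authors@R field lists four separate person() calls, each with the "cre" role. R's packaging tools generally expect a single vector built with c() and exactly one maintainer ("cre"). The sketch below shows that shape only; which author is the maintainer and the e-mail address are assumptions made for illustration, not details from the submission.

```r
# Sketch only: the choice of maintainer and the e-mail address are
# placeholders, not taken from the submission.
authors <- c(
  person("Austin", "Shih", role = "aut"),
  person("Mehwish", "Nabi", role = "aut"),
  person("Morris", "Chan", email = "maintainer@example.com",
         role = c("aut", "cre")),
  person("Xinru", "Lu", role = "aut")
)

# Count how many entries carry the maintainer ("cre") role.
n_cre <- sum(vapply(authors$role, function(r) "cre" %in% r, logical(1)))
n_cre
```

In a DESCRIPTION file, the c(...) expression would follow "Authors@R:" directly, with each person() on its own indented line.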

Scope

Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
- data retrieval
- data extraction
- data munging
- data deposition
- data validation and testing
- workflow automation
- version control
- citation management and bibliometrics
- scientific software wrappers
- field and lab reproducibility tools
- database software bindings
- geospatial data
- text analysis
- statistical
Explain how and why the package falls under these categories (briefly, 1-2 sentences):

The package performs statistical tests alongside its visualizations.

Who is the target audience and what are scientific applications of this package?

Researchers and analysts who frequently perform exploratory analysis and statistical testing. The helper functions are designed to speed up these commonly used pipelines.

Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?

There are existing packages for visualization and statistical testing, but they usually require more complicated syntax. Our package combines visualizations with preliminary statistical test results so that users can quickly get a sense of what the data look like.

(If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research?
If you made a pre-submission inquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.
Explain reasons for any pkgcheck items which your package is unable to pass.

Technical checks

Confirm each of the following by checking the box.

I have read the rOpenSci packaging guide.
I have read the author guide and I expect to maintain this package for at least 2 years or to find a replacement.

This package:

does not violate the Terms of Service of any service it interacts with.
has a CRAN and OSI accepted license.
contains a README with instructions for installing the development version.
includes documentation with examples for all functions, created with roxygen2.
contains a vignette with examples of its essential functions and uses.
has a test suite.
has continuous integration, including reporting of test coverage.

Publication options

Do you intend for this package to go on CRAN?
Do you intend for this package to go on Bioconductor?
Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:

MEE Options

The package is novel and will be of interest to the broad readership of the journal.
The manuscript describing the package is no longer than 3000 words.
You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
(Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
(Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
(Please do not submit your package separately to Methods in Ecology and Evolution)

Code of conduct

I agree to abide by rOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.


mrnabiz · 2023-02-07T22:24:29Z

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Briefly describe any working relationship you have (had) with the package authors.
As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

A statement of need: clearly stating problems the software is designed to solve and its target audience in README
Installation instructions: for the development version of package and any non-standard dependencies in README
Vignette(s): demonstrating major functionality that runs successfully locally
Function Documentation: for all exported functions
Examples: (that run successfully locally) for all exported functions
Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

Installation: Installation succeeds as documented.
Functionality: Any functional claims of the software have been confirmed.
Performance: Any performance claims of the software have been confirmed.
Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

Estimated hours spent reviewing: 1 Hour

Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

The package is very interesting, and I enjoyed going through it. Installation from GitHub succeeded.
The documentation is great, and I was able to use each function after installing the package in my RStudio IDE. The badges are a nice touch; they show 71% code coverage. For the cat_dist_heatmap function, it would be great if you could add a bit more explanation, as you did in the article section. Overall, solid work. Congrats!

zchen156 · 2023-02-07T23:13:42Z

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Briefly describe any working relationship you have (had) with the package authors.
As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

A statement of need: clearly stating problems the software is designed to solve and its target audience in README
Installation instructions: for the development version of package and any non-standard dependencies in README
Vignette(s): demonstrating major functionality that runs successfully locally
Function Documentation: for all exported functions
Examples: (that run successfully locally) for all exported functions
Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

Installation: Installation succeeds as documented.
Functionality: Any functional claims of the software have been confirmed.
Performance: Any performance claims of the software have been confirmed.
Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

Estimated hours spent reviewing: 1 hr

Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

The titles for each function are not formatted well in the generated vignette HTML.
The descriptions and titles for all the functions are identical in the vignette HTML.
It would be nice if the name of the library were also displayed in README.md.
Consider including complete examples for all the functions in README.md. When I ran the code directly, I got error messages like this (taking num_dist_scatter() as the example):

Error in UseMethod("select") : 
  no applicable method for 'select' applied to an object of class "function"
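
Editorial note: one plausible cause of this error, offered as an assumption rather than a confirmed diagnosis, is that the example was run without first defining its data frame, so a name such as data or df resolved to a built-in R function of the same name and that function was passed on to select(). The base R sketch below illustrates the name clash; check_data_frame() is a hypothetical guard, not part of the package.

```r
# Neither `data` nor `df` has been assigned here, so both names resolve to
# built-in functions (utils::data and stats::df) instead of a data frame.
is.function(data)
is.function(df)

# Hypothetical guard (not part of RPrelimEdaHelper) that fails early with a
# clearer message than the S3 dispatch error from select().
check_data_frame <- function(x) {
  if (!is.data.frame(x)) {
    stop("`x` must be a data frame, not an object of class ",
         class(x)[1], call. = FALSE)
  }
  invisible(x)
}

# A real data frame passes silently; a function triggers the message.
check_data_frame(mtcars)
msg <- tryCatch(check_data_frame(df), error = function(e) conditionMessage(e))
print(msg)
```

A README example that creates its own data frame before calling the function would avoid the clash altogether.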

After running the usage examples for cat_dist_heatmap() and num_dist_scatter(), I got the warning messages below:

Warning messages:
1: Use of `data[[cat_1]]` is discouraged.
ℹ Use `.data[[cat_1]]` instead. 
2: Use of `data[[cat_2]]` is discouraged.
ℹ Use `.data[[cat_2]]` instead. 
3: Use of `data[[cat_1]]` is discouraged.
ℹ Use `.data[[cat_1]]` instead. 
4: Use of `data[[cat_2]]` is discouraged.
ℹ Use `.data[[cat_2]]` instead.
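
Editorial note: these messages come from ggplot2 when a column is referenced inside aes() as data[[name]]. A minimal sketch of the suggested form follows, assuming the column names are held in strings; plot_counts() and the use of mtcars are illustrative only and are not the package's own code.

```r
library(ggplot2)

# Hypothetical helper: plot how often each pair of values occurs in two
# columns whose names are supplied as strings.
plot_counts <- function(data, cat_1, cat_2) {
  # .data is the pronoun for the plot's data, so the columns are looked up
  # by name inside the data rather than in the calling environment.
  ggplot(data, aes(x = .data[[cat_1]], y = .data[[cat_2]])) +
    geom_count() +
    labs(x = cat_1, y = cat_2)
}

p <- plot_counts(mtcars, "cyl", "gear")
```

Inside a package, the .data pronoun is typically imported from rlang so that R CMD check does not flag it as an undefined global variable.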

Warning messages:
1: In cor.test.default(df1[[num1]], df1[[num2]], method = "spearman",  :
  Cannot compute exact p-value with ties
2: In cor.test.default(df1[[num1]], df1[[num2]], method = "spearman",  :
  Cannot compute exact p-value with ties
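
Editorial note: this warning is raised by stats::cor.test() when an exact Spearman p-value is requested (the default for small samples) and the data contain ties; R then falls back to an approximation anyway. The minimal sketch below, which uses mtcars purely for illustration, shows that passing exact = FALSE asks for the approximation up front and avoids the warning. Whether the package should silence the warning this way is a design choice for the authors.

```r
# mpg and cyl both contain repeated values, so their ranks have ties.
x <- mtcars$mpg
y <- mtcars$cyl

# Default call: record whether a warning is raised.
warned_default <- tryCatch(
  { cor.test(x, y, method = "spearman"); FALSE },
  warning = function(w) TRUE
)

# exact = FALSE requests the asymptotic p-value directly.
warned_approx <- tryCatch(
  { cor.test(x, y, method = "spearman", exact = FALSE); FALSE },
  warning = function(w) TRUE
)
result <- cor.test(x, y, method = "spearman", exact = FALSE)

c(default = warned_default, approx = warned_approx)
result$estimate
```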

I noticed that in the latest version of your package, the test coverage for num_dist_summary() (R/num_dist_summary.R) is 25.00%, which is relatively low compared with the other three functions. This could be a future improvement for the package.

pengzh313 · 2023-02-08T22:45:40Z

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Briefly describe any working relationship you have (had) with the package authors.
As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

A statement of need: clearly stating problems the software is designed to solve and its target audience in README
Installation instructions: for the development version of package and any non-standard dependencies in README
Vignette(s): demonstrating major functionality that runs successfully locally
Function Documentation: for all exported functions
Examples: (that run successfully locally) for all exported functions
Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

Installation: Installation succeeds as documented.
Functionality: Any functional claims of the software have been confirmed.
Performance: Any performance claims of the software have been confirmed.
Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

Estimated hours spent reviewing: 1 hour

Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

The idea behind this package is great, since we have all experienced the inefficiency of performing EDA from scratch. I hope the R package becomes a handy tool for data scientists.
I forked the package repo and loaded the dev version of the package, and it runs successfully in my RStudio. Calling check() produces no errors or warnings.
I suggest making the package name consistent throughout the repo. For example, the name of this package should be RPrelimEdaHelper, but you used another name, R_prelim_eda_helper, before the Usage section of the README file, and prelim_eda_helper in the License section. This may confuse users when they call library(package_name) to start using the package.
As with my suggestion for your Python package prelim_eda_helper, it would be more effective to add some example plots to the Usage section.
In the DESCRIPTION file, consider making the "Title" more concise, as the "Title" and "Description" are currently the same.

ZilongYi · 2023-02-09T00:52:59Z

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Briefly describe any working relationship you have (had) with the package authors.
As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

A statement of need: clearly stating problems the software is designed to solve and its target audience in README
Installation instructions: for the development version of package and any non-standard dependencies in README
Vignette(s): demonstrating major functionality that runs successfully locally
Function Documentation: for all exported functions
Examples: (that run successfully locally) for all exported functions
Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

Installation: Installation succeeds as documented.
Functionality: Any functional claims of the software have been confirmed.
Performance: Any performance claims of the software have been confirmed.
Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

Estimated hours spent reviewing:

Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

In general, I really love the idea of creating such helper functions. Well done. I noticed that there are no comments in the test functions, which is fine for users but not for developers or your future selves. It would be much better to add them.
Also, I noticed that the code coverage is 71%, which I personally think is a bit low for a package with only a few functions.
In the R folder, there is a checkpoint subfolder, which is just a Jupyter Notebook artifact and not essential to the package. This folder should be deleted.
One suggestion for future improvement would be to combine the function/test files into a single file.
It would be better to consider a shorter name for each function, so there is less typing.
