Group 1 - R_Prelim_Eda_Helper #25

Open
7 of 30 tasks
morrismanfung opened this issue Jan 31, 2023 · 4 comments

@morrismanfung

morrismanfung commented Jan 31, 2023


name: R_Prelim_Eda_Helper
about: This package provides a streamlined, easy-to-use solution for basic EDA tasks that would otherwise require a significant amount of coding.


Submitting Author Name: Morris Chan
Submitting Author Github Handle: @morrismanfung
Other Package Authors Github handles: (comma separated, delete if none) @MNBhat, @Lorraine97, @austin-shih
Repository: https://github.com/UBC-MDS/R_Prelim_Eda_Helper
Version submitted:
Submission type: Standard
Editor: Morris Chan
Reviewers: Zilong Yi, Peng Zhang, Mohammad Reza Nabizadeh Shahrbabak

Archive: TBD
Version accepted: TBD
Language: en

  • Paste the full DESCRIPTION file inside a code block below:
Package: RPrelimEdaHelper
Title: This package is a preliminary exploratory data analysis tool to make useful feature comparison plots and provide relevant information to simplify an otherwise tedious EDA step of any data science project
Version: 0.0.0.9000
Authors@R: 
    person("Austin", "Shih", , "[email protected]", role = c("aut", "cre"))
    person("Mehwish", "Nabi", , "[email protected]", role = c("aut", "cre"))
    person("Morris", "Chan", , "[email protected]", role = c("aut", "cre"))
    person("Xinru ", "Lu", , "[email protected]", role = c("aut", "cre"))
Description: This package is a preliminary exploratory data analysis tool to make useful feature 
    comparison plots and provide relevant information to simplify an otherwise tedious EDA step of any data science project.
License: MIT + file LICENSE
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.3
Suggests: 
    car,
    testthat (>= 3.0.0),
    tidyverse
Config/testthat/edition: 3
Imports: 
    cowplot,
    dplyr,
    ggplot2,
    mice,
    palmerpenguins,
    stats,
    tibble

Scope

  • Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):

    • data retrieval
    • data extraction
    • data munging
    • data deposition
    • data validation and testing
    • workflow automation
    • version control
    • citation management and bibliometrics
    • scientific software wrappers
    • field and lab reproducibility tools
    • database software bindings
    • geospatial data
    • text analysis
    • statistical
  • Explain how and why the package falls under these categories (briefly, 1-2 sentences):

The package performs statistical tests alongside its visualizations.

  • Who is the target audience and what are scientific applications of this package?

Researchers and analysts who frequently need to perform exploratory analysis and statistical testing. The helper functions are designed to speed up frequently used pipelines.

Packages for visualization and for statistical testing already exist, but they usually require more complicated syntax. Our package combines graphical visualizations with preliminary statistical test results so that users can quickly get a sense of what the data look like.

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

  • Do you intend for this package to go on CRAN?

  • Do you intend for this package to go on Bioconductor?

  • Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:

MEE Options
  • The package is novel and will be of interest to the broad readership of the journal.
  • The manuscript describing the package is no longer than 3000 words.
  • You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
  • (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
  • (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
  • (Please do not submit your package separately to Methods in Ecology and Evolution)

Code of conduct

@mrnabiz

mrnabiz commented Feb 7, 2023

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • Briefly describe any working relationship you have (had) with the package authors.
  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need: clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s): demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions
  • Examples: (that run successfully locally) for all exported functions
  • Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

Estimated hours spent reviewing: 1 Hour

  • Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

The package is very interesting and I enjoyed going through it. Installation from GitHub succeeded.
The documentation is great, and I was able to use each function after installing the package in RStudio. The badges are a nice touch and show 71% code coverage. For the cat_dist_heatmap function, it would help to add a bit more explanation, as you did in the article section. Overall, solid work. Congrats!

@zchen156

zchen156 commented Feb 7, 2023

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • Briefly describe any working relationship you have (had) with the package authors.
  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need: clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s): demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions
  • Examples: (that run successfully locally) for all exported functions
  • Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

Estimated hours spent reviewing: 1 hr

  • Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

  • The function titles are not formatted well in the generated vignette HTML.
  • The descriptions and titles of all the functions are identical in the vignette HTML.
  • It would be nice if the name of the library were also displayed in README.md.
  • Consider adding complete examples for all the functions in README.md. When I ran the code directly, I got error messages like this (taking num_dist_scatter() as the example):
Error in UseMethod("select") : 
  no applicable method for 'select' applied to an object of class "function"
  • Running the usage examples for cat_dist_heatmap() and num_dist_scatter() produces the following warning messages:
Warning messages:
1: Use of `data[[cat_1]]` is discouraged.
ℹ Use `.data[[cat_1]]` instead. 
2: Use of `data[[cat_2]]` is discouraged.
ℹ Use `.data[[cat_2]]` instead. 
3: Use of `data[[cat_1]]` is discouraged.
ℹ Use `.data[[cat_1]]` instead. 
4: Use of `data[[cat_2]]` is discouraged.
ℹ Use `.data[[cat_2]]` instead. 
Warning messages:
1: In cor.test.default(df1[[num1]], df1[[num2]], method = "spearman",  :
  Cannot compute exact p-value with ties
2: In cor.test.default(df1[[num1]], df1[[num2]], method = "spearman",  :
  Cannot compute exact p-value with ties
  • In the latest version of your package, test coverage for num_dist_summary() is relatively low compared with the other three functions (R/num_dist_summary.R: 25.00%). Improving it could be a future enhancement.
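
For reference, the second pair of warnings above comes from stats::cor.test(): with method = "spearman" and a small sample, R attempts an exact p-value and warns when the data contain ties. A minimal sketch of one way to avoid it, assuming the asymptotic approximation is acceptable for the package's purposes (that is for the authors to judge), is to pass exact = FALSE. The vectors below are made-up illustration data, not taken from the package.

```r
# Made-up illustration data containing tied values (not from the package).
x <- c(1, 2, 2, 3, 4, 4, 5, 6)
y <- c(2, 1, 3, 3, 5, 4, 6, 6)

# Default call: for small samples R tries the exact test and, because of
# the ties, raises the "Cannot compute exact p-value with ties" warning.
# tryCatch() captures the warning text instead of letting it print.
default_warning <- tryCatch(
  {
    cor.test(x, y, method = "spearman")
    NULL
  },
  warning = function(w) conditionMessage(w)
)
print(default_warning)

# Requesting the asymptotic approximation explicitly avoids the warning.
res <- cor.test(x, y, method = "spearman", exact = FALSE)
print(res$estimate)  # Spearman's rho
print(res$p.value)   # approximate p-value
```

The other option is to keep the default and wrap the call in suppressWarnings(), but that hides any other warning the call might raise, so being explicit about exact = FALSE is usually easier to reason about.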

@pengzh313

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • Briefly describe any working relationship you have (had) with the package authors.
  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need: clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s): demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions
  • Examples: (that run successfully locally) for all exported functions
  • Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

Estimated hours spent reviewing: 1 hour

  • Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

  • The idea behind this package is great, since we have all experienced the inefficiency of performing EDA from scratch. I hope the R package becomes a handy tool for data scientists.
  • I forked the package repo and loaded the dev version of the package, and it runs successfully in RStudio. Calling check() reports no errors or warnings.
  • I suggest making the package name consistent throughout the repo. For example, the package name should be RPrelimEdaHelper, but the README uses R_prelim_eda_helper before the Usage section and prelim_eda_helper in the License section. This may confuse users when they call library(package_name) to start using the package.
  • As I suggested for your Python package prelim_eda_helper, adding some example plots to the Usage section would make it more effective.
  • In the DESCRIPTION file, consider making the "Title" more concise; currently the "Title" and "Description" are the same.

@ZilongYi

ZilongYi commented Feb 9, 2023

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • Briefly describe any working relationship you have (had) with the package authors.
  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need: clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s): demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions
  • Examples: (that run successfully locally) for all exported functions
  • Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

Estimated hours spent reviewing:

  • Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

  • In general, I really love the idea of creating such helper functions. Well done. I noticed that the test functions have no comments, which is fine for users but not for developers or your future selves. Adding comments would make the tests much easier to follow.
  • The code coverage is 71%, which I personally think is a bit low for a package with only a few functions.
  • The R folder contains a checkpoint subfolder, which is a Jupyter notebook artifact and not essential to the package. This folder should be deleted.
  • One suggestion for future improvement is to consider combining the function/test files into one file.
  • Consider shorter names for the functions to reduce typing.
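
To illustrate the first point about commenting tests, here is a rough sketch of what a commented testthat file could look like. summarise_column() is a hypothetical stand-in written only for this example; it is not one of the package's functions, and the real tests would target the package's own exports.

```r
library(testthat)

# Hypothetical stand-in for a package function (NOT the package's real API):
# returns the mean and standard deviation of one numeric column.
summarise_column <- function(data, column) {
  if (!is.data.frame(data)) stop("`data` must be a data frame")
  values <- data[[column]]
  if (!is.numeric(values)) stop("`column` must refer to a numeric column")
  data.frame(
    mean = mean(values, na.rm = TRUE),
    sd = sd(values, na.rm = TRUE)
  )
}

test_that("summarise_column() returns one row of summary statistics", {
  toy <- data.frame(height = c(1, 2, 3, NA))
  result <- summarise_column(toy, "height")
  # Missing values are dropped, so the mean is taken over 1, 2 and 3.
  expect_equal(result$mean, 2)
  # The output is a single-row data frame regardless of input length.
  expect_equal(nrow(result), 1)
})

test_that("summarise_column() rejects invalid input", {
  # A non-data-frame input should fail early with a clear message.
  expect_error(summarise_column(1:3, "height"), "data frame")
  # A non-numeric column cannot be summarised numerically.
  expect_error(summarise_column(data.frame(g = c("a", "b")), "g"), "numeric")
})
```

Each comment states why the expectation holds rather than restating the assertion, which is the part a future maintainer cannot recover from the code alone.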
