Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Group 11 - simpute R #41

Open
11 of 17 tasks
kenuiuc opened this issue Feb 3, 2023 · 3 comments
Open
11 of 17 tasks

Group 11 - simpute R #41

kenuiuc opened this issue Feb 3, 2023 · 3 comments

Comments

@kenuiuc
Copy link

kenuiuc commented Feb 3, 2023


name: simpute
about: A simple data imputation tool


Submitting Author Name: Ken Wang
Submitting Author Github Handle: @kenuiuc
Other Package Authors Github handles: @LisaSeq, @renee-kwon, @Althrun-sun
Repository: https://github.com/UBC-MDS/simpute-r
Version submitted: TBD
Submission type: Standard
Editor: TBD
Reviewers: TBD

Archive: TBD
Version accepted: TBD
Language: en

  • Paste the full DESCRIPTION file inside a code block below:
Package: simpute
Title: A data imputation tool
Version: 0.0.0.9000
Authors@R:
    c(person("Ken", "Wang", , "[email protected]", role = c("aut", "cre"),
           comment = c(ORCID = "...")),
           person("Fujie", "Sun", , "[email protected]", role = c("aut"),
           comment = c(ORCID = "...")),
           person("Renee", "Kwon", , "[email protected]", role = c("aut"),
           comment = c(ORCID = "...")),
           person("Lisa", "Sequeria", , "[email protected]", role = c("aut"),
           comment = c(ORCID = "...")))
Description: A simple data imputation tool
License: MIT + file LICENSE
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.1
Imports:
    tidyr,
    dplyr,
    lubridate
Suggests:
    testthat (>= 3.0.0)
Config/testthat/edition: 3

Scope

  • Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):

    • data munging
  • Explain how and why the package falls under these categories (briefly, 1-2 sentences):

The simpute package does data imputation, which manipulate input datasets.

  • Who is the target audience and what are scientific applications of this package?

Data analysts or data engineers who have basic R language skills, but not very sophisticated.

MICE does the similar data imputation job, but it's not trivial to learn and use. The goal of our simpute is to provide a really easy to use solution, to the R language beginners.

NA

  • If you made a pre-submission inquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

NA

  • Explain reasons for any pkgcheck items which your package is unable to pass.

NA

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

  • Do you intend for this package to go on CRAN?

  • Do you intend for this package to go on Bioconductor?

  • Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:

MEE Options
  • The package is novel and will be of interest to the broad readership of the journal.
  • The manuscript describing the package is no longer than 3000 words.
  • You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
  • (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
  • (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
  • (Please do not submit your package separately to Methods in Ecology and Evolution)

Code of conduct

@hcwang24
Copy link

hcwang24 commented Feb 8, 2023

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • Briefly describe any working relationship you have (had) with the package authors.
  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need: clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s): demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions
  • Examples: (that run successfully locally) for all exported functions
  • Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

Estimated hours spent reviewing:

  • Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

  • The Boolean imputer isn't working.
  • For example usage, can you add the usage output as well (commented out with #).
  • Similarly, the test functions did not pass for the boolean imputer (output copied below).
  • Please consider adding badges after you pass the code coverage test, currently it has not passed in the Github actions.
  • Please add the link to your documentation in the readme file.
  • Looks like equal contribution from the team. Good job! I think after the function fix, it will work very soon.
✖ | 2       2 | bol_imputer                                                                                
───────────────────────────────────────────────────────────────────────────────────────────────────────────
Error (test-bol_imputer.R:5:3): Function fills an empty column
<dplyr:::mutate_error/rlang_error/error/condition>
Error in `mutate(.tbl, !!!funs)`: i In argument: `c1 = (function (x, y) ...`.
Caused by error:
! Can't convert `y` <character> to match type of `x` <logical>.

Error (test-bol_imputer.R:20:3): Function only changes column indicated
<dplyr:::mutate_error/rlang_error/error/condition>
Error in `mutate(.tbl, !!!funs)`: i In argument: `c1 = (function (x, y) ...`.
Caused by error:
! Can't convert `y` <character> to match type of `x` <logical>.

@spencergerlach
Copy link

spencergerlach commented Feb 8, 2023

Package Review

  • Briefly describe any working relationship you have (had) with the package authors.
  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need: clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s): demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions
  • Examples: (that run successfully locally) for all exported functions
  • Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

Estimated hours spent reviewing: 1 hour

  • Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

  • I was able to get the boolean imputer to work, but note that the column name given in the example of the function is not spelled correctly, should be spelled "OnTheMarket". I had to manually make this change in the function call to make it work.
  • In response to HanChen's comment about the tests: I was able to get them to run locally (intel mac). I cloned the repo, opened the project, and ran check() and it worked with 0 errors, 0 warnings, and 5 notes
  • I like the improvements to the README, including linking to the documentation. I see the comments on the python package were implemented here. Good work.
  • I see some work is still occurring while I write this review, but I believe I made this same comment on the python side: I think the release version should be v1.0.0? I could be wrong so please look into this.
  • I noticed that my comment on the python side about the reproducibility has been addressed proactively here - Good easy examples provided to reproduce functionality. Nice work.
  • Overall the package seems to function as intended. I like the idea, and I think the documentation and repo are well-organized and intuitive.

@tanmayag97
Copy link

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • Briefly describe any working relationship you have (had) with the package authors.
  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need: clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s): demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions
  • Examples: (that run successfully locally) for all exported functions
  • Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

Estimated hours spent reviewing: 2 hours

  • Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

  • Got an error in the function call of bol_imputer, but as already mentioned by @spencergerlach it was a typo and is working perfectly fine after that.
  • The package is really well documented i.e. all functions have their docstrings written in a succinct and clear manner. A minor mistake I would like to point out is the documentation contains median added as a default imputation method for date columns, which can be changed to a simple forward fill or backward fill technique. This is because median is almost always a bad way to impute dates.
  • For me all the tests passed successfully and on running check(), I got only 4 notes, which address some very minor issues that the authors might wanna look at.
  • Overall, I tested the package code coverage, which was around ~82%. That is amazing, so I recommend the authors to add this information in their README as badges.
  • The package Idea is solid, the need for a simpler imputation package is there and the work done by the authors in this short duration is commendable. Please keep working towards this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants