-
Notifications
You must be signed in to change notification settings - Fork 134
Add template pre-unlink.sh for Bioconductor data packages #238
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -813,8 +812,11 @@ def write_recipe(package, recipe_dir, config, force=False, bioc_version=None, | |||
|
|||
# Install and clean up | |||
R CMD INSTALL --library=$PREFIX/lib/R/library --build $TARBALL | |||
rm $TARBALL""") | |||
rm -r $STAGING""") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we always be sure $STAGING
has no other content than $TARBALL
? If not, I'd rather use
rm $TARBALL
rmdir $STAGING
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$STAGING
was created at the beginning of this script specifically to store the tarball. To my knowledge other conda processes will not use $PREFIX/share/$PKG_NAME-$PKG_VERSION-$PKG_BUILDNUM
. But even if they did, these files would no longer be needed if the package is removed.
If we do want to allow the possibility of other files to exist there after the package is removed, we'd want to do:
rm $TARBALL
rmdir --ignore-fail-on-non-empty $STAGING
Otherwise the script would fail due to the directory not being empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$STAGING
was created at the beginning of this script specifically to store the tarball. To my knowledge other conda processes will not use$PREFIX/share/$PKG_NAME-$PKG_VERSION-$PKG_BUILDNUM
. But even if they did, these files would no longer be needed if the package is removed.
I agree, but who knows what people might do ¯\_(ツ)_/¯. I just like being explicit when deleting stuff -- but I'm fine either way as that's only being pedantic 😉.
Otherwise the script would fail due to the directory not being empty.
No, the post-link.sh
is not executed with the -e
switch, hence it will not exit on error. Otherwise we'd also have to take care of the previous commands (mkdir
, wget
, rm
) which can also fail any time. IMHO, we should even explicitly not use --ignore-fail-on-non-empty
to not suppress error output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but who knows what people might do ¯_(ツ)_/¯. I just like being explicit when deleting stuff -- but I'm fine either way as that's only being pedantic
I actually like the 2-step process. It was what I was going to do originally, but feared it was too verbose. I'm happy to switch to using rmdir
.
No, the post-link.sh is not executed with the -e switch, hence it will not exit on error.
Good point! I always forget that Bash doesn't stop on error by default.
with open(os.path.join(recipe_dir, 'post-link.sh'), 'w') as fout: | ||
fout.write(dedent(post_link_template)) | ||
pre_unlink_template = "rm -r $PREFIX/lib/R/library/{0}\n".format(package) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does a removal of that directory suffice or does
R CMD INSTALL --library=$PREFIX/lib/R/library --build $TARBALL
add or modify any other files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a quick test. The only files added by R CMD INSTALL
are in lib/R/library
. There are other files left after removing the package, but they are all created by conda.
$ conda create -y -n test bioconductor-genomeinfodbdata
$ find /opt/conda/ -iname *genomeinfodbdata*
/opt/conda/pkgs/bioconductor-genomeinfodbdata-0.99.1-r3.4.1_0.tar.bz2
/opt/conda/pkgs/bioconductor-genomeinfodbdata-0.99.1-r3.4.1_0
/opt/conda/pkgs/bioconductor-genomeinfodbdata-0.99.1-r3.4.1_0/bin/.bioconductor-genomeinfodbdata-post-link.sh
/opt/conda/envs/test/conda-meta/bioconductor-genomeinfodbdata-0.99.1-r3.4.1_0.json
/opt/conda/envs/test/share/bioconductor-genomeinfodbdata-0.99.1-0
/opt/conda/envs/test/lib/R/library/GenomeInfoDbData
/opt/conda/envs/test/lib/R/library/GenomeInfoDbData/help/GenomeInfoDbData.rdb
/opt/conda/envs/test/lib/R/library/GenomeInfoDbData/help/GenomeInfoDbData.rdx
/opt/conda/envs/test/lib/R/library/GenomeInfoDbData/scripts/updateGenomeInfoDbData.R
/opt/conda/envs/test/bin/GenomeInfoDbData_0.99.1_R_x86_64-pc-linux-gnu.tar.gz
/opt/conda/envs/test/bin/.bioconductor-genomeinfodbdata-post-link.sh
$ conda remove -y -n test bioconductor-genomeinfodbdata
$ find /opt/conda/ -iname *genomeinfodbdata*
/opt/conda/pkgs/bioconductor-genomeinfodbdata-0.99.1-r3.4.1_0.tar.bz2
/opt/conda/pkgs/bioconductor-genomeinfodbdata-0.99.1-r3.4.1_0
/opt/conda/pkgs/bioconductor-genomeinfodbdata-0.99.1-r3.4.1_0/bin/.bioconductor-genomeinfodbdata-post-link.sh
/opt/conda/envs/test/share/bioconductor-genomeinfodbdata-0.99.1-0
/opt/conda/envs/test/lib/R/library/GenomeInfoDbData
/opt/conda/envs/test/lib/R/library/GenomeInfoDbData/help/GenomeInfoDbData.rdb
/opt/conda/envs/test/lib/R/library/GenomeInfoDbData/help/GenomeInfoDbData.rdx
/opt/conda/envs/test/lib/R/library/GenomeInfoDbData/scripts/updateGenomeInfoDbData.R
/opt/conda/envs/test/bin/GenomeInfoDbData_0.99.1_R_x86_64-pc-linux-gnu.tar.gz
The files in pkgs/
are also left behind when removing a normal R recipe:
$ conda install -y -n test r-registry
$ find /opt/conda/ -iname registry
/opt/conda/pkgs/r-registry-0.3-r3.4.1_0/lib/R/library/registry
/opt/conda/pkgs/r-registry-0.3-r3.4.1_0/lib/R/library/registry/R/registry
/opt/conda/envs/test/lib/R/library/registry
/opt/conda/envs/test/lib/R/library/registry/R/registry
$ conda remove -y -n test r-registry
$ find /opt/conda/ -iname registry
/opt/conda/pkgs/r-registry-0.3-r3.4.1_0/lib/R/library/registry
/opt/conda/pkgs/r-registry-0.3-r3.4.1_0/lib/R/library/registry/R/registry
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pkgs
folder is just Conda's package cache, no worries about that. But
/opt/conda/envs/test/share/bioconductor-genomeinfodbdata-0.99.1-0 [EDIT: duh...]
/opt/conda/envs/test/lib/R/library/GenomeInfoDbData
/opt/conda/envs/test/lib/R/library/GenomeInfoDbData/help/GenomeInfoDbData.rdb
/opt/conda/envs/test/lib/R/library/GenomeInfoDbData/help/GenomeInfoDbData.rdx
/opt/conda/envs/test/lib/R/library/GenomeInfoDbData/scripts/updateGenomeInfoDbData.R
/opt/conda/envs/test/bin/GenomeInfoDbData_0.99.1_R_x86_64-pc-linux-gnu.tar.gz
are not created by conda
itself but likely by R
.
Does R
have an uninstall command, i.e., some direct pendant to R CMD INSTALL
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does R have an uninstall command, i.e., some direct pendant to R CMD INSTALL?
Yes, it does. We could switch to it, but it has the same effect in this case as the current rm -r
(i.e. remove the directory in lib/R/library
):
$ conda create -y -n test bioconductor-genomeinfodbdata
$ source activate test
$ R CMD REMOVE --library=/opt/conda/envs/test/lib/R/library/ GenomeInfoDbData
$ find /opt/conda/ -iname *genomeinfodbdata*
/opt/conda/pkgs/bioconductor-genomeinfodbdata-0.99.1-r3.4.1_0.tar.bz2
/opt/conda/pkgs/bioconductor-genomeinfodbdata-0.99.1-r3.4.1_0
/opt/conda/pkgs/bioconductor-genomeinfodbdata-0.99.1-r3.4.1_0/bin/.bioconductor-genomeinfodbdata-post-link.sh
/opt/conda/envs/test/conda-meta/bioconductor-genomeinfodbdata-0.99.1-r3.4.1_0.json
/opt/conda/envs/test/share/bioconductor-genomeinfodbdata-0.99.1-0
/opt/conda/envs/test/bin/GenomeInfoDbData_0.99.1_R_x86_64-pc-linux-gnu.tar.gz
/opt/conda/envs/test/bin/.bioconductor-genomeinfodbdata-post-link.sh
The directory in share
is the directory $STAGING
created by post-link.sh
(which as we discussed above, this PR also proposes removing) and the files in bin/
are added by conda.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd go for R CMD REMOVE
, but only because I don't know what R
's INSTALL
/REMOVE
can actually do..
So regardless if we use one or the other removal command, for the GenomeInfoDbData
example only $PREFIX/bin/GenomeInfoDbData_0.99.1_R_x86_64-pc-linux-gnu.tar.gz
will remain when this PR is adopted. This looks like some compilation file from R
(who knows why the heck this ends up in bin
...). To confirm this, diff
ed the environment pre-install, post-install and post-removal:
$ diff -rqx conda-meta init/ installed/
Only in installed/bin: .bioconductor-genomeinfodbdata-post-link.sh
Only in installed/bin: GenomeInfoDbData_0.99.1_R_x86_64-pc-linux-gnu.tar.gz
Only in installed/lib/R/library: GenomeInfoDbData
Only in installed/share: bioconductor-genomeinfodbdata-0.99.1-0
$ diff -rqx conda-meta installed/ removed/
Only in installed/bin: .bioconductor-genomeinfodbdata-post-link.sh
$ diff -rqx conda-meta init/ removed/
Only in removed/bin: GenomeInfoDbData_0.99.1_R_x86_64-pc-linux-gnu.tar.gz
Only in removed/lib/R/library: GenomeInfoDbData
Only in removed/share: bioconductor-genomeinfodbdata-0.99.1-0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok, IIUC, R CMD INSTALL --build
installs the package first and then creates the tarball in bin
. Since I don't believe we need/want that tarball anyway, shouldn't the command just be
R CMD INSTALL --library=$PREFIX/lib/R/library $TARBALL
without the --build
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know next to nothing about R, so someone more knowledgeable than me should be review this, too 😉
Thanks for the review @mbargull! |
@mbargull I've updated this PR based on your feedback:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, LGTM!
@bioconda/r: Is this good to go or do some of you have any objections? |
Note that I wasn't able to test my most recent commit because #237 was merged into my PR: #237 (comment) |
868bbfd
to
8528ac7
Compare
I cleaned up the commit history using I performed the following test to confirm that Bioconductor data packages are installed in the correct directory (even if a user directory exists) and are properly removed with
After removal, here are the files still around:
|
Clicking those GitHub "Update branch" buttons is just like taking the car instead of a bike: Quite convenient but dirty.. thanks for the cleanup 😉
Great, all of those are expected/intended to remain. So, this PR should be ready to be merged! |
@daler Please review and merge |
Thank you! |
xref: #234 #235
This PR adds a template
pre-unlink.sh
file for Bioconductor data packages that runs prior to removing the package from the conda environment. Without this file, the R package remains installed and accessible in$PREFIX/lib/R/library
even after it has been removed.It also makes two minor edits to the
post-link.sh
file:MD5
$PREFIX/share
that was created to download the tarball is deleted instead of only deleting the tarball