Add template pre-unlink.sh for Bioconductor data packages #238

Merged: 3 commits merged into bioconda:master on Dec 14, 2017

jdblischak
Member

xref: #234 #235

This PR adds a template pre-unlink.sh file for Bioconductor data packages that runs before the package is removed from the conda environment. Without this file, the R package remains installed and accessible in $PREFIX/lib/R/library even after the conda package has been removed.

It also makes two minor edits to the post-link.sh file:

  1. Removes indentation from the line that defines MD5
  2. Deletes the subdirectory within $PREFIX/share that was created to hold the downloaded tarball, instead of deleting only the tarball

@@ -813,8 +812,11 @@ def write_recipe(package, recipe_dir, config, force=False, bioc_version=None,

 # Install and clean up
 R CMD INSTALL --library=$PREFIX/lib/R/library --build $TARBALL
-rm $TARBALL""")
+rm -r $STAGING""")
Member

Can we always be sure $STAGING has no other content than $TARBALL? If not, I'd rather use

rm $TARBALL
rmdir $STAGING
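The difference between the two cleanup styles can be mirrored with Python's standard library, where shutil.rmtree behaves like rm -r and os.remove followed by os.rmdir behaves like the proposed two-step version. A minimal sketch; the staging/pkg.tar.gz layout and the helper names are hypothetical stand-ins for $STAGING and $TARBALL:

```python
import os
import shutil
import tempfile
from typing import Tuple


def make_staging(root: str, extra_file: bool) -> Tuple[str, str]:
    """Create a hypothetical staging dir holding a tarball and, optionally,
    one unexpected extra file. Returns the (staging, tarball) paths."""
    staging = os.path.join(root, "staging")
    os.mkdir(staging)
    tarball = os.path.join(staging, "pkg.tar.gz")
    open(tarball, "w").close()
    if extra_file:
        open(os.path.join(staging, "other.txt"), "w").close()
    return staging, tarball


def cleanup_recursive(staging: str, tarball: str) -> None:
    """Like `rm -r $STAGING`: deletes the directory and whatever is in it.
    (tarball is unused; it only keeps both helpers' signatures identical.)"""
    shutil.rmtree(staging)


def cleanup_explicit(staging: str, tarball: str) -> None:
    """Like `rm $TARBALL; rmdir $STAGING`: only the tarball is deleted
    explicitly, and os.rmdir raises OSError if anything else is left."""
    os.remove(tarball)
    os.rmdir(staging)


if __name__ == "__main__":
    for cleanup in (cleanup_recursive, cleanup_explicit):
        with tempfile.TemporaryDirectory() as root:
            staging, tarball = make_staging(root, extra_file=True)
            try:
                cleanup(staging, tarball)
            except OSError as err:
                print(cleanup.__name__, "failed:", err)
            print(cleanup.__name__, "left staging dir:", os.path.isdir(staging))
```

The trade-off is the one raised here: the recursive version never complains, while the two-step version refuses to delete anything it was not explicitly told about.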

Member Author

$STAGING was created at the beginning of this script specifically to store the tarball. To my knowledge other conda processes will not use $PREFIX/share/$PKG_NAME-$PKG_VERSION-$PKG_BUILDNUM. But even if they did, these files would no longer be needed if the package is removed.

If we do want to allow the possibility of other files to exist there after the package is removed, we'd want to do:

rm $TARBALL
rmdir --ignore-fail-on-non-empty $STAGING

Otherwise the script would fail due to the directory not being empty.

Member

> $STAGING was created at the beginning of this script specifically to store the tarball. To my knowledge other conda processes will not use $PREFIX/share/$PKG_NAME-$PKG_VERSION-$PKG_BUILDNUM. But even if they did, these files would no longer be needed if the package is removed.

I agree, but who knows what people might do ¯\_(ツ)_/¯. I just like being explicit when deleting stuff -- but I'm fine either way as that's only being pedantic 😉.

> Otherwise the script would fail due to the directory not being empty.

No, post-link.sh is not executed with the -e switch, so it will not exit on error. Otherwise we'd also have to take care of the earlier commands (mkdir, wget, rm), which can fail at any time as well. IMHO, we should even explicitly avoid --ignore-fail-on-non-empty so as not to suppress error output.
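Whether a failed command stops a script run without -e is easy to check. A minimal sketch, assuming a POSIX sh on the PATH; the directory and file names are hypothetical stand-ins for $STAGING and $TARBALL, and this only demonstrates shell behaviour, not how conda itself invokes post-link.sh:

```python
import subprocess
import tempfile

# Hypothetical stand-in for the tail of post-link.sh: the staging directory
# holds an unexpected extra file, so rmdir fails.
CLEANUP = """
mkdir staging
touch staging/pkg.tar.gz staging/unexpected.txt
rm staging/pkg.tar.gz
rmdir staging
echo "cleanup finished"
"""


def run_cleanup(errexit: bool) -> subprocess.CompletedProcess:
    """Run CLEANUP with sh in a throwaway directory, optionally with -e."""
    args = ["sh", "-e", "-c", CLEANUP] if errexit else ["sh", "-c", CLEANUP]
    with tempfile.TemporaryDirectory() as workdir:
        return subprocess.run(args, cwd=workdir, capture_output=True, text=True)


if __name__ == "__main__":
    for errexit in (False, True):
        result = run_cleanup(errexit)
        # Without -e the script keeps going after rmdir fails and exits 0;
        # with -e it stops at rmdir and reports a non-zero status.
        print("errexit=%s exit=%d stdout=%r" % (errexit, result.returncode, result.stdout))
        print("  stderr:", result.stderr.strip())
```

Without -e the rmdir error still lands on stderr while the script carries on, which is exactly the output that --ignore-fail-on-non-empty would hide.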

Member Author

> I agree, but who knows what people might do ¯\_(ツ)_/¯. I just like being explicit when deleting stuff -- but I'm fine either way as that's only being pedantic

I actually like the 2-step process. It's what I was originally going to do, but I feared it was too verbose. I'm happy to switch to using rmdir.

> No, the post-link.sh is not executed with the -e switch, hence it will not exit on error.

Good point! I always forget that Bash doesn't stop on error by default.

with open(os.path.join(recipe_dir, 'post-link.sh'), 'w') as fout:
    fout.write(dedent(post_link_template))
pre_unlink_template = "rm -r $PREFIX/lib/R/library/{0}\n".format(package)
Member

mbargull Dec 6, 2017

Does a removal of that directory suffice or does

R CMD INSTALL --library=$PREFIX/lib/R/library --build $TARBALL

add or modify any other files?

Member Author

I did a quick test. The only files added by R CMD INSTALL are in lib/R/library. There are other files left after removing the package, but they are all created by conda.

$ conda create -y -n test bioconductor-genomeinfodbdata
$ find /opt/conda/ -iname *genomeinfodbdata*
/opt/conda/pkgs/bioconductor-genomeinfodbdata-0.99.1-r3.4.1_0.tar.bz2
/opt/conda/pkgs/bioconductor-genomeinfodbdata-0.99.1-r3.4.1_0
/opt/conda/pkgs/bioconductor-genomeinfodbdata-0.99.1-r3.4.1_0/bin/.bioconductor-genomeinfodbdata-post-link.sh
/opt/conda/envs/test/conda-meta/bioconductor-genomeinfodbdata-0.99.1-r3.4.1_0.json
/opt/conda/envs/test/share/bioconductor-genomeinfodbdata-0.99.1-0
/opt/conda/envs/test/lib/R/library/GenomeInfoDbData
/opt/conda/envs/test/lib/R/library/GenomeInfoDbData/help/GenomeInfoDbData.rdb
/opt/conda/envs/test/lib/R/library/GenomeInfoDbData/help/GenomeInfoDbData.rdx
/opt/conda/envs/test/lib/R/library/GenomeInfoDbData/scripts/updateGenomeInfoDbData.R
/opt/conda/envs/test/bin/GenomeInfoDbData_0.99.1_R_x86_64-pc-linux-gnu.tar.gz
/opt/conda/envs/test/bin/.bioconductor-genomeinfodbdata-post-link.sh
$ conda remove -y -n test bioconductor-genomeinfodbdata
$ find /opt/conda/ -iname *genomeinfodbdata*
/opt/conda/pkgs/bioconductor-genomeinfodbdata-0.99.1-r3.4.1_0.tar.bz2
/opt/conda/pkgs/bioconductor-genomeinfodbdata-0.99.1-r3.4.1_0
/opt/conda/pkgs/bioconductor-genomeinfodbdata-0.99.1-r3.4.1_0/bin/.bioconductor-genomeinfodbdata-post-link.sh
/opt/conda/envs/test/share/bioconductor-genomeinfodbdata-0.99.1-0
/opt/conda/envs/test/lib/R/library/GenomeInfoDbData
/opt/conda/envs/test/lib/R/library/GenomeInfoDbData/help/GenomeInfoDbData.rdb
/opt/conda/envs/test/lib/R/library/GenomeInfoDbData/help/GenomeInfoDbData.rdx
/opt/conda/envs/test/lib/R/library/GenomeInfoDbData/scripts/updateGenomeInfoDbData.R
/opt/conda/envs/test/bin/GenomeInfoDbData_0.99.1_R_x86_64-pc-linux-gnu.tar.gz

The files in pkgs/ are also left behind when removing a normal R recipe:

$ conda install -y -n test r-registry
$ find /opt/conda/ -iname registry
/opt/conda/pkgs/r-registry-0.3-r3.4.1_0/lib/R/library/registry
/opt/conda/pkgs/r-registry-0.3-r3.4.1_0/lib/R/library/registry/R/registry
/opt/conda/envs/test/lib/R/library/registry
/opt/conda/envs/test/lib/R/library/registry/R/registry
$ conda remove -y -n test r-registry
$ find /opt/conda/ -iname registry
/opt/conda/pkgs/r-registry-0.3-r3.4.1_0/lib/R/library/registry
/opt/conda/pkgs/r-registry-0.3-r3.4.1_0/lib/R/library/registry/R/registry

Member

mbargull Dec 6, 2017

The pkgs folder is just Conda's package cache, no worries about that. But
/opt/conda/envs/test/share/bioconductor-genomeinfodbdata-0.99.1-0 [EDIT: duh...]

/opt/conda/envs/test/lib/R/library/GenomeInfoDbData
/opt/conda/envs/test/lib/R/library/GenomeInfoDbData/help/GenomeInfoDbData.rdb
/opt/conda/envs/test/lib/R/library/GenomeInfoDbData/help/GenomeInfoDbData.rdx
/opt/conda/envs/test/lib/R/library/GenomeInfoDbData/scripts/updateGenomeInfoDbData.R
/opt/conda/envs/test/bin/GenomeInfoDbData_0.99.1_R_x86_64-pc-linux-gnu.tar.gz

are not created by conda itself but likely by R.
Does R have an uninstall command, i.e., some direct pendant to R CMD INSTALL?

Member Author

> Does R have an uninstall command, i.e., some direct pendant to R CMD INSTALL?

Yes, it does. We could switch to it, but in this case it has the same effect as the current rm -r (i.e., it removes the directory in lib/R/library):

$ conda create -y -n test bioconductor-genomeinfodbdata
$ source activate test
$ R CMD REMOVE --library=/opt/conda/envs/test/lib/R/library/ GenomeInfoDbData
$ find /opt/conda/ -iname *genomeinfodbdata*
/opt/conda/pkgs/bioconductor-genomeinfodbdata-0.99.1-r3.4.1_0.tar.bz2
/opt/conda/pkgs/bioconductor-genomeinfodbdata-0.99.1-r3.4.1_0
/opt/conda/pkgs/bioconductor-genomeinfodbdata-0.99.1-r3.4.1_0/bin/.bioconductor-genomeinfodbdata-post-link.sh
/opt/conda/envs/test/conda-meta/bioconductor-genomeinfodbdata-0.99.1-r3.4.1_0.json
/opt/conda/envs/test/share/bioconductor-genomeinfodbdata-0.99.1-0
/opt/conda/envs/test/bin/GenomeInfoDbData_0.99.1_R_x86_64-pc-linux-gnu.tar.gz
/opt/conda/envs/test/bin/.bioconductor-genomeinfodbdata-post-link.sh

The directory in share/ is the $STAGING directory created by post-link.sh (which, as discussed above, this PR also proposes removing), and the files in bin/ are added by conda.

Member

I'd go for R CMD REMOVE, but only because I don't know what R's INSTALL/REMOVE can actually do.
So regardless of which removal command we use, for the GenomeInfoDbData example only $PREFIX/bin/GenomeInfoDbData_0.99.1_R_x86_64-pc-linux-gnu.tar.gz will remain once this PR is adopted. This looks like some build output from R (who knows why the heck it ends up in bin...). To confirm this, I diffed the environment pre-install, post-install, and post-removal:

$ diff -rqx conda-meta init/ installed/
Only in installed/bin: .bioconductor-genomeinfodbdata-post-link.sh
Only in installed/bin: GenomeInfoDbData_0.99.1_R_x86_64-pc-linux-gnu.tar.gz
Only in installed/lib/R/library: GenomeInfoDbData
Only in installed/share: bioconductor-genomeinfodbdata-0.99.1-0
$ diff -rqx conda-meta installed/ removed/
Only in installed/bin: .bioconductor-genomeinfodbdata-post-link.sh
$ diff -rqx conda-meta init/ removed/
Only in removed/bin: GenomeInfoDbData_0.99.1_R_x86_64-pc-linux-gnu.tar.gz
Only in removed/lib/R/library: GenomeInfoDbData
Only in removed/share: bioconductor-genomeinfodbdata-0.99.1-0
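The environment snapshots compared above with diff -rqx conda-meta can also be compared from Python using the standard library's filecmp.dircmp. A rough sketch: only_in is a hypothetical helper that lists names present on the left side only, and, like the diff output, it does not descend into directories that exist on just one side:

```python
import filecmp
import os
import tempfile
from typing import List


def only_in(left: str, right: str, ignore: List[str]) -> List[str]:
    """Recursively list entries present in `left` but missing from `right`,
    similar to the "Only in left/..." lines of `diff -rq -x NAME`."""
    found = []
    stack = [("", filecmp.dircmp(left, right, ignore=ignore))]
    while stack:
        rel, node = stack.pop()
        for name in node.left_only:
            found.append(os.path.join(rel, name))
        # node.subdirs maps each common subdirectory to its own dircmp,
        # which inherits the same ignore list.
        for sub_name, sub_node in node.subdirs.items():
            stack.append((os.path.join(rel, sub_name), sub_node))
    return sorted(found)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as init, tempfile.TemporaryDirectory() as removed:
        for root in (init, removed):
            os.makedirs(os.path.join(root, "lib", "R", "library"))
        # Hypothetical leftovers that only exist after install + remove.
        os.makedirs(os.path.join(removed, "lib", "R", "library", "SomePkg"))
        os.makedirs(os.path.join(removed, "conda-meta"))
        print(only_in(removed, init, ignore=["conda-meta"]))
```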

Member

Ah ok, IIUC, R CMD INSTALL --build installs the package first and then creates the tarball in bin. Since I don't believe we need/want that tarball anyway, shouldn't the command just be

R CMD INSTALL --library=$PREFIX/lib/R/library $TARBALL

without the --build?

Member

mbargull left a comment

I know next to nothing about R, so someone more knowledgeable than me should review this, too 😉

mbargull requested a review from daler December 6, 2017 18:08
@jdblischak
Member Author

Thanks for the review @mbargull!

@jdblischak
Member Author

@mbargull I've updated this PR based on your feedback:

  1. Do not use the --build flag in the install command
  2. Remove tarball and staging directory in two steps
  3. Use R CMD REMOVE to remove the package
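Taken together, the three changes above suggest templates along these lines. This is a sketch, not the code merged in this PR: POST_LINK_TAIL, pre_unlink_script and write_pre_unlink are hypothetical names, only the install/cleanup tail of post-link.sh is shown (the download and MD5 steps are omitted), and the R CMD REMOVE line follows the form used in the manual test earlier in the thread:

```python
import os
import tempfile
from textwrap import dedent

# Hypothetical tail of post-link.sh after the review changes:
# no --build, and the tarball and staging dir are removed in two steps.
POST_LINK_TAIL = """\
    # Install and clean up
    R CMD INSTALL --library=$PREFIX/lib/R/library $TARBALL
    rm $TARBALL
    rmdir $STAGING
    """


def pre_unlink_script(package: str) -> str:
    """Body of pre-unlink.sh: let R remove the package that R installed."""
    return "R CMD REMOVE --library=$PREFIX/lib/R/library/ {0}\n".format(package)


def write_pre_unlink(recipe_dir: str, package: str) -> str:
    """Write pre-unlink.sh into the recipe directory and return its path."""
    path = os.path.join(recipe_dir, "pre-unlink.sh")
    with open(path, "w") as fout:
        fout.write(pre_unlink_script(package))
    return path


if __name__ == "__main__":
    print(dedent(POST_LINK_TAIL), end="")
    with tempfile.TemporaryDirectory() as recipe_dir:
        with open(write_pre_unlink(recipe_dir, "GenomeInfoDbData")) as fin:
            print(fin.read(), end="")
```

Using R's own removal command keeps install and uninstall symmetric, which was the reviewer's reason for preferring it over a bare rm -r.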

Member

mbargull left a comment

Nice, LGTM!

@mbargull
Member

@bioconda/r: Is this good to go, or do any of you have objections?

@jdblischak
Member Author

Note that I wasn't able to test my most recent commit because #237 was merged into my PR: #237 (comment)

@jdblischak
Member Author

I cleaned up the commit history using git cherry-pick (I don't love the automatic merge commits that get added, but I see why that would be useful in some cases).

I performed the following test to confirm that Bioconductor data packages are installed in the correct directory (even if a user directory exists) and are properly removed with conda remove.

mkdir -p ~/R/x86_64-pc-linux-gnu-library/3.4
bioconda-utils bioconductor-skeleton recipes/ config.yml bovineprobe
conda build --R 3.4.1 recipes/bioconductor-bovineprobe/
conda create -y -n test --use-local bioconductor-bovineprobe
source activate test
Rscript -e "library(bovineprobe)"
# Confirm it was installed in the correct location: the user library should still be empty
ls ~/R/x86_64-pc-linux-gnu-library/3.4
conda remove -y bioconductor-bovineprobe
# There is little trace of the package after removal
find /opt/conda/ -name "bovineprobe"

After removal, here are the files still around:

# These are because I built the package locally
/opt/conda/conda-bld/src_cache/bovineprobe_2.18.0.tar.gz
/opt/conda/conda-bld/linux-64/bioconductor-bovineprobe-2.18.0-r3.4.1_0.tar.bz2
/opt/conda/lib/python3.5/site-packages/bioconda_utils-0.10.0-py3.5.egg/bioconda_utils/cached_bioconductor_tarballs/bovineprobe_2.18.0.tar.gz
# These are files that conda leaves behind.
/opt/conda/pkgs/bioconductor-bovineprobe-2.18.0-r3.4.1_0.tar.bz2
/opt/conda/pkgs/bioconductor-bovineprobe-2.18.0-r3.4.1_0
/opt/conda/pkgs/bioconductor-bovineprobe-2.18.0-r3.4.1_0/bin/.bioconductor-bovineprobe-post-link.sh
/opt/conda/pkgs/bioconductor-bovineprobe-2.18.0-r3.4.1_0/bin/.bioconductor-bovineprobe-pre-unlink.sh
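To repeat the post-removal check from a script, a find -iname style search can be written with the standard library. A minimal sketch; find_iname is a hypothetical helper that approximates find ROOT -iname PATTERN by matching each file and directory name case-insensitively (unlike find, it does not test ROOT itself):

```python
import fnmatch
import os
import tempfile
from typing import List


def find_iname(root: str, pattern: str) -> List[str]:
    """Rough analogue of `find ROOT -iname PATTERN`: case-insensitive glob
    match on the basename of every file and directory below ROOT."""
    pattern = pattern.lower()
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if fnmatch.fnmatchcase(name.lower(), pattern):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical leftover mimicking conda's package cache.
        os.makedirs(os.path.join(root, "pkgs", "bioconductor-bovineprobe-2.18.0"))
        for path in find_iname(root, "*bovineprobe*"):
            print(path)
```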

@mbargull
Member

> I cleaned up the commit history using git cherry-pick (I don't love the automatic merge commits

Clicking those GitHub "Update branch" buttons is just like taking the car instead of a bike: quite convenient, but dirty. Thanks for the cleanup 😉

> After removal, here are the files still around:

Great, all of those are expected/intended to remain. So, this PR should be ready to be merged!

@jdblischak
Member Author

@daler Please review and merge

daler merged commit 09ed5af into bioconda:master Dec 14, 2017
@daler
Member

daler commented Dec 14, 2017

Thank you!

3 participants