This repository has been archived by the owner on Aug 16, 2024. It is now read-only.

set up for production on large memory machines like NERSC cori #1

Open
turbomam opened this issue Oct 4, 2021 · 5 comments

Comments

@turbomam
Owner

turbomam commented Oct 4, 2021

Include reporting of emp500 fields

Am I backing myself into a corner by not being able to run on computers with more modest RAM (32 GB?) in the future?

Be very clear about the differences from the existing Perl approach. This workflow creates two TSVs: a long EAV table of attributes and a wide table of non-attribute data. Casting the long EAV table to wide is chunked but still needs lots of RAM. Merging the two wide tables is then tricky. Currently doing the merge in SQLite, but not happy with the duplicated index column and some other annoyances. Try in Python?!
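For comparison, here is a minimal sketch of merging the two wide tables outside SQLite with GNU `join`, which streams sorted input and emits the key column only once. It is not what this repository does. It assumes both TSVs have a header row, that the shared key is the first column of each, that no field contains an embedded tab or newline, and that GNU coreutils is available (`--header` and `-o auto` are GNU extensions). The function and file names are hypothetical, and the EAV-to-wide cast itself is not covered.

```shell
tab=$(printf '\t')

# Print the header line unchanged, then the remaining rows sorted on column 1.
sort_keep_header() (
    head -n 1 "$1"
    tail -n +2 "$1" | LC_ALL=C sort -t "$tab" -k1,1
)

# Left-join two wide TSVs on their first column.
# Rows found only in the second file are dropped; rows found only in the
# first file are kept and padded with empty fields (-a 1 -o auto).
merge_wide_tables() (
    tmpdir=$(mktemp -d) || exit 1
    sort_keep_header "$1" > "$tmpdir/left.tsv"
    sort_keep_header "$2" > "$tmpdir/right.tsv"
    LC_ALL=C join -t "$tab" --header -a 1 -o auto \
        "$tmpdir/left.tsv" "$tmpdir/right.tsv"
    status=$?
    rm -rf "$tmpdir"
    exit "$status"
)

# Hypothetical usage:
#   merge_wide_tables non_attribute.tsv attributes_wide.tsv > merged.tsv
```

Because `sort` spills to temporary files and `join` reads both inputs sequentially, memory use should stay modest, at the cost of an extra sorted copy of each table on disk.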

@turbomam
Owner Author

turbomam commented Oct 4, 2021

run this to start:

export PROJDIR=/global/cfs/cdirs/m3513/endurable/biosample/mam

  • make sure others have permission to access that directory, or move to XXX?
  • don't forget to use a terminal session manager like screen
    • make a record of what node you were logged into before leaving the screen session
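A small sketch of the node-recording step from the list above. The helper name, the log file location and the session name are hypothetical. `uname -n` prints the node name, and `screen -S` / `screen -r` start and reattach a named session.

```shell
# Append "<timestamp><TAB><node name>" to the given log file.
record_login_node() {
    printf '%s\t%s\n' "$(date '+%Y-%m-%dT%H:%M:%S')" "$(uname -n)" >> "$1"
}

# Typical use (commented out because it needs an interactive terminal):
#   record_login_node "$HOME/screen_nodes.log"
#   screen -S biosample       # start a named session; detach with Ctrl-a d
#   tail -n 1 "$HOME/screen_nodes.log"   # later: find the node to log back into
#   screen -r biosample       # reattach once logged into that same node
```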

@turbomam
Owner Author

turbomam commented Oct 4, 2021

Earlier this week, unpacking the biosample_set.xml.gz download gave end-of-file errors, as if the file were incomplete.

seems OK today
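One way to catch a truncated download before spending time on unpacking is `gzip -t`, which tests the compressed stream without writing any output. A minimal sketch follows; the helper name is hypothetical and the path in the usage comment is an assumption, since the download location is not stated here.

```shell
# Return 0 if the gzip file is intact, 1 if it is corrupt or truncated.
check_gzip() {
    if gzip -t "$1" 2>/dev/null; then
        echo "OK: $1"
    else
        echo "CORRUPT or truncated: $1" >&2
        return 1
    fi
}

# Hypothetical usage:
#   check_gzip target/biosample_set.xml.gz && gunzip -k target/biosample_set.xml.gz
```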

@turbomam
Owner Author

turbomam commented Oct 4, 2021

be mindful of filesystem quotas

https://docs.nersc.gov/filesystems/quotas/

cori12:/global/cfs/cdirs/m3513/endurable/biosample/mam/biosample-basex> myquota --path=/global/cfs/cdirs/m3513
FILESYSTEM                SPACE_USED   SPACE_QUOTA   SPACE_PCT   INODE_USED   INODE_QUOTA   INODE_PCT
/global/cfs/cdirs/m3513   1.74TiB      20.00TiB      8.7%        1.86M        20.00M        9.3%    
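The report above can be turned into a simple warning. This sketch assumes `myquota` keeps the column layout shown (header row first, space percentage in column 4, inode percentage in column 7). The helper name and the 80% threshold are hypothetical.

```shell
# Read myquota output on stdin; warn and return 1 if space or inode
# usage is at or above the given percentage.
check_quota() {
    awk -v limit="$1" '
        NR == 1 || NF < 7 { next }          # skip the header and short lines
        {
            space = $4; inode = $7
            sub(/%/, "", space); sub(/%/, "", inode)
            if (space + 0 >= limit || inode + 0 >= limit) {
                print "WARNING: " $1 " space " $4 ", inodes " $7
                over = 1
            }
        }
        END { exit over + 0 }
    '
}

# Hypothetical usage on Cori:
#   myquota --path=/global/cfs/cdirs/m3513 | check_quota 80
```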

@turbomam
Owner Author

turbomam commented Oct 4, 2021

Document installation of BaseX application


vi $PROJDIR/biosample-basex/basex/bin/basex

to set the max Java memory

BASEX_JVM="-Xmx96g $BASEX_JVM"

Are there any Cori policies for max RAM per process or user?
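The launcher edit above could also be scripted so the installation is reproducible without opening `vi`. This is only a sketch: it assumes the launcher already contains a `-Xmx<number><unit>` setting somewhere, and it rewrites every such occurrence. The helper name is hypothetical.

```shell
# Replace any existing -Xmx value in the launcher with the requested size,
# then print the changed line(s) so the result can be checked by eye.
set_basex_xmx() (
    script=$1
    size=$2
    tmp=$(mktemp) || exit 1
    sed "s/-Xmx[0-9][0-9]*[kKmMgG]/-Xmx$size/" "$script" > "$tmp" || exit 1
    cat "$tmp" > "$script"      # overwrite in place to keep file permissions
    rm -f "$tmp"
    grep -- "-Xmx$size" "$script"
)

# Hypothetical usage:
#   set_basex_xmx "$PROJDIR/biosample-basex/basex/bin/basex" 96g
```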

@turbomam
Owner Author

turbomam commented Oct 4, 2021

export BASEXCMD=$PROJDIR/biosample-basex/basex/bin/basex

$BASEXCMD -c 'CREATE DB biosample_set target/biosample_set.xml'

Recently, this database creation has taken ~2 hours. The database directory will end up looking something like this, with tbl.basex the largest file:

MAM@MAM-M74 ~ % ls -lSrh basex/data/biosample_set 
total 105509144
-rw-r--r--  1 MAM  staff     9B Sep 23 09:10 tbli.basex
-rw-r--r--  1 MAM  staff    35K Sep 23 09:46 inf.basex
-rw-r--r--  1 MAM  staff   359M Sep 23 09:32 txtr.basex
-rw-r--r--  1 MAM  staff   405M Sep 23 09:46 atvr.basex
-rw-r--r--  1 MAM  staff   1.0G Sep 23 09:32 txtl.basex
-rw-r--r--  1 MAM  staff   1.9G Sep 23 09:46 atvl.basex
-rw-r--r--  1 MAM  staff   5.9G Sep 23 09:10 txt.basex
-rw-r--r--  1 MAM  staff    10G Sep 23 09:10 atv.basex
-rw-r--r--  1 MAM  staff    31G Sep 23 09:11 tbl.basex
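Since the build takes hours, it may help to wrap the `CREATE DB` call so that a missing input fails fast and the elapsed time is recorded. A minimal sketch, assuming `BASEXCMD` is set as above; the wrapper name is hypothetical and the default input path is the one used in the command earlier in this comment.

```shell
# Run CREATE DB on the given XML file and report the elapsed minutes.
timed_create_db() (
    xml=${1:-target/biosample_set.xml}
    if [ ! -s "$xml" ]; then
        echo "missing or empty input: $xml" >&2
        exit 1
    fi
    start=$(date +%s)
    "$BASEXCMD" -c "CREATE DB biosample_set $xml" || exit 1
    end=$(date +%s)
    echo "CREATE DB finished in $(( (end - start) / 60 )) min"
)

# Hypothetical usage inside a screen session:
#   timed_create_db target/biosample_set.xml 2>&1 | tee create_db.log
```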

turbomam added a commit that referenced this issue Oct 15, 2021