Set up for production on large-memory machines like NERSC Cori #1

turbomam · 2021-10-04T12:20:24Z

Include reporting of emp500 fields

Am I backing myself into a corner, making it impossible to run this on computers with more modest RAM (32 GB?) in the future?

Be very clear about the differences from the existing Perl approach. This approach creates two TSVs: a long EAV (entity-attribute-value) table of attributes and a wide table of non-attribute data. Casting the long EAV table to wide is done in chunks but still needs a lot of RAM. Merging the two wide tables is then tricky. I'm currently doing it in SQLite, but I'm not happy with the duplicated index column and some other annoyances. Try doing it in Python?!
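If the merge does move to Python, one way to keep memory bounded is to sort first and then stream, instead of casting the whole EAV table in memory. The sketch below is a minimal, standard-library-only illustration, not the existing Perl or SQLite code. It assumes (hypothetically) that the long EAV TSV has columns named id, attribute and value, that the wide non-attribute TSV has an id column, and that both files are already sorted by id as plain strings. Pass 1 collects the attribute names; pass 2 pivots one entity at a time with itertools.groupby and left-joins it to the non-attribute rows, so the id column is written only once. All function names are hypothetical.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter


def tsv_rows(handle):
    """Read a TSV with a header row into dicts."""
    return csv.DictReader(handle, delimiter="\t")


def attribute_names(eav_rows):
    """Pass 1: collect the distinct attribute names (the future wide columns)."""
    return sorted({row["attribute"] for row in eav_rows})


def pivot_sorted_eav(eav_rows, attributes):
    """Pass 2: yield one wide dict per entity.

    Assumes the EAV rows are sorted by "id", so only one entity's attributes
    are in memory at a time. A repeated attribute keeps its last value.
    """
    for entity_id, group in groupby(eav_rows, key=itemgetter("id")):
        wide = {"id": entity_id, **dict.fromkeys(attributes, "")}
        for row in group:
            wide[row["attribute"]] = row["value"]
        yield wide


def merge_sorted(left_rows, right_rows, attributes):
    """Left-join two id-sorted streams; the id column appears only once."""
    right = iter(right_rows)
    current = next(right, None)
    for left in left_rows:
        # Skip pivoted rows whose id sorts before the current left id.
        while current is not None and current["id"] < left["id"]:
            current = next(right, None)
        merged = dict(left)
        if current is not None and current["id"] == left["id"]:
            merged.update((a, current[a]) for a in attributes)
        else:
            merged.update(dict.fromkeys(attributes, ""))
        yield merged


def merge_tables(non_attr_text, eav_text):
    """Demo driver over in-memory TSV text; with real files, reopen for each pass."""
    attributes = attribute_names(tsv_rows(io.StringIO(eav_text)))
    pivoted = pivot_sorted_eav(tsv_rows(io.StringIO(eav_text)), attributes)
    left = tsv_rows(io.StringIO(non_attr_text))
    out = io.StringIO()
    writer = None
    for row in merge_sorted(left, pivoted, attributes):
        if writer is None:
            writer = csv.DictWriter(out, fieldnames=list(row), delimiter="\t", lineterminator="\n")
            writer.writeheader()
        writer.writerow(row)
    return out.getvalue()
```

With real files, open the EAV TSV once per pass instead of wrapping text in io.StringIO; memory then scales with the number of distinct attributes rather than the number of biosamples. If the inputs are not already sorted, an external sort on the id column (for example with LC_ALL=C so the byte order matches Python's string comparison for ASCII ids) would have to run first.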

turbomam · 2021-10-04T12:26:15Z

Run this to start:

export PROJDIR=/global/cfs/cdirs/m3513/endurable/biosample/mam

Make sure others have permission to access that directory, or move to XXX?
Don't forget to use a terminal session manager like screen.
- Make a record of which node you were logged into before leaving the screen session.
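A small sketch for the two housekeeping checks above, under stated assumptions: it treats "others" as members of the directory's group and only inspects the classic group permission bits (ACLs and world bits are ignored). A directory is reachable only if every ancestor grants execute and the target also grants read, so the helper walks up from the target. session_note is a hypothetical helper that formats a line worth saving before detaching, since a screen session lives on the login node where it was started.

```python
import os
import socket
import stat
from datetime import datetime


def group_access_problems(path):
    """Return directories that would stop group members from reaching `path`.

    Only the group permission bits are checked (ACLs and "other" bits are ignored).
    The target needs group read + execute; each ancestor needs group execute.
    """
    target = os.path.abspath(path)
    problems = []
    current = target
    while True:
        needed = stat.S_IXGRP | (stat.S_IRGRP if current == target else 0)
        if (os.stat(current).st_mode & needed) != needed:
            problems.append(current)
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return problems


def session_note(session_name):
    """Hypothetical helper: a line to save before detaching from screen."""
    stamp = datetime.now().isoformat(timespec="seconds")
    return f"{stamp} screen session {session_name!r} is on node {socket.gethostname()}"
```

For example, an empty list from group_access_problems(os.environ["PROJDIR"]) means no directory on the path blocks group members via its group bits.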

turbomam · 2021-10-04T12:32:00Z

Earlier this week, unpacking the biosample_set.xml.gz download gave incomplete end-of-file errors.

It seems OK today.
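Before unpacking, the download can be checked for truncation. The sketch below is a minimal example that stream-decompresses the file in 1 MiB chunks without writing anything to disk. In CPython's gzip module, a file that ends before the end-of-stream marker raises EOFError, while header or CRC problems surface as OSError (gzip.BadGzipFile is a subclass) or zlib.error. gzip_is_complete is a hypothetical name; from the shell, gzip -t performs a similar test.

```python
import gzip
import zlib


def gzip_is_complete(path, chunk_size=1 << 20):
    """Decompress the whole file in chunks, discarding output; return (ok, message)."""
    try:
        with gzip.open(path, "rb") as fh:
            while fh.read(chunk_size):
                pass
    except EOFError as exc:  # stream ended before the end-of-stream marker
        return False, f"truncated: {exc}"
    except (OSError, zlib.error) as exc:  # bad header, CRC/length mismatch, bad data
        return False, f"corrupt: {exc}"
    return True, "ok"
```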

turbomam · 2021-10-04T12:39:37Z

Be mindful of filesystem quotas:

https://docs.nersc.gov/filesystems/quotas/

cori12:/global/cfs/cdirs/m3513/endurable/biosample/mam/biosample-basex> myquota --path=/global/cfs/cdirs/m3513
FILESYSTEM                SPACE_USED   SPACE_QUOTA   SPACE_PCT   INODE_USED   INODE_QUOTA   INODE_PCT
/global/cfs/cdirs/m3513   1.74TiB      20.00TiB      8.7%        1.86M        20.00M        9.3%
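To watch the quota from a script, the myquota table can be parsed. This is a sketch that assumes the exact seven-column layout shown above. It also assumes that space figures use binary prefixes (TiB = 2**40 bytes) and that inode counts use decimal suffixes (M = one million), which reproduces the 8.7% and 9.3% in the sample. parse_myquota and over_threshold are hypothetical helpers, and the 80% default threshold is arbitrary.

```python
import re

# Assumptions: space uses binary prefixes, inode counts use decimal suffixes.
BINARY = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40, "PiB": 2**50}
DECIMAL = {"": 1, "K": 10**3, "M": 10**6, "G": 10**9}


def _amount(token, units):
    """Convert a token such as '1.74TiB' or '1.86M' to a float."""
    match = re.fullmatch(r"([\d.]+)([A-Za-z]*)", token)
    if not match or match.group(2) not in units:
        raise ValueError(f"unrecognised quantity: {token!r}")
    number, unit = match.groups()
    return float(number) * units[unit]


def parse_myquota(text):
    """Parse the seven-column table printed by `myquota` into dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 7 or fields[0] == "FILESYSTEM":
            continue  # skip the shell prompt, the header and blank lines
        path, used, quota, _, inodes, inode_quota, _ = fields
        rows.append({
            "filesystem": path,
            "space_used": _amount(used, BINARY),
            "space_quota": _amount(quota, BINARY),
            "inode_used": _amount(inodes, DECIMAL),
            "inode_quota": _amount(inode_quota, DECIMAL),
        })
    return rows


def over_threshold(row, threshold=0.8):
    """Return which resources ('space', 'inode') are at or above `threshold` of quota."""
    return [kind for kind in ("space", "inode")
            if row[f"{kind}_used"] / row[f"{kind}_quota"] >= threshold]
```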

turbomam · 2021-10-04T12:43:32Z

Document the installation of the BaseX application.

vi $PROJDIR/biosample-basex/basex/bin/basex

to set the maximum Java heap size:

BASEX_JVM="-Xmx96g $BASEX_JVM"

Are there any Cori policies on the maximum RAM per process or per user?
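If the heap size needs to change often (for example on a machine with less RAM), the edit can be scripted instead of done in vi. This sketch assumes the start script contains a line of the form BASEX_JVM="-Xmx... $BASEX_JVM", as shown above, and refuses to guess where to add one otherwise. set_basex_heap is a hypothetical helper; the size is passed to the JVM's -Xmx option unchanged.

```python
import re

# Matches the -Xmx option inside a line like: BASEX_JVM="-Xmx2g $BASEX_JVM"
_XMX_IN_BASEX_JVM = re.compile(r'^([ \t]*BASEX_JVM="[^"\n]*?)-Xmx[^\s"]+', re.MULTILINE)


def set_basex_heap(script_text, heap):
    """Return script_text with -Xmx in the BASEX_JVM line set to `heap` (e.g. "96g")."""
    if not re.fullmatch(r"\d+[kKmMgG]?", heap):
        raise ValueError(f"not a JVM heap size: {heap!r}")
    new_text, count = _XMX_IN_BASEX_JVM.subn(lambda m: f"{m.group(1)}-Xmx{heap}", script_text)
    if count == 0:
        raise ValueError('no BASEX_JVM="-Xmx..." line found; edit the script by hand')
    return new_text
```

Keep a copy of the original script before writing the result back over it.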

turbomam · 2021-10-04T12:52:48Z

export BASEXCMD=$PROJDIR/biosample-basex/basex/bin/basex

$BASEXCMD -c 'CREATE DB biosample_set target/biosample_set.xml'
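Since creating the database is a long-running step, it may be worth logging its duration. A minimal sketch, assuming BASEXCMD is set as above and the paths contain no spaces (the whole BaseX command is passed as a single -c argument, exactly as in the shell line). create_db_command and run_timed are hypothetical helpers; subprocess.run(check=True) raises CalledProcessError if BaseX exits non-zero.

```python
import shlex
import subprocess
import time


def create_db_command(basexcmd, db_name, xml_path):
    """Build argv equivalent to: $BASEXCMD -c 'CREATE DB <db_name> <xml_path>'."""
    return [basexcmd, "-c", f"CREATE DB {db_name} {xml_path}"]


def run_timed(argv, log=print):
    """Run argv (no shell), raise if it fails, and log the elapsed wall-clock time."""
    start = time.monotonic()
    subprocess.run(argv, check=True)
    hours = (time.monotonic() - start) / 3600
    log(f"{shlex.join(argv)} finished in {hours:.2f} h")
    return hours
```

For example, run_timed(create_db_command(os.environ["BASEXCMD"], "biosample_set", "target/biosample_set.xml")) from inside the screen session.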

Recently, creating this database has taken about 2 hours. The database directory, basex/data/biosample_set, will end up looking something like this (tbl.basex is the largest file):

MAM@MAM-M74 ~ % ls -lSrh basex/data/biosample_set 
total 105509144
-rw-r--r--  1 MAM  staff     9B Sep 23 09:10 tbli.basex
-rw-r--r--  1 MAM  staff    35K Sep 23 09:46 inf.basex
-rw-r--r--  1 MAM  staff   359M Sep 23 09:32 txtr.basex
-rw-r--r--  1 MAM  staff   405M Sep 23 09:46 atvr.basex
-rw-r--r--  1 MAM  staff   1.0G Sep 23 09:32 txtl.basex
-rw-r--r--  1 MAM  staff   1.9G Sep 23 09:46 atvl.basex
-rw-r--r--  1 MAM  staff   5.9G Sep 23 09:10 txt.basex
-rw-r--r--  1 MAM  staff    10G Sep 23 09:10 atv.basex
-rw-r--r--  1 MAM  staff    31G Sep 23 09:11 tbl.basex
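After the load, a quick sanity check of the database directory can flag an incomplete build. This sketch lists regular files smallest first, roughly like ls -lSrh, and reports which of the file names from the listing above are missing. The expected set is taken from that one listing and is an assumption; other BaseX versions or options may create a different set of files. All function names are hypothetical.

```python
import os

# File names observed in the listing above (assumption: may vary by BaseX version/options).
EXPECTED = {"tbl.basex", "tbli.basex", "inf.basex", "txt.basex", "txtl.basex",
            "txtr.basex", "atv.basex", "atvl.basex", "atvr.basex"}


def human_size(n):
    """Format a byte count with 1024-based units, roughly like `ls -h`."""
    for unit in ("B", "K", "M", "G", "T"):
        if n < 1024 or unit == "T":
            return f"{n:.0f}{unit}" if unit == "B" else f"{n:.1f}{unit}"
        n /= 1024


def db_files_by_size(db_dir):
    """Return (name, size_in_bytes) for regular files in db_dir, smallest first."""
    entries = [(e.name, e.stat().st_size) for e in os.scandir(db_dir) if e.is_file()]
    return sorted(entries, key=lambda item: item[1])


def missing_files(db_dir, expected=EXPECTED):
    """Return the expected file names that are absent from db_dir."""
    present = {name for name, _ in db_files_by_size(db_dir)}
    return sorted(expected - present)


def report(db_dir):
    """Print each file with a human-readable size, then the total."""
    files = db_files_by_size(db_dir)
    for name, size in files:
        print(f"{human_size(size):>8}  {name}")
    print(f"total {human_size(sum(size for _, size in files))}")
```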


turbomam added a commit that referenced this issue on Oct 15, 2021: 7e19c58, "Merge pull request #6 from turbomam/issue-1-cori" (Issue #1 cori)
