
Find a way to override Spark memory allocation from docker run command #4

Open
davidonlaptop opened this issue Oct 20, 2015 · 5 comments

Comments

@davidonlaptop
Member

Memory allocation is hardcoded in adam-submit launch script

If too little memory is allocated to ADAM / Spark, ADAM may crash. By default, only 512MB is allocated to Java processes. Here is a workaround:

Download a larger BAM file:

mkdir -p ~/data/1kg/samples/hg00096/
wget -nH --cut-dirs=99 \
     -P ~/data/1kg/samples/hg00096/  \
     ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00096/alignment/HG00096.chrom20.ILLUMINA.bwa.GBR.low_coverage.20120522.bam

Then, start an ADAM container:

docker run --rm -ti -v $HOME/data/1kg/samples/hg00096:/data gelog/adam bash
root@42c257dcfbcc:/# 

Once inside the container, run ADAM with 1.5GB of RAM by setting the SPARK_DRIVER_MEMORY and SPARK_EXECUTOR_MEMORY variables:

root@42c257dcfbcc:/# SPARK_DRIVER_MEMORY=1500m SPARK_EXECUTOR_MEMORY=1500m \
    adam-submit transform \
    /data/HG00096.chrom20.ILLUMINA.bwa.GBR.low_coverage.20120522.bam \
    /data/HG00096.chrom20.ILLUMINA.bwa.GBR.low_coverage.20120522.adam
root@42c257dcfbcc:/#
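
One possible way to set these variables from the docker run command itself, rather than from a shell inside the container, is Docker's -e flag. This is only a sketch: it assumes the gelog/adam image passes environment variables through to adam-submit unchanged and has no entrypoint that rewrites the command, neither of which is confirmed in this thread. build_adam_cmd is a hypothetical helper that just prints the command so it can be inspected before being run by hand.

```shell
#!/bin/sh
# Hypothetical helper (not part of ADAM or the gelog/adam image):
# print a docker run command that sets the Spark memory variables with -e.
# It only prints the command; it does not start a container.
build_adam_cmd() {
    mem="$1"        # memory for driver and executor, e.g. 1500m
    data_dir="$2"   # host directory to mount at /data in the container
    shift 2         # remaining arguments are passed to adam-submit
    printf 'docker run --rm -ti -e SPARK_DRIVER_MEMORY=%s -e SPARK_EXECUTOR_MEMORY=%s -v %s:/data gelog/adam adam-submit %s\n' \
        "$mem" "$mem" "$data_dir" "$*"
}

build_adam_cmd 1500m "$HOME/data/1kg/samples/hg00096" transform \
    /data/HG00096.chrom20.ILLUMINA.bwa.GBR.low_coverage.20120522.bam \
    /data/HG00096.chrom20.ILLUMINA.bwa.GBR.low_coverage.20120522.adam
```

If the printed command looks right, it can be pasted into a terminal. Whether adam-submit honors the variables when started this way would still need to be checked against the MemoryStore line in the Spark log.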

[UPDATE: June 2nd 2015]: The adam-submit script in ADAM 0.16.0 now allocates 4 GB by default (see here). If you need more than 4GB, we need to find a way to override this parameter when running the container. The adam-submit script accepts the --conf spark.executor.memory="1g" option, but it has no effect.

Please contact us if you find a more elegant way to solve this issue.

@akhmees

akhmees commented Sep 28, 2016

I am not sure this is correct. I started adam-submit on Docker without specifying any parameters and it only allocated 256MB of memory.

# adam-submit -- flagstat NA12878.adam
16/09/28 06:15:17 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
16/09/28 06:15:19 INFO MemoryStore: ensureFreeSpace(171744) called with curMem=0, maxMem=278302556

I can change the memory by adding Spark args, but surprisingly I don't see any performance gain. ADAM=2.10-0.18.2, SPARK=1.4.1, Docker=1.12.0, and I am using this container: gelog/adam https://hub.docker.com/r/gelog/adam/
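
A note on reading the log above: the MemoryStore capacity is the storage portion of the JVM heap, not the heap itself. The sketch below converts maxMem to MiB and then estimates the heap, assuming the Spark 1.x defaults spark.storage.memoryFraction=0.6 and spark.storage.safetyFraction=0.9 were left unchanged (an assumption, since the thread does not show the Spark configuration). Both function names are hypothetical.

```shell
#!/bin/sh
# Convert a byte count to MiB with one decimal place.
bytes_to_mib() {
    awk -v b="$1" 'BEGIN { printf "%.1f\n", b / 1048576 }'
}

# Estimate the JVM heap (in MiB) from MemoryStore's maxMem.
# Assumption: storage memory = heap * 0.6 * 0.9 (Spark 1.x defaults).
estimate_heap_mib() {
    awk -v b="$1" 'BEGIN { printf "%.1f\n", b / (0.6 * 0.9) / 1048576 }'
}

bytes_to_mib 278302556       # storage capacity reported in the log
estimate_heap_mib 278302556  # estimated heap under the assumed defaults
```

Under those assumptions the estimated heap comes out a little under 512 MiB, which is closer to the 512MB default mentioned at the top of this issue than to 256MB.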

@davidonlaptop
Member Author

Hi @akhmess, adding more memory is only going to help you if memory is the bottleneck for your use case.

@akhmees

akhmees commented Sep 28, 2016

The thing is that, so far, samtools is outperforming ADAM on the flagstat command, even though I am running the test on an ADAM file.

@aapril
Member

aapril commented Sep 28, 2016

Send this comment to Frank at Berkeley

Prof April


@davidonlaptop
Member Author

Good point @aapril. Yes, since it is an ADAM-related issue, you should report it to the ADAM community.
