WIP: Make BAM readers distributable #4
base: master
So this adds a reader state that is shared among processes?
No, it makes a reader stateless so that it does not share its state among processes. Sharing a reading position among processes is error-prone and a likely source of bugs. My idea is to make all reader types stateless and to create a reader's state only when it is needed (e.g. at the moment a reader starts to read a file).
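To make the idea concrete, here is a minimal sketch of a stateless reader, written against Julia 1.x's iteration protocol rather than this package's actual code. `PathReader` is a hypothetical type that iterates over the lines of a text file; it is not the BAM reader from this PR. The reader value holds only the file path, and the open file handle (the state) is created when iteration begins.

```julia
# Hypothetical illustration type: holds only the path, never an open handle.
struct PathReader
    path::String
end

Base.IteratorSize(::Type{PathReader}) = Base.SizeUnknown()
Base.eltype(::Type{PathReader}) = String

# The state (an open IOStream) is created when iteration starts.
function Base.iterate(reader::PathReader)
    io = open(reader.path)
    return iterate(reader, io)
end

# Subsequent steps carry the stream along as the iteration state.
function Base.iterate(reader::PathReader, io::IOStream)
    if eof(io)
        close(io)
        return nothing
    end
    return readline(io), io
end

# Demo: two independent passes over the same reader both start at the top.
path, tmpio = mktemp()
write(tmpio, "a\nb\nc\n")
close(tmpio)
reader = PathReader(path)
first_pass = collect(reader)
second_pass = collect(reader)
```

Because the value itself carries no reading position, copying it to another process cannot share a position by accident. One caveat of this simple sketch: if a loop exits early, the stream is only closed when it is garbage collected.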
Will this mean that if a reader gets copied to several processes, they will all start from the same place and read the same things? Or, since pmap distributes parts of a computation to the available workers, will the readers have their own start and end points on each process?
I'm going to support both cases in different APIs. Since the BAM reader is stateless, all copies distributed to workers will start reading from the same position. But you often have some intervals you're interested in and will use `eachoverlap`:

```julia
# Note that the BAM reader does not open the file yet.
reader = BAM.Reader("somefile.bam")
# Distribute jobs to multiple processes.
pmap(intervals) do interval
    # `eachoverlap` opens the BAM file and returns a stateful iterator.
    for record in eachoverlap(reader, interval)
        # do some work...
    end
end
```

If you'd like to distribute a job that reads all records from top to bottom, you can use some function that logically splits a BAM file into chunks (say `split`):

```julia
reader = BAM.Reader("somefile.bam")
pmap(split(reader)) do reader_part
    # A part of the BAM file will be assigned to a `reader_part` reader.
    for record in reader_part
        # do some work
    end
end
```

I've not yet decided the exact interfaces but I'll make it easier to use in parallel computing.
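Since the interface is undecided, the following is only a toy sketch of the "logical split" idea, using an in-memory vector in place of a BAM file. `chunkranges` is a hypothetical helper, not part of this package; a real splitter for BAM would have to cut at valid BGZF block and record boundaries, which this sketch ignores. It assumes Julia 0.7 or later, where `pmap` lives in the `Distributed` standard library.

```julia
using Distributed

# Hypothetical helper: split the indices 1:n into at most k contiguous ranges.
function chunkranges(n::Integer, k::Integer)
    n == 0 && return UnitRange{Int}[]
    k = min(k, n)
    bounds = [div(i * n, k) for i in 0:k]
    return [(bounds[i] + 1):bounds[i + 1] for i in 1:k]
end

# Stand-in for the records of a file.
records = collect(1:10)

# Each job receives one chunk and processes only that part.
counts = pmap(chunkranges(length(records), 3)) do r
    count(isodd, records[r])
end

total = sum(counts)
```

With no worker processes added, `pmap` simply runs the jobs on the current process, so the sketch can be tried without any cluster setup.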
That is really cool. In the first example, I'd have a look at @code_warntype: I found in a recent script that in Julia 0.6, variables captured in closures got boxed even if the type is known and predictable (it's a current Julia bug).
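As an illustration of the boxing concern, here is a small sketch with hypothetical function names. One pattern known to trigger boxing is reassigning a variable that a closure captures; rebinding it in a `let` block is the workaround described in Julia's performance tips. Whether the `pmap` closure in the example above is affected has not been checked here.

```julia
using InteractiveUtils  # provides @code_warntype outside the REPL

# `scale` is assigned twice and captured by the closure, so it is
# stored in a Core.Box and its type is not inferred inside the closure.
function sum_scaled_boxed(xs)
    scale = 1.0
    scale = 2.0
    return sum(map(x -> x * scale, xs))
end

# Workaround: `let` introduces a fresh binding that is assigned once,
# so the closure captures a plain, well-typed value.
function sum_scaled_let(xs)
    scale = 1.0
    scale = 2.0
    f = let scale = scale
        x -> x * scale
    end
    return sum(map(f, xs))
end

boxed_result = sum_scaled_boxed([1.0, 2.0])
let_result = sum_scaled_let([1.0, 2.0])
```

Running `@code_warntype sum_scaled_boxed([1.0, 2.0])` should show the `Core.Box` for `scale`, while the `let` version should not. Both functions compute the same value, so the difference is only in inferred types and performance.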
This makes the BAM reader type distributable for parallel computing.