Need help. Tasks spawned by Dagger.@spawn (or Dagger.spawn()) do not get the data. #563
Replies: 2 comments
-
Hey there, sorry about the long delay! The code you have here is incorrectly writing to `ra` and `ca`; in this code:

```julia
Dagger.spawn() do
    dfr_v = [DataFrame() for i in 1:nF]
    rfn = PrqDir * "ratings_" * string(i, pad = 2) * ".parquet"
    println( rfn )
    dfr_v[i] = DataFrame(read_parquet( rfn ))
    ra[:,i] , ca[:,i] = dg_FindRatingsWorker( i, ng, kg, dfm, dfr_v[i])
end
```

The line `ra[:,i] , ca[:,i] = dg_FindRatingsWorker( i, ng, kg, dfm, dfr_v[i])` is invalid because the function passed to `Dagger.spawn` may run on Distributed workers, in which case they won't be able to access the right `ra` and `ca`; instead, those will be implicitly copied by Distributed, and your changes will end up being lost.

One way to solve this is to do the write to `ra`/`ca` outside of Dagger:

```julia
t = Dagger.spawn() do
    dfr_v = [DataFrame() for i in 1:nF]
    rfn = PrqDir * "ratings_" * string(i, pad = 2) * ".parquet"
    println( rfn )
    dfr_v[i] = DataFrame(read_parquet( rfn ))
    return dg_FindRatingsWorker( i, ng, kg, dfm, dfr_v[i])
end
ra[:,i] , ca[:,i] = fetch(t)
```

Of course, as this is written, there is no parallelism here. If you want parallelism, you'll have to launch multiple `Dagger.spawn` calls, and then fetch them all once they've been launched.
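A minimal, runnable sketch of that "launch all tasks first, then fetch" pattern, using Base tasks as a stand-in for `Dagger.spawn` (both return a handle that `fetch` waits on); `work` here is a hypothetical placeholder for `dg_FindRatingsWorker`:

```julia
# Sketch of "launch every task, then fetch" using Threads.@spawn as a
# stand-in for Dagger.spawn; `work` is a hypothetical placeholder for
# dg_FindRatingsWorker, returning one (ra-column, ca-column) pair.
nF = 4
ra = zeros(Int, 3, nF)
ca = zeros(Int, 3, nF)

work(i) = (fill(i, 3), fill(10i, 3))   # hypothetical per-file computation

# Launch every task before fetching any of them, so they can overlap.
ts = [Threads.@spawn work(i) for i in 1:nF]

# Fetch results and write to ra/ca on the caller side, never inside the task.
for i in 1:nF
    ra[:, i], ca[:, i] = fetch(ts[i])
end
```

The key point is that the mutation of `ra`/`ca` happens on the process that owns them; the tasks only return values.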
-
Thank you. I will examine your response and go from there.
-
I am using the MovieLens data to experiment with Dagger. I wrote an app using Base.Threads in which a FindRatingsMaster() function partitions the data into N shards (as DataFrames) and gives a shard to each of 10 FindRatingsWorker() functions, where the processing is done. I had no problem getting this done with Base.Threads on a single desktop with 26 cores. Then I tried doing the same using Julia's Distributed, but could not make it work and gave up. Then I saw that Dagger has been suggested for things like this, so I went through Dagger's documentation, saw and ran the example code, and then started coding. Unfortunately I am having similar issues with Dagger as I had with Distributed. Any help and suggestions would be greatly appreciated.
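For reference, the master/worker split described above can be sketched with Base.Threads like this; `score_shard` and the integer data are hypothetical stand-ins for the per-shard worker and the MovieLens ratings table:

```julia
# Sketch of the master/worker pattern: the master partitions the data
# into N contiguous shards, workers process one shard each, and the
# master fetches the results. `score_shard` and `ratings` are
# hypothetical stand-ins for the real worker and data.
ratings = collect(1:100)              # stand-in for the ratings table
nshards = 10

# Master: compute shard boundaries and slice the data.
bounds = round.(Int, range(0, length(ratings); length = nshards + 1))
shards = [ratings[bounds[k]+1:bounds[k+1]] for k in 1:nshards]

score_shard(shard) = sum(shard)       # hypothetical worker computation

# Workers: one task per shard; the master fetches all results.
tasks = [Threads.@spawn score_shard(s) for s in shards]
totals = fetch.(tasks)
```

The same shape carries over to Dagger by replacing `Threads.@spawn` with `Dagger.spawn`, with the caveat from the answer above: all writes to shared arrays belong after the `fetch`, on the calling process.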