
Switch to Dagger version 0.18.11 #7

Closed · wants to merge 17 commits

Conversation

@SmalRat (Contributor) commented Jun 20, 2024

BEGINRELEASENOTES

  • Switched to version 0.18.11
  • Added relevant examples and documentation
  • Small fixes

ENDRELEASENOTES

@m-fila (Collaborator) commented Jun 20, 2024

Should we try to merge it now or wait for JuliaParallel/Dagger.jl#531?
If we do it now, we'll get the Gantt plot but spoil the execution graph from the logs, right?

@SmalRat (Contributor, Author) commented Jun 21, 2024

It contains a modified version (with fixes) of GraphVizSimpleExt, which is used by default, so everything works as of 0.18.11.

@m-fila (Collaborator) commented Jun 21, 2024

Do we need all of the extensions? LuxorExt.jl, GraphVizExt.jl, and PlotsExt.jl are exactly the same as in Dagger and should be loaded on importing Dagger anyway.
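
A quick way to verify that claim (a minimal sketch, not from this PR; it assumes Julia ≥ 1.9 and that GraphViz is the weak dependency behind Dagger's bundled GraphVizExt — extensions only activate once both the parent package and the weak dependency are loaded):

using Dagger
using GraphViz  # loading the assumed weak dependency should trigger Dagger's own GraphVizExt

# `Base.get_extension` returns the extension module if it was loaded, or `nothing`.
# If this prints a module, the vendored copies are only needed for the patched behaviour.
@show Base.get_extension(Dagger, :GraphVizExt)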

@m-fila (Collaborator) commented Jun 21, 2024

I see, GraphVizSimpleExt has all the changes from JuliaParallel/Dagger.jl#531, so we can drop it later once the fix gets into Dagger.

On the other hand, I don't think extensions should be used to pull patches from an upstream project.
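
The vendoring alternative visible later in the stack trace appears to be a plain include of the patched file into a local module rather than a package extension — a hedged sketch of that pattern, with the path and module name taken from the trace and the exported function assumed:

# Assumed layout: dagger_exts/GraphVizSimpleExt.jl defines `module ModGraphVizSimpleExt ... end`
# with the patched `show_logs`/`write_dag`; including it at the top level keeps the patch
# easy to delete once the upstream fix lands.
include("dagger_exts/GraphVizSimpleExt.jl")
using .ModGraphVizSimpleExt: show_logs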

@m-fila (Collaborator) commented Jun 21, 2024

Running graphs_scheduling/src/main.jl gives the following error once all the processors are done:

ERROR: LoadError: KeyError: key 68 not found
Stacktrace:
  [1] getindex
    @ ./dict.jl:498 [inlined]
  [2] _proc_color(ctx::@NamedTuple{proc_to_color::Dict{Dagger.Processor, String}, proc_colors::Vector{RGB{FixedPointNumbers.N0f8}}, proc_color_idx::Base.RefValue{Int64}, proc_to_shape::Dict{Type, String}, proc_shapes::Tuple{String, String, String}, proc_shape_idx::Base.RefValue{Int64}, id_to_proc::Dict{Int64, Dagger.Processor}}, id::Int64)
    @ Main.ModGraphVizSimpleExt ~/fwk2/dagger_exts/GraphVizSimpleExt.jl:176
  [3] write_edge(io::IOStream, ts_move::TimespanLogging.Timespan, logs::Vector{TimespanLogging.Timespan}, ctx::@NamedTuple{proc_to_color::Dict{Dagger.Processor, String}, proc_colors::Vector{RGB{FixedPointNumbers.N0f8}}, proc_color_idx::Base.RefValue{Int64}, proc_to_shape::Dict{Type, String}, proc_shapes::Tuple{String, String, String}, proc_shape_idx::Base.RefValue{Int64}, id_to_proc::Dict{Int64, Dagger.Processor}}, inputname::String, inputarg::RemoteChannel{Channel{Int64}})
    @ Main.ModGraphVizSimpleExt ~/fwk2/dagger_exts/GraphVizSimpleExt.jl:235
  [4] write_dag(io::IOStream, logs::Vector{TimespanLogging.Timespan}, t::Nothing)
    @ Main.ModGraphVizSimpleExt ~/fwk2/dagger_exts/GraphVizSimpleExt.jl:323
  [5] write_dag
    @ ~/fwk2/dagger_exts/GraphVizSimpleExt.jl:260 [inlined]
  [6] _show_plan
    @ ~/fwk2/dagger_exts/GraphVizSimpleExt.jl:353 [inlined]
  [7] show_logs
    @ ~/fwk2/dagger_exts/GraphVizSimpleExt.jl:369 [inlined]
  [8] show_logs(io::IOStream, logs::Vector{TimespanLogging.Timespan}, vizmode::Symbol; options::@Kwargs{})
    @ Main.ModGraphVizSimpleExt ~/fwk2/dagger_exts/GraphVizSimpleExt.jl:363
  [9] show_logs
    @ ~/fwk2/dagger_exts/GraphVizSimpleExt.jl:363 [inlined]
 [10] #17
    @ ~/fwk2/graphs_scheduling/src/main.jl:92 [inlined]
 [11] open(::var"#17#18"{Vector{TimespanLogging.Timespan}}, ::String, ::Vararg{String}; kwargs::@Kwargs{})
    @ Base ./io.jl:396
 [12] open(::Function, ::String, ::String)
    @ Base ./io.jl:393
 [13] main(graphs_map::Dict{String, String})
    @ Main ~/fwk2/graphs_scheduling/src/main.jl:91
 [14] top-level scope
    @ ~/fwk2/graphs_scheduling/src/main.jl:106
in expression starting at /home/mafila/fwk2/graphs_scheduling/src/main.jl:106

Then it reports a series of warnings about workers dying and rescheduling, but that's a known problem.

@SmalRat (Contributor, Author) commented Jun 24, 2024

Do we need all of the extensions? LuxorExt.jl, GraphVizExt.jl, and PlotsExt.jl are exactly the same as in Dagger and should be loaded on importing Dagger anyway.

There are a few issues in the related functionality, so they could be fixed in a similar way to write_dag(). However, these extensions do not serve any purpose now, so they can be deleted.

@SmalRat (Contributor, Author) commented Jun 24, 2024

Running graphs_scheduling/src/main.jl gives the following error once all the processors are done:

ERROR: LoadError: KeyError: key 68 not found

Then it reports a series of warnings about workers dying and rescheduling, but that's a known problem.

That is a bug I mentioned recently. The error is thrown when the logs are plotted before all the nodes finish computations, and it happens most of the time. I am currently working on fixing this problem.

@m-fila (Collaborator) left a review comment

That is a bug I mentioned recently. The error is thrown when the logs are plotted before all the nodes finish computations, and it happens most of the time. I am currently working on fixing this problem.

I think I found the culprit

The problem appears because the logs are fetched before all the tasks are finished. For regular tasks, their results are fetched when plotting the individual graphs, but we never wait for the completion of the extra tasks that send notifications.
We could collect the tasks for that extra node, which also happens to be the last node in each graph, and wait for their completion.
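
A minimal, self-contained sketch of this idea (names are illustrative, not the PR's actual API): keep the task of the extra notification node and wait for it before the logs are read, so no spans are missing.

using Distributed, Dagger

# Toy stand-ins for the real pipeline: one "work" node plus an extra node whose
# only job is to signal completion of the graph on a channel.
notifications = RemoteChannel(() -> Channel{Int}(32))

finish_and_notify(result, ch, graph_id) = (put!(ch, graph_id); result)

work        = Dagger.@spawn sum(1:1_000)
notify_task = Dagger.@spawn finish_and_notify(work, notifications, 1)  # last node of the graph

wait(notify_task)            # without this wait, the logs could be fetched too early
@show take!(notifications)   # 1
@show fetch(work)            # 500500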

Comment on lines +41 to +78
function execution(graphs_map)
    graphs_being_run = Set{Int}()
    graphs_dict = Dict{Int, String}()

    graphs = parse_graphs(graphs_map, OUTPUT_GRAPH_PATH, OUTPUT_GRAPH_IMAGE_PATH)

    notifications = RemoteChannel(()->Channel{Int}(32))
    # notifications = Channel{Int}(32)

    for (i, (g_name, g)) in enumerate(graphs)
        graphs_dict[i] = g_name
        while !(length(graphs_being_run) < MAX_GRAPHS_RUN)
            finished_graph_id = take!(notifications)
            delete!(graphs_being_run, finished_graph_id)
            println("Dispatcher: graph finished - $finished_graph_id: $(graphs_dict[finished_graph_id])")
        end

        schedule_graph_with_notify(g, notifications, g_name, i)
        push!(graphs_being_run, i)
        println("Dispatcher: scheduled graph $i: $g_name")
    end

    results = []
    for (g_name, g) in graphs
        g_map = Dict{Int, Any}()
        for vertex_id in Graphs.vertices(g)
            future = get_prop(g, vertex_id, :res_data)
            g_map[vertex_id] = fetch(future)
        end
        push!(results, (g_name, g_map))
    end

    for (g_name, res) in results
        for (id, value) in res
            println("Graph: $g_name, Final result for vertex $id: $value")
        end
    end
end
@m-fila (Collaborator) left a review comment

Store the whole-graph tasks and wait for their completion. The tasks that finish before the final graph is scheduled can be safely removed, so we only wait for the last few tasks.

Suggested change — current code:

function execution(graphs_map)
    graphs_being_run = Set{Int}()
    graphs_dict = Dict{Int, String}()
    graphs = parse_graphs(graphs_map, OUTPUT_GRAPH_PATH, OUTPUT_GRAPH_IMAGE_PATH)
    notifications = RemoteChannel(()->Channel{Int}(32))
    # notifications = Channel{Int}(32)
    for (i, (g_name, g)) in enumerate(graphs)
        graphs_dict[i] = g_name
        while !(length(graphs_being_run) < MAX_GRAPHS_RUN)
            finished_graph_id = take!(notifications)
            delete!(graphs_being_run, finished_graph_id)
            println("Dispatcher: graph finished - $finished_graph_id: $(graphs_dict[finished_graph_id])")
        end
        schedule_graph_with_notify(g, notifications, g_name, i)
        push!(graphs_being_run, i)
        println("Dispatcher: scheduled graph $i: $g_name")
    end
    results = []
    for (g_name, g) in graphs
        g_map = Dict{Int, Any}()
        for vertex_id in Graphs.vertices(g)
            future = get_prop(g, vertex_id, :res_data)
            g_map[vertex_id] = fetch(future)
        end
        push!(results, (g_name, g_map))
    end
    for (g_name, res) in results
        for (id, value) in res
            println("Graph: $g_name, Final result for vertex $id: $value")
        end
    end
end
Suggested change — proposed code:

function execution(graphs_map)
    graphs_being_run = Set{Int}()
    graphs_dict = Dict{Int, String}()
    graphs_tasks = Dict{Int, Dagger.DTask}()
    graphs = parse_graphs(graphs_map, OUTPUT_GRAPH_PATH, OUTPUT_GRAPH_IMAGE_PATH)
    notifications = RemoteChannel(()->Channel{Int}(32))
    # notifications = Channel{Int}(32)
    for (i, (g_name, g)) in enumerate(graphs)
        graphs_dict[i] = g_name
        while !(length(graphs_being_run) < MAX_GRAPHS_RUN)
            finished_graph_id = take!(notifications)
            delete!(graphs_being_run, finished_graph_id)
            delete!(graphs_tasks, finished_graph_id)
            println("Dispatcher: graph finished - $finished_graph_id: $(graphs_dict[finished_graph_id])")
        end
        graphs_tasks[i] = schedule_graph_with_notify(g, notifications, g_name, i)
        push!(graphs_being_run, i)
        println("Dispatcher: scheduled graph $i: $g_name")
    end
    results = []
    for (g_name, g) in graphs
        g_map = Dict{Int, Any}()
        for vertex_id in Graphs.vertices(g)
            future = get_prop(g, vertex_id, :res_data)
            g_map[vertex_id] = fetch(future)
        end
        push!(results, (g_name, g_map))
    end
    for (g_name, res) in results
        for (id, value) in res
            println("Graph: $g_name, Final result for vertex $id: $value")
        end
    end
    for (_, task) in graphs_tasks
        wait(task)
    end
end

@SmalRat (Contributor, Author) commented Jun 25, 2024

That is a bug I mentioned recently. The error is thrown when the logs are plotted before all the nodes finish computations, and it happens most of the time. I am currently working on fixing this problem.

I think I found the culprit

The problem appears because the logs are fetched before all the tasks are finished. For regular tasks, their results are fetched when plotting the individual graphs, but we never wait for the completion of the extra tasks that send notifications. We could collect the tasks for that extra node, which also happens to be the last node in each graph, and wait for their completion.

Oh, sorry, I did know the exact source of the problem, I just did not describe it in much detail since I was already working on the fix. The code started to look too cluttered and hard to read to me, so I added some wrappers etc.; consequently, it took a bit more time than expected. Besides, it should now be easier to track which tasks are running and which are already done. Moreover, graphs are now parsed one at a time.
However, if it is simpler to stay with the old style of code, I can commit your changes.

@m-fila (Collaborator) left a review comment

I think I'd rather take the version from before adding AbstractMetaTask and the others.
Refactoring of the graph scheduling pipeline should go into a different PR where we could focus solely on it.

@SmalRat (Contributor, Author) commented Jun 25, 2024

I think I'd rather take the version from before adding AbstractMetaTask and the others. Refactoring of the graph scheduling pipeline should go into a different PR where we could focus solely on it.

OK, I will try to revert these changes

This reverts commit c038887.
@m-fila (Collaborator) commented Jun 25, 2024

OK, I will try to revert these changes

I have a different proposal. Let's freeze this as is and I'll cherry pick the relevant commits to a new PR with a clean history and no conflicts

@SmalRat (Contributor, Author) commented Jun 25, 2024

OK, I will try to revert these changes

I have a different proposal. Let's freeze this as is and I'll cherry pick the relevant commits to a new PR with a clean history and no conflicts

Oh, sorry, I saw this too late.
I have created a separate branch for the scheduling pipeline refactoring and reverted these changes here to the state of 8a52e6d.

@SmalRat (Contributor, Author) commented Jun 25, 2024

OK, I will try to revert these changes

I have a different proposal. Let's freeze this as is and I'll cherry pick the relevant commits to a new PR with a clean history and no conflicts

Oh, sorry, I saw this too late. I have created a separate branch for the scheduling pipeline refactoring and reverted these changes here to the state of 8a52e6d.

Now I can apply your suggestions on silencing the workers and the bug fix, but I will wait for the response this time :)

@m-fila (Collaborator) commented Jun 25, 2024

No problem. Please take a look at #11 and comment if anything is missing or I messed up something 😅

@SmalRat (Contributor, Author) commented Jun 25, 2024

No problem. Please take a look at #11 and comment if anything is missing or I messed up something 😅

Looks like everything is fine there

@SmalRat (Contributor, Author) commented Jun 25, 2024

Let's freeze this as is and I'll cherry pick the relevant commits to a new PR with a clean history and no conflicts

So, should we close this PR?

@m-fila closed this Jun 25, 2024