Description
Is it possible to deactivate the cache for specific tasks within a workflow?
Context
I'm running a workflow with multiple steps and a long sequence of inputs that take different values. The splitter function is great here; without pydra, this would turn into very ugly and complicated nested loops. However, for some tasks it is preferable to redo the computation in different workflow runs rather than take up storage with the cached output. I haven't found anything in the code that lets the user turn off cache storage for individual tasks within a workflow. Is it not supported, or did I just miss it?
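For reference, this is the kind of pattern I mean (a toy task with made-up values, written against the pre-1.0 `pydra.mark.task` / splitter API as I understand it), where every expanded run of the task gets cached on disk:

```python
import pydra

@pydra.mark.task
def add_two(x):
    # stand-in for a real computation
    return x + 2

# the splitter expands the task over every value of x,
# replacing what would otherwise be a hand-written nested loop
task = add_two(x=[1, 2, 3]).split("x")
task()
print(task.result())  # one cached result per value of x
```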
I did think of merging the tasks into larger tasks that only output what I need to store. But that would mean the merged code runs multiple times within each run of the workflow, once for every input combination, which implies unnecessary computation.
Example
- Function A creates a list of words.
- Function B filters the list of words according to different criteria (you get two or three different subsets).
- Function C creates a large, computationally expensive table from the words produced by function A.
- Function D subsets the table created by C, using a filtered list from function B and other parameters.
- Function E performs computations on D based on different instructions.
For functions B and D, the ratio of computation time to output size makes it more convenient to rerun them when needed than to spend storage on their cached outputs. But if they are fused into one function, function B will be rerun more times than necessary.
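To make the structure concrete, here is a stripped-down sketch of how I wire this up (the function names and bodies are invented placeholders, and I'm assuming the pre-1.0 `pydra.Workflow` API; the real tasks are much heavier):

```python
import pydra

@pydra.mark.task
def make_words():                    # Function A: builds the word list
    return ["alpha", "beta", "gamma"]

@pydra.mark.task
def filter_words(words, criterion):  # Function B: cheap, but its cached output takes space
    return [w for w in words if criterion in w]

@pydra.mark.task
def make_table(words):               # Function C: heavy and expensive, worth caching
    return {w: len(w) * 1000 for w in words}

@pydra.mark.task
def subset_table(table, keep):       # Function D: quick to recompute, large cached output
    return {k: v for k, v in table.items() if k in keep}

@pydra.mark.task
def analyse(subset):                 # Function E: in reality also split over different instructions
    return sum(subset.values())

wf = pydra.Workflow(name="pipeline", input_spec=["criteria"])
wf.add(make_words(name="a"))
# splitting B over the criteria fans D and E out over each filtered subset
wf.add(filter_words(name="b", words=wf.a.lzout.out, criterion=wf.lzin.criteria).split("criterion"))
wf.add(make_table(name="c", words=wf.a.lzout.out))  # runs only once
wf.add(subset_table(name="d", table=wf.c.lzout.out, keep=wf.b.lzout.out))
wf.add(analyse(name="e", subset=wf.d.lzout.out))
wf.set_output([("results", wf.e.lzout.out)])

wf.inputs.criteria = ["a", "m"]
with pydra.Submitter(plugin="cf") as sub:
    sub(wf)
# the outputs of B and D are written to the cache alongside everything else,
# which is exactly what I would like to avoid
print(wf.result())
```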
I would appreciate any advice or guidance :)
Thank you for your amazing work on this package!