DataONE uses a hash store. We should be able to mount/access this store directly from Argo workflows, which would avoid the need to fetch and download those assets for processing.
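For example, something like this is roughly what I have in mind for reading an object straight off the mounted store (a rough sketch only; the mount point, the hash-of-identifier assumption, and the sharding parameters are mine, not the store's actual layout or API):

```python
import hashlib
from pathlib import Path

# Hypothetical mount point for the hash store inside an Argo pod.
STORE_ROOT = Path("/mnt/hashstore/objects")

def object_path(identifier: str, depth: int = 3, width: int = 2) -> Path:
    """Resolve an object's path in a content-addressed layout.

    Assumes the store shards objects by the hex digest of the identifier,
    e.g. ab/cd/ef/<rest-of-digest>; the real layout depends on how the
    store is configured.
    """
    digest = hashlib.sha256(identifier.encode("utf-8")).hexdigest()
    shards = [digest[i * width:(i + 1) * width] for i in range(depth)]
    return STORE_ROOT.joinpath(*shards, digest[depth * width:])

if __name__ == "__main__":
    pid = "urn:uuid:0000-example"  # hypothetical identifier
    # A workflow step could read this path directly instead of downloading
    # the object over the network first.
    print(object_path(pid))
```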
Yes, you should be able to access a test instance in dev, and you could mock it as well. I think it would be useful to discuss access patterns for datasets so that we can minimize network transfer -- my initial thought is that we should arrange access so that the workflow is passed object locations (to read from and write to) via config, and other components are responsible for staging objects. This will likely work well for most datasets; see the sketch below for what I mean.
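To make "passed object locations via config" concrete, here is a rough sketch of a processing step (the config file name and keys are made up for illustration; in practice these would come from Argo parameters or a mounted ConfigMap):

```python
import json
import shutil
import sys
from pathlib import Path

def run_step(config_path: str) -> None:
    """Process the inputs named in a config file and write results to the
    configured output directory; staging those objects is someone else's job."""
    cfg = json.loads(Path(config_path).read_text())
    inputs = [Path(p) for p in cfg["inputs"]]  # paths on the mounted store
    out_dir = Path(cfg["output_dir"])          # writable shared location
    out_dir.mkdir(parents=True, exist_ok=True)

    for src in inputs:
        # Placeholder "processing": copy the object through untouched.
        shutil.copyfile(src, out_dir / src.name)

if __name__ == "__main__":
    run_step(sys.argv[1])  # e.g. python step.py /etc/workflow/objects.json
```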
That said, for exceedingly large datasets, we found that storage management (and using distributed, fast SSD pools with a later results merge step onto shared storage) was essential to preventing massive I/O waits during workflow runs. Happy to discuss our previous findings on bottlenecks.
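For illustration, the pattern looked roughly like this: stage inputs onto fast node-local storage, do the heavy work there, then merge results onto shared storage in one pass (the paths below are hypothetical, not our actual layout):

```python
import shutil
from pathlib import Path

SCRATCH = Path("/scratch/fast-ssd/job-0001")   # node-local SSD (hypothetical)
SHARED = Path("/mnt/shared/results/job-0001")  # shared storage (hypothetical)

def stage_inputs(inputs: list[Path]) -> list[Path]:
    """Copy inputs onto fast local storage before the compute-heavy step."""
    SCRATCH.mkdir(parents=True, exist_ok=True)
    staged = []
    for src in inputs:
        dst = SCRATCH / src.name
        shutil.copyfile(src, dst)
        staged.append(dst)
    return staged

def merge_results() -> None:
    """Move results from local scratch onto shared storage in a single pass,
    so the workflow itself never does heavy I/O against the shared disks."""
    SHARED.mkdir(parents=True, exist_ok=True)
    for result in SCRATCH.glob("*.out"):
        shutil.move(str(result), SHARED / result.name)
```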