Question: Storing custom data beside tasks in the central task database #914

Open
ottointhesky opened this issue Nov 25, 2024 · 4 comments


@ottointhesky
Contributor

We use ipp for distributed processing of large topographic data sets. To do so, we typically split each data set into spatial tiles, and each tile is then distributed as a separate task through ipp. To give the user detailed feedback on the ipp cluster and the processing status, we are planning to create a kind of dashboard application that graphically shows which tiles have been processed (successfully or not), which are currently being computed, and which are still waiting in the queue.
So we need some sort of mapping between the task message id (msg_id) and our tile identifier. Of course it would be possible to store this mapping outside the ipp task database, but it would be a hassle to keep everything in sync. Things would be much easier if the task database/interface allowed storing a custom data field (e.g. a comment string or something similar) that could hold our tile id. This way, no external mapping would be needed.
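
For illustration, the external mapping mentioned above could look roughly like the sketch below. It only assumes that the object returned by apply_async exposes a msg_ids list, as AsyncResult does; the TileRegistry class, the tile id format, and the stand-in result object are hypothetical and exist only to keep the example self-contained.

```python
from types import SimpleNamespace


class TileRegistry:
    """Hypothetical out-of-band mapping from task msg_id to tile identifier."""

    def __init__(self):
        self._tile_by_msg_id = {}

    def register(self, async_result, tile_id):
        # One result object may cover several messages; record each of them.
        for msg_id in async_result.msg_ids:
            self._tile_by_msg_id[msg_id] = tile_id

    def tile_for(self, msg_id):
        # Returns None for tasks that were never registered.
        return self._tile_by_msg_id.get(msg_id)


# Stand-in for the result of view.apply_async(...), so no cluster is needed here.
fake_result = SimpleNamespace(msg_ids=["msg-0001"])

registry = TileRegistry()
registry.register(fake_result, "tile_12_07")
```

This state lives outside the task database, so it has to be persisted and cleaned up separately, which is the synchronization burden described above.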

@minrk
Member

minrk commented Nov 26, 2024

I think storing arbitrary custom data is probably not something we should support, but a simple string task label seems sensible enough, and seems like it would work for what you are describing. Does that sound right?

@ottointhesky
Contributor Author

You are perfectly right. A task label or task comment would do the trick.

@minrk
Member

minrk commented Nov 26, 2024

Yes, I think that's doable. We would need to come up with the APIs for setting these and retrieving tasks based on them.

@ottointhesky
Contributor Author

Well, I have a narrow view of things, but I would suggest something like:

ar = view.apply_async(task, label='my task label')

In case apply_async is sent to multiple engines (dview[:]), I would use the same label for all tasks.

I'm not sure whether AsyncResult should provide (read-only) access to the label, but I guess it would be nice.
Since I haven't looked into the task database code yet, I cannot comment on how this should be handled internally.
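
To make the suggested behaviour concrete, here is a rough client-side emulation. This is not an existing ipp API: apply_labeled, LabeledResult, and FakeView are hypothetical names, and the label is kept only on the client, not in the task database. It shows one label shared by all tasks of a single call and read-only access on the result.

```python
from types import SimpleNamespace


class LabeledResult:
    """Hypothetical wrapper that adds a read-only label to a result object."""

    def __init__(self, async_result, label):
        self._async_result = async_result
        self._label = label

    @property
    def label(self):
        # No setter is defined, so assigning to .label raises AttributeError.
        return self._label

    def __getattr__(self, name):
        # Only called for attributes not found on the wrapper itself.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self._async_result, name)


def apply_labeled(view, func, *args, label=None, **kwargs):
    # The label is not forwarded to the view; it is attached on the client side.
    async_result = view.apply_async(func, *args, **kwargs)
    return LabeledResult(async_result, label)


class FakeView:
    """Stand-in for a view over two engines, so the sketch runs without a cluster."""

    def apply_async(self, func, *args, **kwargs):
        return SimpleNamespace(msg_ids=["msg-a", "msg-b"])


ar = apply_labeled(FakeView(), len, [1, 2, 3], label="my task label")
```

Both message ids in ar.msg_ids share the single label, matching the idea of using the same label for all tasks when the call goes to multiple engines.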

Just for completeness: I'm willing to help with the implementation if needed/wanted.
