Sedna joint inference and federated learning controller optimization #430

Open
tangming1996 opened this issue May 7, 2024 · 3 comments

tangming1996 commented May 7, 2024

What would you like to be added/modified:
Sedna is an edge-cloud synergy AI project incubated in KubeEdge SIG AI. Building on the edge-cloud synergy capabilities provided by KubeEdge, Sedna implements collaborative training and collaborative inference across edge and cloud, such as joint inference, incremental learning, federated learning, and lifelong learning. Sedna supports popular AI frameworks such as TensorFlow, PyTorch, PaddlePaddle, and MindSpore.

Sedna can add edge-cloud synergy capabilities to existing training and inference scripts with little effort, which reduces costs, improves model performance, and protects data privacy. However, the joint inference and federated learning controllers in the current Sedna project still have functional defects that need to be fixed, mainly in the following aspects:
Joint inference and federated learning:
1. After a joint inference or federated learning task is created, the generated cloud and edge task instances are not automatically rebuilt after a failure or manual deletion; that is, the controllers lack self-healing ability.
2. After a joint inference task CR is deleted, the task instances and service configuration generated from the CR are not cascade-deleted. This defect causes subsequent attempts to create the joint inference task to fail.
What needs to be done: Optimize Sedna's current joint inference and federated learning controllers to address the functional deficiencies described above.
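
To make the two requested behaviors concrete, here is a minimal, stdlib-only Go sketch. It is an in-memory model, not Sedna's controller code and not the client-go API: `Cluster`, `Object`, `Reconcile`, `DeleteTask`, and the `-cloud-worker` / `-edge-worker` / `-service` naming are all hypothetical names chosen for illustration. In a real Kubernetes controller, cascading deletion is commonly achieved by setting `metadata.ownerReferences` on the child objects so the garbage collector removes them with the owner; whether the eventual fix for this issue takes that route is not stated here.

```go
package main

import "fmt"

// Object is a hypothetical, simplified stand-in for a Kubernetes object
// (worker pod or service) created on behalf of a task CR.
type Object struct {
	Name  string
	Owner string // name of the owning task CR; empty means no owner
}

// Cluster is an in-memory stand-in for the API server's object store.
type Cluster struct {
	Tasks   map[string]bool   // task CRs that currently exist, by name
	Objects map[string]Object // workers and services, by name
}

// NewCluster returns an empty cluster model.
func NewCluster() *Cluster {
	return &Cluster{Tasks: map[string]bool{}, Objects: map[string]Object{}}
}

// desired lists the child objects a task should own (assumed naming scheme).
func desired(task string) []Object {
	return []Object{
		{Name: task + "-cloud-worker", Owner: task},
		{Name: task + "-edge-worker", Owner: task},
		{Name: task + "-service", Owner: task},
	}
}

// Reconcile recreates any missing child of an existing task (self-healing)
// and returns how many objects it created. It does nothing for a deleted task.
func (c *Cluster) Reconcile(task string) int {
	if !c.Tasks[task] {
		return 0
	}
	created := 0
	for _, obj := range desired(task) {
		if _, ok := c.Objects[obj.Name]; !ok {
			c.Objects[obj.Name] = obj
			created++
		}
	}
	return created
}

// DeleteTask removes the task and every object that names it as owner,
// mimicking the effect of owner-reference garbage collection.
func (c *Cluster) DeleteTask(task string) int {
	delete(c.Tasks, task)
	removed := 0
	for name, obj := range c.Objects {
		if obj.Owner == task {
			delete(c.Objects, name) // deleting during range is safe in Go
			removed++
		}
	}
	return removed
}

func main() {
	c := NewCluster()
	c.Tasks["demo"] = true
	fmt.Println("created:", c.Reconcile("demo"))
	delete(c.Objects, "demo-edge-worker") // simulate manual deletion
	fmt.Println("recreated:", c.Reconcile("demo"))
	fmt.Println("cascaded:", c.DeleteTask("demo"))
	fmt.Println("left over:", len(c.Objects))
}
```

In this model, the second `Reconcile` call restores the manually deleted edge worker, and `DeleteTask` leaves no worker or service behind, so a task with the same name can be created again without colliding with stale objects.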

Why is this needed:
The current bugs in joint inference and federated learning can seriously disrupt the normal operation of both features.

Recommended Skills:
Golang / Python

Useful links:
https://github.com/kubernetes/kubernetes
https://kubernetes.io/
https://github.com/kubernetes/client-go
https://github.com/kubeedge/kubeedge
https://github.com/kubeedge/sedna
https://www.topgoer.com/

tangming1996 added the kind/feature label May 7, 2024
@MooreZheng

If anyone has questions regarding this issue, please feel free to leave a message here. We would also appreciate it if new members could introduce themselves to the community.

@SherlockShemol

/assign

@MooreZheng

Has this issue been fixed by PR #437 and PR #446?
