Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kubevirt: upgrade k3s,multus,kubevirt,cdi,longhorn #4501

Merged
merged 2 commits into from
Feb 5, 2025

Conversation

andrewd-zededa
Copy link
Contributor

A kube container generation number is tracked in cluster-update.sh
and compared to an 'applied' version in the persistent node fs /var/lib.
When the applied version is behind it triggers component upgrades.
cluster-update.sh uses an inline command in the zedkube micro service to publish
status updates to zedagent which will trigger info messages (ZInfoKubeClusterUpdateStatus) to a controller.

Signed-off-by: Andrew Durbin [email protected]

@andrewd-zededa
Copy link
Contributor Author

added zedagent/nokube.go with go build pragmatic to not build with kubevirt, and added go build pragma to build zedagent/handleedgenodecluster.go only in kubevirt to handle HV=kvm build issues.

@andrewd-zededa andrewd-zededa force-pushed the kube-upgrades branch 2 times, most recently from 7cb5511 to aa62285 Compare January 28, 2025 21:53
@andrewd-zededa
Copy link
Contributor Author

The same yetus failure due to too large go-mod vendor commit:

./out has been created
Modes: GitHubActions Robot InContainer ResetRepo UnitTests
Processing: GH:4501
GITHUB PR #4501 is being downloaded from
https://api.github.com/repos/lf-edge/eve/pulls/4501
JSON data at Tue Jan 28 09:54:00 PM UTC 2025
Patch data at Tue Jan 28 09:54:01 PM UTC 2025
ERROR: Unsure how to process GH:4501. Permissions missing?

@andrewd-zededa andrewd-zededa force-pushed the kube-upgrades branch 2 times, most recently from 807133e to 78fb8e2 Compare January 28, 2025 23:10
@andrewd-zededa andrewd-zededa marked this pull request as ready for review January 29, 2025 00:12
@eriknordmark eriknordmark requested a review from deitch January 29, 2025 03:41
@andrewd-zededa
Copy link
Contributor Author

I just saw the conflict, I'll rebase

@andrewd-zededa andrewd-zededa force-pushed the kube-upgrades branch 2 times, most recently from 4b8090d to b192193 Compare January 31, 2025 00:18
@andrewd-zededa
Copy link
Contributor Author

Rebase again, also fixed a series of yetus issues in infotypes.go and a few elsewhere.

Copy link
Contributor

@eriknordmark eriknordmark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Run tests

@andrewd-zededa
Copy link
Contributor Author

Removed misplaced Update_RunDescheduler() in latest push, should only be in #4506

@github-actions github-actions bot requested a review from eriknordmark January 31, 2025 16:28
Copy link
Contributor

@eriknordmark eriknordmark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -788,6 +797,7 @@ func mainEventLoop(zedagentCtx *zedagentContext, stillRunning *time.Ticker) {
dnsCtx := zedagentCtx.dnsCtx

hwInfoTiker := time.NewTicker(3 * time.Hour)
kubeClusterUpdateTicker := time.NewTicker(1 * time.Minute)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why isn't this triggered by a change i.e., as a result of the handleCreate/Modify/Delete being called by the collection you subscribe to?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under handleClusterUpdateStatusImpl() the subscription will run the same trigger path. This ticker will help with connection outages.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, in fact it might break in interesting ways.

The info messages are queued and retried until they succeed. And if there is an update (resulting in updated information for that key) that will replace the queued info message. This ensures that even with outages the controller will eventually (once the outage is over) receive the most recent info for that object/key.

Sending periodic info messages just confuses the reader and is spending tons on money on bandwidth for no reason what so ever.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I can remove the ticker then

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in latest push

@andrewd-zededa andrewd-zededa force-pushed the kube-upgrades branch 2 times, most recently from 31ff899 to 0ad79a5 Compare February 1, 2025 05:27
@andrewd-zededa
Copy link
Contributor Author

Pushed a change to address the watchdog crash. The crash was introduced with my refactoring to add zedagent/nokube.go.

@github-actions github-actions bot requested a review from eriknordmark February 1, 2025 05:45
Copy link

codecov bot commented Feb 1, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 20.90%. Comparing base (b871d8d) to head (0ad79a5).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #4501   +/-   ##
=======================================
  Coverage   20.90%   20.90%           
=======================================
  Files          13       13           
  Lines        2894     2894           
=======================================
  Hits          605      605           
  Misses       2163     2163           
  Partials      126      126           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@eriknordmark eriknordmark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-try tests

@@ -0,0 +1 @@
./upgrades -c -d -n kube3 -l kubevirt,cdi,longhorn -f upgrades.yaml -u
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you intend this document to be readable by a developer? Then adding a few words of context would make sense. If not, why is it called README?

For instance, is this an example or the exact command used to cause an update?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh certainly, that's way too short

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latest push includes a longer pkg/kube/update-component/README.md instead.

Copy link
Contributor

@eriknordmark eriknordmark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The select statement in zedagent reading from the closed channel does cause a busy wait on EVE kvm i.e., very high CPU load since it selects and then ok is false. So you need a different approach, e.g., not closing the channel.

A kube container generation number is tracked in cluster-update.sh
and compared to an 'applied' version in the persistent node fs /var/lib.
When the applied version is behind it triggers component upgrades.
cluster-update.sh uses an inline command in the zedkube micro service to publish
status updates to zedagent which will trigger info messages (ZInfoKubeClusterUpdateStatus) to a controller.

Signed-off-by: Andrew Durbin <[email protected]>
for update-component

Signed-off-by: Andrew Durbin <[email protected]>
Copy link
Contributor

@eriknordmark eriknordmark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@eriknordmark eriknordmark merged commit 7f05ef3 into lf-edge:master Feb 5, 2025
63 of 64 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants