Kepler Action should support runners other than Ubuntu system. #50

Open
jiere opened this issue Jun 25, 2023 · 20 comments

Comments

@jiere

jiere commented Jun 25, 2023

Currently our kepler-action scripts are hard-coded for Ubuntu runners. If users configure self-hosted runners that do not run Ubuntu, many actions fail at the very beginning.
What's more, the current kepler-ci-artifacts release tarball, which is mainly for bcc, seems to publish only deb files.

To support self-hosted runners that do not run Ubuntu, we have to fill the gaps above as soon as possible.
The first step is to support RHEL/CentOS platforms, IMO.

@SamYuan1990 , @rootfs .
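
As a rough illustration of the gap described above, a distro-agnostic action could first work out which package manager the runner offers instead of assuming apt. The sketch below is not kepler-action code: `parseOsRelease` and `detectPackageManager` are hypothetical helper names, and it assumes the runner has a standard `/etc/os-release` whose contents are passed in as a string.

```typescript
// Sketch only: map the text of /etc/os-release to a package manager family.
type PackageManager = "apt" | "yum";

// Parse KEY=value lines, stripping optional surrounding quotes.
function parseOsRelease(content: string): Record<string, string> {
  const fields: Record<string, string> = {};
  for (const rawLine of content.split("\n")) {
    const line = rawLine.trim();
    const eq = line.indexOf("=");
    // Skip blank lines, comments, and anything that is not KEY=value.
    if (line === "" || line.charAt(0) === "#" || eq < 0) continue;
    fields[line.slice(0, eq)] = line.slice(eq + 1).replace(/^["']|["']$/g, "");
  }
  return fields;
}

// ID names the distro itself; ID_LIKE lists the families it derives from.
function detectPackageManager(content: string): PackageManager | undefined {
  const fields = parseOsRelease(content);
  const ids = [fields["ID"] || ""]
    .concat((fields["ID_LIKE"] || "").split(/\s+/))
    .filter((id) => id !== "");
  if (ids.some((id) => id === "debian" || id === "ubuntu")) return "apt";
  // Assumption: "yum" is used as the common name for the RHEL family.
  if (ids.some((id) => id === "rhel" || id === "centos" || id === "fedora")) return "yum";
  return undefined; // Unknown distro: the caller decides how to fail.
}
```

Returning `undefined` for an unrecognized distro lets the action fail early with a clear message rather than halfway through an install step.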

@SamYuan1990
Contributor

Do we have any other OS supported by GitHub Actions?

@SamYuan1990
Contributor

SamYuan1990 commented Jun 25, 2023

For a self-hosted agent, is there any volunteer who can contribute an agent for us?

@SamYuan1990
Contributor

Logically, I want to install bcc from the official package, but it is not available; you can find the reason in the bcc repo.
So for rpm, we can try the official release, if there is one.
But the question is: why should we start this issue without an agent?
Also, I don't expect any difference between deb and rpm systems, as both eBPF and cgroup come from the Linux kernel.
So what is the benefit of supporting rpm-based OSes when running this action repo on a self-hosted runner/agent?

If we just want to support rpm, I am not sure, but @rootfs, would a Red Hat pipeline or a CNCF pipeline be a better option than GitHub Actions?

@jiere, please clarify with more information.

@jiere
Author

jiere commented Jun 25, 2023

For the platform-validation feature, we plan to provide a manually triggered workflow that runs on self-hosted runners, so that specific cases can be run on specific platforms (which act as the self-hosted runners).
In that case, we cannot require users to install Ubuntu on their platforms, right?

@jiere
Author

jiere commented Jun 25, 2023

In other words, platform validation should be a one-shot or on-demand test, so a manually triggered workflow suits it. The runner does not need to be volunteered for community use; a runner self-hosted by the platform vendor is enough.
Our goal is that specific cases can be executed through Kepler's workflow_dispatch workflow.

@SamYuan1990
Contributor

> For the platform-validation feature, we plan to provide a manually triggered workflow that runs on self-hosted runners, so that specific cases can be run on specific platforms (which act as the self-hosted runners). In that case, we cannot require users to install Ubuntu on their platforms, right?

Maybe we should not limit ourselves to GitHub Actions.
Even if we resolve the OS issue, there is still the CPU architecture issue.
If GHA has limitations, maybe we should be open to other solutions.
Take another CNCF project, CoCo (https://github.com/confidential-containers), as an example: they have different cloud providers.

That means, for a hardware provider, if we only run a check once per quarter, matching our release cycle,
are we sure we need a GitHub Actions integration?

It's a valuable topic and discussion, but we should keep an open mind.
I would like to see what happens and keep moving with the first real case. (Maybe someone can contribute a RHEL machine with GHA there?)

Otherwise, I suppose it's too early to discuss the details today.

@jiere
Author

jiere commented Jun 25, 2023

Let me clarify one thing here: why do we choose to provide a self-hosted runner? Not because of the OS distro, but because of the bare-metal host.
Platform validation cases should run directly on the bare-metal host, not in a VM, so we cannot rely on VM-based runners for such cases.

@jiere
Author

jiere commented Jun 25, 2023

> Take another CNCF project, CoCo (https://github.com/confidential-containers), as an example: they have different cloud providers.

This is actually another topic, not in the scope of the current issue :-D
It concerns whether we should find some "permanent volunteer machines" to use as runners.
In this issue, we actually want to address the following scenario:

  1. No volunteer machines yet;
  2. Platform vendors want to test their platforms in Kepler formally, officially and automatically;
  3. Platform vendors do have bare-metal machines for testing, but the machines are provisioned with RHEL, for example.

@SamYuan1990
Contributor

> > Take another CNCF project, CoCo (https://github.com/confidential-containers), as an example: they have different cloud providers.
>
> This is actually another topic, not in the scope of the current issue :-D It concerns whether we should find some "permanent volunteer machines" to use as runners. In this issue, we actually want to address the following scenario:
>
>   1. No volunteer machines yet;
>   2. Platform vendors want to test their platforms in Kepler formally and officially;
>   3. Platform vendors do have bare-metal machines for testing, but the machines are provisioned with RHEL, for example.

In my view:

  • A volunteer machine is necessary,
    as different bare-metal machines may have different power APIs,
    so we can't assume anything before we have a machine.

In most cases, integrating a new machine may need to follow the guide that is a work in progress at sustainable-computing-io/kepler-doc#60.

I do agree with these points:
> Platform vendors want to test their platforms in Kepler formally, officially and automatically;
> Platform vendors do have bare-metal machines for testing, but the machines are provisioned with RHEL, for example.

Currently we support x86, and from a hardware provider's point of view, I suppose the platform vendor provides another CPU platform.
A cloud provider may just provide a k8s cluster for each run.

@SamYuan1990
Contributor

By the way, another open question: should we create the k8s cluster for testing, or should we leave it to the platform vendor to provide one for us?

@kenplusplus

I think:

  1. A GitHub Action should not be bound to a specific OS, especially by calling "apt" in a Node.js script. I am curious why Node.js is used for deployment work. Isn't Ansible better suited to handling OS/platform/middleware?
  2. CoCo is also backed by a big group, but it is outside this topic, since VMs are not well supported, correct?

@SamYuan1990
Contributor

> I think:
>
>   1. A GitHub Action should not be bound to a specific OS, especially by calling "apt" in a Node.js script. I am curious why Node.js is used for deployment work. Isn't Ansible better suited to handling OS/platform/middleware?
>   2. CoCo is also backed by a big group, but it is outside this topic, since VMs are not well supported, correct?

I suppose we can start from sustainable-computing-io/kepler#482.
@jiere starts from platform validation, but I hope to link it to a specific ticket or contributor.
@kenplusplus, whether it is Ansible or Node.js, an apt/yum install has to be executed somewhere to provide Kepler's dependencies before we deploy Kepler for testing. We implemented it in Node.js simply because our GitHub Actions currently run only on Ubuntu.
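
To make the point about "an apt/yum install somewhere" concrete, the install step can be kept in one small function that only differs in the tool it calls. This is a sketch under stated assumptions, not the repo's implementation: `installCommand` and `ILLUSTRATIVE_PACKAGES` are hypothetical names, and the package names are placeholders for whatever Kepler's dependencies are called on each distro.

```typescript
// Sketch only: build the dependency install command for a given package manager.
type PackageManager = "apt" | "yum";

// Placeholder package names for illustration; the real dependency list is
// an assumption and would have to come from the project.
const ILLUSTRATIVE_PACKAGES: Record<PackageManager, string[]> = {
  apt: ["libbpfcc-dev"],
  yum: ["bcc-devel"],
};

// Return the command as an argument vector so that it can be handed to a
// process runner without shell quoting problems.
function installCommand(pm: PackageManager, packages: string[]): string[] {
  if (packages.length === 0) {
    throw new Error("no packages requested");
  }
  const tool = pm === "apt" ? "apt-get" : "yum";
  return ["sudo", tool, "install", "-y"].concat(packages);
}
```

For example, `installCommand("yum", ILLUSTRATIVE_PACKAGES["yum"])` yields the arguments for `sudo yum install -y bcc-devel`; the rest of the action would not need to know which distro it runs on.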

@SamYuan1990
Contributor

SamYuan1990 commented Jun 25, 2023

@jiere , @kenplusplus , @rootfs
Here is my point.
If our starting point is hardware support for other bare-metal platforms, either ARM (sustainable-computing-io/kepler#482) or Redfish,
I would say it's too early to discuss before we have a real bare-metal machine from a contributor.

If our point is to discuss today's code logic:
our first infrastructure is GitHub Actions on Ubuntu, and logically we have to install the dependencies before deploying Kepler for testing, so we implemented that in Node.js with apt install.

If our point is a free discussion, as brainstorming, then personally I am open to any kind of integration: apt or yum; Ansible, GHA, Travis, Tekton, etc.; at any level, whether an OS or a k8s cluster. The testing logic is:

  1. We have to install Kepler's dependencies (such as headers?) before we deploy Kepler for testing.
  2. We have to install a k8s cluster before we deploy Kepler for testing.
  3. We have to make sure Kepler mounts enough host paths for either eBPF or cgroup.
  4. We have to make sure Kepler can get data from external hardware resources if necessary.
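
The host-path requirement in step 3 could be checked before a test run, independent of distro or CI platform. The following is only a sketch: `missingHostPaths` is a hypothetical helper, and `ASSUMED_HOST_PATHS` is an assumed, illustrative list rather than the set of paths Kepler actually mounts.

```typescript
// Sketch only: report which required host paths are absent.
// The existence check is injected so the function stays testable and does
// not depend on a particular runtime.
function missingHostPaths(
  required: string[],
  exists: (path: string) => boolean
): string[] {
  return required.filter((path) => !exists(path));
}

// Assumed, illustrative list; not taken from Kepler's manifests.
const ASSUMED_HOST_PATHS = ["/proc", "/sys/fs/cgroup", "/lib/modules"];
```

On a Node.js runner the predicate could be `fs.existsSync`, and a non-empty result would be a reason to stop before deploying Kepler.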

If the scope of our discussion is just switching from apt to yum, or to Ansible, that is too limited.
In my view, a provider can contribute a bare-metal machine to us, and we just deploy Kepler and run the test scripts. That reduces some security concerns for the provider, since we avoid running apt/yum scripts that modify the OS packages.
Also, the provider can ask us to move from GHA to another CI platform. I am open-minded about this.

@rootfs
Contributor

rootfs commented Jun 29, 2023

I like the idea of supporting CI platforms other than GitHub Actions. That will cover many cases where we want to ensure the PRs or releases are fully tested. The limitation of runners is an issue, unfortunately. Self-hosting is an option; shall we start with this step?

@SamYuan1990
Contributor

> I like the idea of supporting CI platforms other than GitHub Actions. That will cover many cases where we want to ensure the PRs or releases are fully tested. The limitation of runners is an issue, unfortunately. Self-hosting is an option; shall we start with this step?

Do we have any self-hosted GitHub Actions agent available now to help us move forward?

@rootfs
Contributor

rootfs commented Jun 29, 2023

It looks like CNCF can provide Prow for hosted projects:
https://github.com/cncf/servicedesk#continuous-integration

@SamYuan1990
Contributor

> It looks like CNCF can provide Prow for hosted projects: https://github.com/cncf/servicedesk#continuous-integration

Do they provide any guidance or examples?
https://docs.prow.k8s.io/docs/components/core/hook/
It seems they are still working on their documentation.

As the hook settings are empty, I suppose we need clear guidance before we can try it.
I am also unsure whether we should add our jobs in the style of
https://github.com/kubernetes/test-infra/blob/master/config/jobs/cadvisor/cadvisor.yaml
or
https://github.com/google/cadvisor/blob/master/.github/workflows/test.yml

@SamYuan1990
Contributor

I am extending my search from the Prow repos to a search engine to find more information.

@SamYuan1990
Contributor

As there seems to be no Fedora support in GitHub Actions, @rootfs, do we have any idea how to test kepler-action on RHEL?

@SamYuan1990
Contributor

Remove apt-get from this repo and have the local dev cluster provide the libbpf dependency.


4 participants