Use the large runner to install a full Kubeflow and run all integration tests at once in the same cluster on demand #3054

Open
7 tasks done
juliusvonkohout opened this issue Mar 16, 2025 · 12 comments · Fixed by #3070 · May be fixed by #3077
@juliusvonkohout
Member

juliusvonkohout commented Mar 16, 2025

Validation Checklist

  • I confirm that this is a Kubeflow-related issue.
  • I am reporting this in the appropriate repository.
  • I have followed the Kubeflow installation guidelines.
  • The issue report is detailed and includes version numbers where applicable.
  • This issue pertains to Kubeflow development.
  • I am available to work on this issue.
  • You can join the CNCF Slack and access our meetings at the Kubeflow Community website. Our channel on the CNCF Slack is here #kubeflow-platform.

Version

master

Detailed Description

See kubeflow/community#829 (comment) for how to request the 32 GB node.

We also need a new workflow that installs the full example/kustomization.yaml and then runs all tests on that cluster. This will help a lot with release testing and gives us the opportunity for a full end-to-end test.

@juliusvonkohout juliusvonkohout added help wanted Extra attention is needed good first issue Good for newcomers labels Mar 16, 2025
@juliusvonkohout juliusvonkohout added this to the 1.10.1 milestone Mar 16, 2025
@juliusvonkohout juliusvonkohout changed the title Use the large runner to install a full Kubeflow and run all integration tests at once in the smae cluster on demand for releases. Use the large runner to install a full Kubeflow and run all integration tests at once in the same cluster on demand Mar 16, 2025
@web1havv

Hi @juliusvonkohout

Can you please assign this issue to me? I have already created a basic workflow, tested locally, that installs kind and kustomize, then installs Kubeflow and waits for all components to become ready. I am attaching the action.yaml below. I have a few questions about the tests we want to run after the installation step. Can you point me to a few initial tests that would be a good start?

name: Kubeflow E2E Tests

on:
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]

jobs:
  e2e-test:
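    # GitHub Actions has no `resource-class` key (that is CircleCI syntax).
    # A larger runner is selected by its label in `runs-on`; the label of the
    # 32 GB runner is not stated in this thread, so the default is kept here.
    runs-on: ubuntu-latest
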
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'

      - name: Install kustomize
        run: |
          curl -s "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh" | bash
          sudo mv kustomize /usr/local/bin/

      - name: Install kind
        run: |
          curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.26.0/kind-linux-amd64
          chmod +x ./kind
          sudo mv ./kind /usr/local/bin/kind

      - name: Configure system settings
        run: |
          sudo sysctl fs.inotify.max_user_instances=2280
          sudo sysctl fs.inotify.max_user_watches=1255360

      - name: Create kind cluster
        run: |
          cat <<EOF | kind create cluster --name=kubeflow --config=-
          kind: Cluster
          apiVersion: kind.x-k8s.io/v1alpha4
          nodes:
          - role: control-plane
            image: kindest/node:v1.32.0@sha256:c48c62eac5da28cdadcf560d1d8616cfa6783b58f0d94cf63ad1bf49600cb027
            kubeadmConfigPatches:
            - |
              kind: ClusterConfiguration
              apiServer:
                extraArgs:
                  "service-account-issuer": "https://kubernetes.default.svc"
                  "service-account-signing-key-file": "/etc/kubernetes/pki/sa.key"
          EOF

      - name: Save kubeconfig
        run: |
          kind get kubeconfig --name kubeflow > /tmp/kubeflow-config
          echo "KUBECONFIG=/tmp/kubeflow-config" >> $GITHUB_ENV

      - name: Create Docker registry secret
        run: |
          # Using GitHub token to authenticate with GitHub's container registry
          echo "${{ secrets.GITHUB_TOKEN }}" | docker login ghcr.io -u "${{ github.actor }}" --password-stdin
          kubectl create secret generic regcred \
            --from-file=.dockerconfigjson=$HOME/.docker/config.json \
            --type=kubernetes.io/dockerconfigjson

      - name: Install Kubeflow
        run: |
          while ! kustomize build example | kubectl apply --server-side --force-conflicts -f -; do 
            echo "Retrying to apply resources"
            sleep 20
          done

      - name: Wait for deployments to be ready
        run: |
          echo "Waiting for deployments to be ready..."
          # Wait for all deployments to become ready with a 30-minute timeout
          kubectl wait --for=condition=available --timeout=30m -n kubeflow deployment --all
          # Wait for all pods to be ready
          kubectl wait --for=condition=ready --timeout=30m -n kubeflow pod --all
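
The "Install Kubeflow" step above retries forever if the apply never succeeds, so the job only stops when the runner times out. A minimal sketch of a bounded retry follows. The `retry` and `apply_kubeflow` helpers and the limits in the usage line are hypothetical and not part of the posted workflow.

```shell
#!/usr/bin/env bash

# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS attempts have been made.
retry() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "Attempt ${attempt} failed, retrying in ${delay}s" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Hypothetical wrapper around the install command from the workflow above.
# Without `set -o pipefail` its exit status is the one from kubectl.
apply_kubeflow() {
  kustomize build example | kubectl apply --server-side --force-conflicts -f -
}

# Possible usage inside the step (15 attempts and 20 s are assumptions):
# retry 15 20 apply_kubeflow
```

With this, a persistently failing apply fails the step after a known number of attempts instead of looping until the job is cancelled.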

@web1havv

I assume I have to run all the tests in https://github.com/kubeflow/manifests/tree/master/tests/gh-actions
If so, I can create a loop that runs the tests in parallel and fails the action if any of them fails. Just waiting for your go-ahead here.
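
A rough sketch of what such a parallel loop could look like in a workflow step. The `run_tests_in_parallel` helper and the glob in the usage line are assumptions for illustration; the actual layout of the scripts in tests/gh-actions may differ, and tests that share cluster resources or local ports may need to run sequentially instead.

```shell
#!/usr/bin/env bash

# run_tests_in_parallel SCRIPT...
# Starts every script in the background, waits for each one, and returns
# non-zero if at least one of them failed.
run_tests_in_parallel() {
  local -a pids=() names=()
  local script i failed=0

  for script in "$@"; do
    bash "$script" &
    pids+=("$!")
    names+=("$script")
  done

  # `wait PID` returns the exit status of that background job.
  for i in "${!pids[@]}"; do
    if wait "${pids[$i]}"; then
      echo "PASS ${names[$i]}"
    else
      echo "FAIL ${names[$i]}"
      failed=1
    fi
  done
  return "$failed"
}

# Possible usage (the glob is an assumption, not a confirmed naming scheme):
# run_tests_in_parallel tests/gh-actions/*_test.sh
```

Waiting on each PID individually, rather than a bare `wait`, is what lets the step report which script failed and still exit non-zero.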

I am also trying to figure out whether I can run the action end to end without creating a PR. I have tried https://github.com/nektos/act, but I am curious whether there is an established flow for testing GitHub Actions changes.

@vikas-saxena02

@web1havv please open a draft PR and I will assign this to you.

@juliusvonkohout
Member Author

Yes, please create a PR and also add the PSS and KFP tests from here https://github.com/kubeflow/manifests/blob/master/.github/workflows/pipeline_test.yaml#L101

@kunal-511
Contributor

Is this up for grabs, @juliusvonkohout?

@juliusvonkohout
Member Author

juliusvonkohout commented Mar 24, 2025

Is this up for grabs, @juliusvonkohout?

Well, I have not seen a meaningful active PR linked to this issue so far, so yes, everyone can work on it. You can also work together on a PR.

@Pranali3103

@juliusvonkohout Can I take up this issue and work on it? Can you assign it to me?

@vikas-saxena02

@juliusvonkohout Can I take up this issue and work on it? Can you assign it to me?

@Pranali3103 Please open a draft PR and I will assign this to you.

@kunal-511
Contributor

For this issue, do I have to write tests that verify a complete Kubeflow installation works, by installing it using the README instructions and running all integration tests in one file?

@juliusvonkohout
Member Author

For this issue, do I have to write tests that verify a complete Kubeflow installation works, by installing it using the README instructions and running all integration tests in one file?

More or less yes, but there will be duplicated steps that you only need once. #3054 (comment) is already a good start.

@juliusvonkohout
Member Author

Reopening for the follow-up PR. #3070 (comment)

@juliusvonkohout
Member Author

/assign @kunal-511

@kunal-511 kunal-511 linked a pull request Mar 28, 2025 that will close this issue
2 tasks