Sdk tests with papermill #2448

yehudit1987 · 2024-10-27T10:51:42Z

What this PR does / why we need it:
This PR adds E2E tests that run the Katib example notebooks with papermill.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #2417

Checklist:

Docs included if any changes are user facing

Electronic-Waste · 2024-10-28T02:57:15Z

/rerun-all

Electronic-Waste · 2024-10-28T06:22:56Z

@yehudit1987 Can you please fix these CI errors?

Electronic-Waste · 2024-10-29T12:07:12Z

@yehudit1987 Could you sign off your commits with git commit -s? The DCO check failed because the sign-off is missing.

Electronic-Waste · 2024-10-29T12:08:26Z

FYI, you can check this reference: https://github.com/kubeflow/katib/pull/2448/checks?check_run_id=32215445282

Electronic-Waste · 2024-10-31T05:13:54Z

/rerun-all

YosiElias · 2024-10-31T08:51:56Z

/rerun-all

Electronic-Waste · 2024-10-31T09:28:30Z

/rerun-all

Electronic-Waste

Thanks for your great contributions @yehudit1987! 🎉

I left some reviews for you, excluding notebooks. Will soon review other files :)

Btw, @andreyvelich @tenzen-y are busy with other projects now and will be back in the middle of November. Your PR will be merged then.

Electronic-Waste · 2024-10-31T15:33:38Z

.github/workflows/sdk-e2e-tests-with-papermill.yaml

@@ -0,0 +1,28 @@
+name: Run e2e sdk tests with papermill


Suggested change

name: Run e2e sdk tests with papermill

name: E2E Tests with Notebooks

I guess it would be better to make the workflow name consistent with the others :)

Electronic-Waste · 2024-10-31T15:39:23Z

.github/workflows/sdk-e2e-tests-with-papermill.yaml

+  cancel-in-progress: true
+
+jobs:
+  create-katib-notebooks-test:


Suggested change

create-katib-notebooks-test:

e2e:

Electronic-Waste · 2024-10-31T15:55:50Z

test/e2e/v1beta1/scripts/gh-actions/build-load.sh

+    # Loop through each algorithm in the array
+    for algorithm_name in "${ALGORITHM_ARRAY[@]}"; do
+      suggestion_image_name="$(algorithm_name=$algorithm_name yq eval '.runtime.suggestions.[] | select(.algorithmName == env(algorithm_name)) | .image' \
+        manifests/v1beta1/installs/katib-standalone/katib-config.yaml | cut -d: -f1)"
+      suggestion_name="$(basename "$suggestion_image_name")"
+      suggestions+=("$suggestion_name")
+    done
+


I think this loop is redundant with the loop in front of it:

katib/test/e2e/v1beta1/scripts/gh-actions/build-load.sh

Lines 77 to 89 in 706a6f2

# Search for Suggestion Images required for Trial.
for exp_name in "${EXPERIMENT_ARRAY[@]}"; do
  exp_path=$(find examples/v1beta1 -name "${exp_name}.yaml")
  algorithm_name="$(yq eval '.spec.algorithm.algorithmName' "$exp_path")"
  suggestion_image_name="$(algorithm_name=$algorithm_name yq eval '.runtime.suggestions.[] | select(.algorithmName == env(algorithm_name)) | .image' \
    manifests/v1beta1/installs/katib-standalone/katib-config.yaml | cut -d: -f1)"
  suggestion_name="$(basename "$suggestion_image_name")"
  suggestions+=("$suggestion_name")
done

Can we combine these two loops into a single one by using the ALGORITHMS parameter in the other e2e tests as well?

WDYT👀 @yehudit1987 @kubeflow/wg-automl-leads

Electronic-Waste · 2024-10-31T16:20:57Z

.github/workflows/template-notebook-test/action.yaml

+              echo "Papermill failed for notebook: $NOTEBOOK"
+              exit 1
+          }
+        done


Suggested change

done

done

A newline is missing at the end of the file here.

Electronic-Waste · 2024-10-31T16:21:20Z

test/e2e/v1beta1/scripts/gh-actions/build-load.sh

@@ -172,4 +182,4 @@ fi
 echo -e "\nCleanup Build Cache...\n"
 docker buildx prune -f

-echo -e "\nAll Katib images with ${TAG} tag have been built successfully!\n"
+echo -e "\nAll Katib images with ${TAG} tag have been built successfully!\n"


Suggested change

echo -e "\nAll Katib images with ${TAG} tag have been built successfully!\n"

echo -e "\nAll Katib images with ${TAG} tag have been built successfully!\n"

Electronic-Waste · 2024-10-31T16:21:44Z

test/e2e/v1beta1/scripts/gh-actions/setup-katib.sh

+  kubectl create namespace kubeflow-user-example-com
+fi
+
+exit 0


Suggested change

exit 0

exit 0

Electronic-Waste · 2024-10-31T16:22:23Z

test/e2e/v1beta1/scripts/gh-actions/setup-minikube.sh


 echo "Start to setup Minikube Kubernetes Cluster"
 kubectl version
 kubectl cluster-info
 kubectl get nodes

 echo "Build and Load container images"
-./build-load.sh "$DEPLOY_KATIB_UI" "$TUNE_API" "$TRIAL_IMAGES" "$EXPERIMENTS" 
+./build-load.sh "$DEPLOY_KATIB_UI" "$TUNE_API" "$TRIAL_IMAGES" "$EXPERIMENTS" "$ALGORITHMS"


Suggested change

./build-load.sh "$DEPLOY_KATIB_UI" "$TUNE_API" "$TRIAL_IMAGES" "$EXPERIMENTS" "$ALGORITHMS"

./build-load.sh "$DEPLOY_KATIB_UI" "$TUNE_API" "$TRIAL_IMAGES" "$EXPERIMENTS" "$ALGORITHMS"

Electronic-Waste · 2024-10-31T16:25:11Z

.github/workflows/template-notebook-test/action.yaml

+    - name: Setup Minikube Cluster
+      shell: bash
+      run: ./test/e2e/v1beta1/scripts/gh-actions/setup-minikube.sh true true "" "" "cmaes"


Could we reuse template-setup-e2e-test? I guess it will be better if we make full use of the existing template :)

Waiting for your thoughts👀 @yehudit1987 @kubeflow/wg-automl-leads

That template is used as a preliminary step for template-e2e-test. The second one runs the YAML experiments by calling a shell script that calls a Python script. In our case we just need to add a step to the job that runs the notebook directly with papermill. I guess we can use template-setup-e2e-test, but that would not remove the need for the new one.

Electronic-Waste

Sorry for the late response @yehudit1987. I left a few comments for you.

I'm busy with other work right now, so I'll review the Notebooks later :)

Electronic-Waste · 2024-11-05T08:15:44Z

test/e2e/v1beta1/scripts/gh-actions/setup-minikube.sh

+if [ -x "$(command -v apt-get)" ]; then
+  echo "Upgrading Podman using apt-get..."
+  sudo apt-get update
+  sudo apt-get install -y podman
+elif [ -x "$(command -v dnf)" ]; then
+  echo "Upgrading Podman using dnf..."
+  sudo dnf upgrade podman -y
+else
+  echo "Package manager not found. Skipping upgrade."
+fi


Could you please tell me why we need to use podman?

Electronic-Waste · 2024-11-05T08:16:55Z

.github/workflows/template-notebook-test/action.yaml

It would be better to rename the directory from template-notebook-test to template-e2e-notebook-test, to be consistent with the other directories :)

Electronic-Waste · 2024-11-05T08:38:14Z

/rerun-all

Electronic-Waste

A lot of effort @yehudit1987 ! Thanks for your contribution.

I left some comments for you. cc👀 @kubeflow/wg-automl-leads

Electronic-Waste · 2024-11-19T09:55:41Z

examples/v1beta1/kubeflow-pipelines/kubeflow-e2e-mnist.ipynb

@@ -671,4 +671,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 4
-}
+}


Suggested change

}

}

Electronic-Waste · 2024-11-19T09:56:56Z

test/e2e/v1beta1/scripts/gh-actions/setup-katib.sh

-kubectl wait --for=condition=ContainersReady=True --timeout=${TIMEOUT} -l "katib.kubeflow.org/component in ($WITH_DATABASE_TYPE,controller,db-manager,ui)" -n kubeflow pod ||
-  (kubectl get pods -n kubeflow && kubectl describe pods -n kubeflow && exit 1)
+kubectl wait --for=condition=ContainersReady=True --timeout=${TIMEOUT} -l "katib.kubeflow.org/component in ($WITH_DATABASE_TYPE,controller,db-manager,ui)" -n kubeflow pod || (kubectl get pods -n kubeflow && kubectl describe pods -n kubeflow && exit 1)


I think it would be better to restore this line to its original formatting.

Electronic-Waste · 2024-11-19T10:01:23Z

examples/v1beta1/sdk/cmaes-and-resume-policies.ipynb

+   "metadata": {
+    "pycharm": {
+     "name": "#%%\n"
+    }
+   },
   "outputs": [],
   "source": [
    "# Experiment name and namespace.\n",
-    "namespace = \"kubeflow-user-example-com\"\n",
+    "namespace = \"kubeflow\"\n",
    "experiment_name = \"cmaes-example\"\n",
    "\n",
    "metadata = V1ObjectMeta(\n",


Can we add the parameters tag to the cell metadata so that papermill arguments can override these values, as in kubeflow/training-operator#2274?
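For context, papermill looks for a cell tagged parameters and injects overriding values after it. The following is a minimal sketch of what that tag looks like in the notebook JSON and how one might check for it. The helper name find_parameter_cells and the sample notebook are illustrative assumptions, not code from this PR.

```python
import json
from typing import List

# A minimal nbformat-4 notebook with one cell tagged "parameters".
# The defaults mirror the namespace/experiment_name values in the diff above.
NOTEBOOK = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {"tags": ["parameters"]},
            "outputs": [],
            "source": [
                'namespace = "kubeflow"\n',
                'experiment_name = "cmaes-example"\n',
            ],
        },
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print(namespace, experiment_name)\n"],
        },
    ],
}


def find_parameter_cells(notebook: dict) -> List[int]:
    """Return indexes of code cells whose metadata tags include "parameters"."""
    return [
        index
        for index, cell in enumerate(notebook.get("cells", []))
        if cell.get("cell_type") == "code"
        and "parameters" in cell.get("metadata", {}).get("tags", [])
    ]


if __name__ == "__main__":
    # Show the metadata of the tagged cell and where it sits in the notebook.
    print(json.dumps(NOTEBOOK["cells"][0]["metadata"]))
    print(find_parameter_cells(NOTEBOOK))
```

With such a cell in place, a CI step could keep "kubeflow" as the notebook default and override it from the command line, for example with papermill input.ipynb output.ipynb -p namespace kubeflow-user-example-com (the file names here are placeholders).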

Electronic-Waste · 2024-11-19T10:02:28Z

examples/v1beta1/sdk/tune-train-from-func.ipynb

@@ -314,7 +342,8 @@
    "\n",
    "# Start the Katib Experiment.\n",
    "exp_name = \"tune-mnist\"\n",
-    "katib_client = katib.KatibClient()\n",
+    "namespace=\"kubeflow\"\n",


Electronic-Waste · 2024-11-19T10:06:11Z

examples/v1beta1/sdk/tune-train-from-func.ipynb

+    "import time\n",
+    "time.sleep(120)\n",
+    "status = katib_client.is_experiment_succeeded(exp_name, namespace=namespace)\n",


Can we replace fixed-time sleep with wait_for_experiment_condition()?

katib/sdk/python/v1beta1/kubeflow/katib/api/katib_client.py

Lines 1002 to 1010 in 2b41ae6

def wait_for_experiment_condition(
    self,
    name: str,
    namespace: Optional[str] = None,
    expected_condition: str = constants.EXPERIMENT_CONDITION_SUCCEEDED,
    timeout: int = 600,
    polling_interval: int = 15,
    apiserver_timeout: int = constants.DEFAULT_TIMEOUT,
):

Yes, I think we can use this API here.
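The difference between a fixed sleep and waiting on a condition can be sketched generically. The helper below is not the Katib implementation: wait_for_condition, check, and the fake status sequence are hypothetical names used only to illustrate polling with a timeout and an interval, in the spirit of the timeout and polling_interval arguments quoted above.

```python
import time
from typing import Callable


def wait_for_condition(
    check: Callable[[], bool],
    timeout: float = 600,
    polling_interval: float = 15,
) -> None:
    """Poll check() until it returns True; raise TimeoutError after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(polling_interval)


if __name__ == "__main__":
    # Fake experiment that reports success on the third poll.
    statuses = iter(["Created", "Running", "Succeeded"])
    wait_for_condition(
        lambda: next(statuses) == "Succeeded",
        timeout=5,
        polling_interval=0.01,
    )
    print("experiment succeeded")
```

Unlike time.sleep(120), this returns as soon as the condition holds and fails loudly when it never does. In the notebook itself, going by the signature quoted above, the call would presumably be katib_client.wait_for_experiment_condition(name=exp_name, namespace=namespace).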

Electronic-Waste · 2024-11-19T10:09:33Z

/rerun-all

andreyvelich

Thank you for this effort and for updating the broken Notebooks in Katib, @yehudit1987!
Let's finalize this PR once we design the testing script in the Training Operator.

andreyvelich · 2024-11-30T00:57:11Z

.github/workflows/template-e2e-notebook-test/action.yaml

+    - name: Run Jupyter Notebook with Papermill
+      shell: bash
+      run: |
+        IFS=',' read -r -a NOTEBOOK_ARRAY <<< "${{ inputs.notebook-input }}"
+        # Loop through each notebook path
+        for NOTEBOOK in "${NOTEBOOK_ARRAY[@]}"; do
+          OUTPUT_FILE="${NOTEBOOK%.ipynb}_output.ipynb"
+          echo "Running notebook: $NOTEBOOK"
+          papermill "$NOTEBOOK" "$OUTPUT_FILE" --log-output --kernel python3 || {
+              echo "Papermill failed for notebook: $NOTEBOOK"
+              exit 1
+          }
+        done


As we discussed with @saileshd1402 in the Training Operator PR kubeflow/training-operator#2274 (comment), we might want to create a script to run those Notebooks with papermill rather than adding the script to the GitHub action directly.
I think, once we finalize it, we can use the same approach for Katib tests.

Hi @andreyvelich, I have now addressed all the previous comments.
One of the notebooks again appears to have been rewritten, even though I edited it with JupyterLab as you suggested.
Anyway, I will fix those notebooks once the decision about using the script has been made.
Please keep me updated on that.
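A standalone script along the lines discussed here might look like the following sketch. It mirrors the shell loop in the action above. The function names are hypothetical, it is not the script from kubeflow/training-operator#2274, and it assumes papermill is installed in the environment where run_notebooks is called.

```python
from typing import List


def parse_notebooks(notebook_input: str) -> List[str]:
    """Split a comma-separated list of notebook paths, dropping empty entries."""
    return [path.strip() for path in notebook_input.split(",") if path.strip()]


def output_path(notebook: str) -> str:
    """Mirror the shell expansion ${NOTEBOOK%.ipynb}_output.ipynb."""
    suffix = ".ipynb"
    stem = notebook[: -len(suffix)] if notebook.endswith(suffix) else notebook
    return f"{stem}_output.ipynb"


def run_notebooks(notebook_input: str, kernel: str = "python3") -> None:
    """Execute each notebook with papermill; the first failure raises and stops the run."""
    import papermill  # imported lazily so the helpers above work without it

    for notebook in parse_notebooks(notebook_input):
        print(f"Running notebook: {notebook}")
        papermill.execute_notebook(
            notebook,
            output_path(notebook),
            kernel_name=kernel,
            log_output=True,
        )
```

The workflow step would then reduce to a single call such as run_notebooks("examples/v1beta1/sdk/cmaes-and-resume-policies.ipynb"), which keeps the looping and error handling testable outside the GitHub action.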

andreyvelich · 2024-11-30T00:59:22Z

examples/v1beta1/sdk/tune-train-from-func.ipynb

+    "import time\n",
+    "time.sleep(120)\n",
+    "status = katib_client.is_experiment_succeeded(exp_name, namespace=namespace)\n",


Yes, I think we can use this API here.

andreyvelich · 2024-11-30T01:01:50Z

examples/v1beta1/sdk/tune-train-from-func.ipynb

+    "pycharm": {
+     "name": "#%% md\n"
+    }


I would suggest editing these Notebooks using JupyterLab directly.
In that case, the JSON format will be rendered correctly for every IDE.
E.g. you can just run JupyterLab locally to edit them:

pip install jupyterlab
jupyter lab

Electronic-Waste · 2024-12-02T13:20:05Z

/rerun-all

andreyvelich · 2025-01-21T13:31:52Z

Hi @yehudit1987, do you have time to finalize this PR?
@saileshd1402 implemented tests as part of this PR: kubeflow/training-operator#2274, so we can use the same script for Katib.

yehudit1987 · 2025-01-22T10:37:24Z

Hi @andreyvelich, yes, I have been waiting for your decision about the script, as mentioned above. I will finalize this PR.

google-oss-prow · 2025-01-23T08:09:03Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign tenzen-y for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Signed-off-by: Yehudit Kerido <[email protected]>

google-oss-prow bot added the size/XXL label Oct 27, 2024

google-oss-prow bot requested review from anencore94 and Electronic-Waste October 27, 2024 10:51

yehudit1987 marked this pull request as draft October 28, 2024 09:45

google-oss-prow bot added the do-not-merge/work-in-progress label Oct 28, 2024

yehudit1987 force-pushed the sdk-tests-with-papermill branch from 963d367 to 6633aa5 Compare October 29, 2024 19:29

yehudit1987 marked this pull request as ready for review October 29, 2024 19:31

google-oss-prow bot removed the do-not-merge/work-in-progress label Oct 29, 2024

Electronic-Waste reviewed Oct 31, 2024

yehudit1987 marked this pull request as draft November 3, 2024 10:58

google-oss-prow bot added the do-not-merge/work-in-progress label Nov 3, 2024

yehudit1987 marked this pull request as ready for review November 3, 2024 11:16

google-oss-prow bot removed the do-not-merge/work-in-progress label Nov 3, 2024

google-oss-prow bot requested a review from Electronic-Waste November 3, 2024 11:16

Electronic-Waste reviewed Nov 5, 2024

yehudit1987 marked this pull request as draft November 5, 2024 09:06

google-oss-prow bot added do-not-merge/work-in-progress size/XL and removed size/XXL labels Nov 5, 2024

yehudit1987 marked this pull request as ready for review November 5, 2024 13:00

google-oss-prow bot removed the do-not-merge/work-in-progress label Nov 5, 2024

google-oss-prow bot requested a review from Electronic-Waste November 19, 2024 09:33

Electronic-Waste reviewed Nov 19, 2024

andreyvelich reviewed Nov 30, 2024

yehudit1987 marked this pull request as draft December 1, 2024 10:48

google-oss-prow bot added do-not-merge/work-in-progress size/XXL size/XL and removed size/XL size/XXL labels Dec 1, 2024

yehudit1987 closed this Jan 23, 2025

yehudit1987 force-pushed the sdk-tests-with-papermill branch from 2c0ce60 to 59af784 Compare January 23, 2025 07:58

google-oss-prow bot added size/XS and removed size/XXL labels Jan 23, 2025

yehudit1987 reopened this Jan 23, 2025

google-oss-prow bot added size/XL and removed size/XS labels Jan 23, 2025

Yehudit Kerido added 2 commits January 23, 2025 10:18

sdk-tests-with-papermill

f2d1e7a

Signed-off-by: Yehudit Kerido <[email protected]>

sdk tests with papermill

683608f

Signed-off-by: Yehudit Kerido <[email protected]>

yehudit1987 force-pushed the sdk-tests-with-papermill branch from 70d149a to 683608f Compare January 23, 2025 08:19

yehudit1987 marked this pull request as ready for review January 23, 2025 08:35

google-oss-prow bot removed the do-not-merge/work-in-progress label Jan 23, 2025

google-oss-prow bot requested a review from Electronic-Waste January 23, 2025 08:35
