Sdk tests with papermill #2448

Open
wants to merge 2 commits into base: master

yehudit1987

What this PR does / why we need it:
This PR adds E2E tests that run the Katib example notebooks with papermill.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #2417

Checklist:

  • Docs included if any changes are user facing

@Electronic-Waste
Member

/rerun-all

@Electronic-Waste
Member

Electronic-Waste commented Oct 28, 2024

@yehudit1987 Can you please fix these CI errors?

@Electronic-Waste
Member

@yehudit1987 Can you sign off your commits with git commit -s? The DCO check failed because the commits are not signed off.

@Electronic-Waste
Member

FYI, you can check this reference: https://github.com/kubeflow/katib/pull/2448/checks?check_run_id=32215445282

@yehudit1987 yehudit1987 force-pushed the sdk-tests-with-papermill branch from 963d367 to 6633aa5 Compare October 29, 2024 19:29
@yehudit1987 yehudit1987 marked this pull request as ready for review October 29, 2024 19:31
@Electronic-Waste
Member

/rerun-all

@YosiElias
Member

/rerun-all

@Electronic-Waste
Member

/rerun-all

Member

@Electronic-Waste left a comment

Thanks for your great contributions @yehudit1987! 🎉

I left some review comments for you, excluding the notebooks. I will review the other files soon :)

Btw, @andreyvelich @tenzen-y are busy with other projects now and will be back in the middle of November. Your PR will be merged then.

@@ -0,0 +1,28 @@
name: Run e2e sdk tests with papermill
Member

Suggested change
name: Run e2e sdk tests with papermill
name: E2E Tests with Notebooks

I think it would be better to make the test case's name consistent with the others :)

cancel-in-progress: true

jobs:
create-katib-notebooks-test:
Member

Suggested change
create-katib-notebooks-test:
e2e:

Comment on lines 93 to 103
# Loop through each algorithm in the array
for algorithm_name in "${ALGORITHM_ARRAY[@]}"; do
  suggestion_image_name="$(algorithm_name=$algorithm_name yq eval '.runtime.suggestions.[] | select(.algorithmName == env(algorithm_name)) | .image' \
    manifests/v1beta1/installs/katib-standalone/katib-config.yaml | cut -d: -f1)"
  suggestion_name="$(basename "$suggestion_image_name")"
  suggestions+=("$suggestion_name")
done

Member

I think this loop duplicates the loop before it:

# Search for Suggestion Images required for Trial.
for exp_name in "${EXPERIMENT_ARRAY[@]}"; do
  exp_path=$(find examples/v1beta1 -name "${exp_name}.yaml")
  algorithm_name="$(yq eval '.spec.algorithm.algorithmName' "$exp_path")"
  suggestion_image_name="$(algorithm_name=$algorithm_name yq eval '.runtime.suggestions.[] | select(.algorithmName == env(algorithm_name)) | .image' \
    manifests/v1beta1/installs/katib-standalone/katib-config.yaml | cut -d: -f1)"
  suggestion_name="$(basename "$suggestion_image_name")"
  suggestions+=("$suggestion_name")
done

Can we combine these two loops into a single one by using the ALGORITHM parameters with the other e2e tests?

WDYT👀 @yehudit1987 @kubeflow/wg-automl-leads
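
One possible reading of the suggestion above, sketched in Python rather than bash for brevity: collect the algorithm names from both sources first, then resolve each name to its suggestion image once. The function name, the in-memory mapping, and any image strings passed to it are hypothetical stand-ins for the yq lookups against katib-config.yaml.

```python
from typing import Dict, Iterable, List


def resolve_suggestions(
    experiment_algorithms: Iterable[str],
    extra_algorithms: Iterable[str],
    image_by_algorithm: Dict[str, str],
) -> List[str]:
    """Resolve algorithm names to suggestion image base names in one pass.

    image_by_algorithm is a hypothetical stand-in for the
    .runtime.suggestions entries in katib-config.yaml.
    """
    suggestions: List[str] = []
    seen = set()
    for algorithm_name in [*experiment_algorithms, *extra_algorithms]:
        # Skip empty entries and algorithm names that were already resolved.
        if not algorithm_name or algorithm_name in seen:
            continue
        seen.add(algorithm_name)
        image = image_by_algorithm[algorithm_name]
        repository = image.split(":", 1)[0]  # same effect as: cut -d: -f1
        suggestions.append(repository.rsplit("/", 1)[-1])  # same effect as: basename
    return suggestions
```

Unlike the two shell loops, this sketch skips algorithm names it has already seen, so an algorithm listed in both an experiment and the explicit list is resolved only once.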

echo "Papermill failed for notebook: $NOTEBOOK"
exit 1
}
done
Member

Suggested change
done
done

A newline is missing here.

@@ -172,4 +182,4 @@ fi
echo -e "\nCleanup Build Cache...\n"
docker buildx prune -f

echo -e "\nAll Katib images with ${TAG} tag have been built successfully!\n"
echo -e "\nAll Katib images with ${TAG} tag have been built successfully!\n"
Member

Suggested change
echo -e "\nAll Katib images with ${TAG} tag have been built successfully!\n"
echo -e "\nAll Katib images with ${TAG} tag have been built successfully!\n"

kubectl create namespace kubeflow-user-example-com
fi

exit 0
Member

Suggested change
exit 0
exit 0


echo "Start to setup Minikube Kubernetes Cluster"
kubectl version
kubectl cluster-info
kubectl get nodes

echo "Build and Load container images"
./build-load.sh "$DEPLOY_KATIB_UI" "$TUNE_API" "$TRIAL_IMAGES" "$EXPERIMENTS"
./build-load.sh "$DEPLOY_KATIB_UI" "$TUNE_API" "$TRIAL_IMAGES" "$EXPERIMENTS" "$ALGORITHMS"
Member

Suggested change
./build-load.sh "$DEPLOY_KATIB_UI" "$TUNE_API" "$TRIAL_IMAGES" "$EXPERIMENTS" "$ALGORITHMS"
./build-load.sh "$DEPLOY_KATIB_UI" "$TUNE_API" "$TRIAL_IMAGES" "$EXPERIMENTS" "$ALGORITHMS"

Comment on lines 33 to 35
- name: Setup Minikube Cluster
  shell: bash
  run: ./test/e2e/v1beta1/scripts/gh-actions/setup-minikube.sh true true "" "" "cmaes"
Member

Could we reuse template-setup-e2e-test? I guess it will be better if we make full use of the existing template :)

Waiting for your thoughts 👀 @yehudit1987 @kubeflow/wg-automl-leads

Author

That template is used as a preliminary template for template-e2e-test. The second one runs YAML experiments by calling a shell script that calls a Python script. In our case, we just need to add a step to the job that runs the notebook directly with papermill. I guess we can use template-setup-e2e-test, but we would still need the new one.

Member

SGTM:)

@yehudit1987 yehudit1987 marked this pull request as draft November 3, 2024 10:58
@yehudit1987 yehudit1987 marked this pull request as ready for review November 3, 2024 11:16
Member

@Electronic-Waste left a comment

Sorry for the late response @yehudit1987. I left a few comments for you.

I'm busy with my own work right now, so I'll review the Notebooks later :)

Comment on lines 39 to 48
if [ -x "$(command -v apt-get)" ]; then
  echo "Upgrading Podman using apt-get..."
  sudo apt-get update
  sudo apt-get install -y podman
elif [ -x "$(command -v dnf)" ]; then
  echo "Upgrading Podman using dnf..."
  sudo dnf upgrade podman -y
else
  echo "Package manager not found. Skipping upgrade."
fi
Member

Could you please tell me why we need to use podman?

Member

It would be better to rename the directory from template-notebook-test to template-e2e-notebook-test, to be consistent with the other directories :)

@Electronic-Waste
Member

/rerun-all

@yehudit1987 yehudit1987 marked this pull request as draft November 5, 2024 09:06
@yehudit1987 yehudit1987 marked this pull request as ready for review November 5, 2024 13:00
Member

@Electronic-Waste left a comment

That's a lot of effort, @yehudit1987! Thanks for your contribution.

I left some comments for you. cc👀 @kubeflow/wg-automl-leads

@@ -671,4 +671,4 @@
},
"nbformat": 4,
"nbformat_minor": 4
}
}
Member

Suggested change
}
}

kubectl wait --for=condition=ContainersReady=True --timeout=${TIMEOUT} -l "katib.kubeflow.org/component in ($WITH_DATABASE_TYPE,controller,db-manager,ui)" -n kubeflow pod ||
(kubectl get pods -n kubeflow && kubectl describe pods -n kubeflow && exit 1)
kubectl wait --for=condition=ContainersReady=True --timeout=${TIMEOUT} -l "katib.kubeflow.org/component in ($WITH_DATABASE_TYPE,controller,db-manager,ui)" -n kubeflow pod || (kubectl get pods -n kubeflow && kubectl describe pods -n kubeflow && exit 1)
Member

I think it would be better if we could adjust the format of this line.

Member

(Restore it to its original state.)

Comment on lines 101 to 138
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"# Experiment name and namespace.\n",
"namespace = \"kubeflow-user-example-com\"\n",
"namespace = \"kubeflow\"\n",
"experiment_name = \"cmaes-example\"\n",
"\n",
"metadata = V1ObjectMeta(\n",
Member

Can we add a parameters tag to the cell metadata so that papermill arguments can override these values, as in kubeflow/training-operator#2274?
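
For reference, papermill overrides values by looking for a cell tagged parameters and injecting a new cell with the passed values right after it. Below is a minimal sketch of adding that tag using only the standard library. The helper name and the sample notebook dict are illustrative and not taken from this PR.

```python
import json
from typing import Any, Dict


def tag_parameters_cell(notebook: Dict[str, Any], marker: str) -> bool:
    """Add the "parameters" tag to the first code cell whose source contains marker.

    Returns True if a matching cell was found. The notebook dict is modified in place.
    """
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # nbformat 4 stores source either as one string or as a list of lines.
        text = source if isinstance(source, str) else "".join(source)
        if marker in text:
            tags = cell.setdefault("metadata", {}).setdefault("tags", [])
            if "parameters" not in tags:
                tags.append("parameters")
            return True
    return False


# Hypothetical minimal notebook with one cell holding the default values.
notebook = {
    "cells": [
        {
            "cell_type": "code",
            "metadata": {},
            "outputs": [],
            "source": ['namespace = "kubeflow"\n', 'experiment_name = "cmaes-example"\n'],
        }
    ],
    "nbformat": 4,
    "nbformat_minor": 4,
}
tagged = tag_parameters_cell(notebook, "namespace =")
print(json.dumps(notebook["cells"][0]["metadata"]))
```

With the tag in place, a run such as papermill input.ipynb output.ipynb -p namespace kubeflow-user-example-com would override the default value. The file names here are placeholders.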

@@ -314,7 +342,8 @@
"\n",
"# Start the Katib Experiment.\n",
"exp_name = \"tune-mnist\"\n",
"katib_client = katib.KatibClient()\n",
"namespace=\"kubeflow\"\n",
Member

Same as above.

Comment on lines 444 to 475
"import time\n",
"time.sleep(120)\n",
"status = katib_client.is_experiment_succeeded(exp_name, namespace=namespace)\n",
Member

Can we replace fixed-time sleep with wait_for_experiment_condition()?

def wait_for_experiment_condition(
    self,
    name: str,
    namespace: Optional[str] = None,
    expected_condition: str = constants.EXPERIMENT_CONDITION_SUCCEEDED,
    timeout: int = 600,
    polling_interval: int = 15,
    apiserver_timeout: int = constants.DEFAULT_TIMEOUT,
):

Member

Yes, I think we can use this API here.
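
To illustrate why a condition-based wait is preferable to time.sleep(120), here is a generic polling sketch. It is not Katib's implementation of wait_for_experiment_condition(); the helper name and the fake status function are hypothetical. It returns as soon as the condition holds and fails loudly on timeout, instead of always waiting a fixed two minutes.

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool],
    timeout: float = 600,
    polling_interval: float = 15,
) -> None:
    """Poll check() until it returns True or timeout seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(polling_interval)


# Hypothetical stand-in for a status query: succeeds on the third poll.
polls = {"count": 0}


def fake_experiment_succeeded() -> bool:
    polls["count"] += 1
    return polls["count"] >= 3


wait_until(fake_experiment_succeeded, timeout=5, polling_interval=0.01)
```

In the notebook itself, the equivalent would presumably be a call along the lines of katib_client.wait_for_experiment_condition(name=exp_name, namespace=namespace), based on the signature quoted above.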

@Electronic-Waste
Member

/rerun-all

Member

@andreyvelich left a comment

Thank you for this effort and for updating the broken Notebooks in Katib @yehudit1987!
Let's finalize this PR once we have designed the testing script in the Training Operator.

Comment on lines 47 to 54
- name: Run Jupyter Notebook with Papermill
  shell: bash
  run: |
    IFS=',' read -r -a NOTEBOOK_ARRAY <<< "${{ inputs.notebook-input }}"
    # Loop through each notebook path
    for NOTEBOOK in "${NOTEBOOK_ARRAY[@]}"; do
      OUTPUT_FILE="${NOTEBOOK%.ipynb}_output.ipynb"
      echo "Running notebook: $NOTEBOOK"
      papermill "$NOTEBOOK" "$OUTPUT_FILE" --log-output --kernel python3 || {
        echo "Papermill failed for notebook: $NOTEBOOK"
        exit 1
      }
    done
Member

As we discussed with @saileshd1402 in the Training Operator PR: kubeflow/training-operator#2274 (comment), we might want to create a script that runs those Notebooks with papermill rather than adding the logic to the GitHub Action directly.
I think that once we finalize it, we can use the same approach for the Katib tests.
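
A rough sketch of what such a standalone runner might look like, so that the workflow step only has to invoke one script. The function names are hypothetical, and the papermill flags are the ones already used in the workflow step above. The actual shape of the script is still to be decided in the Training Operator PR.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def parse_notebooks(raw: str) -> List[str]:
    """Split a comma-separated notebook list, dropping empty entries."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def build_command(notebook: str) -> List[str]:
    """Build the papermill command for one notebook.

    The output is written next to the input as <name>_output.ipynb.
    """
    path = Path(notebook)
    output = path.with_name(f"{path.stem}_output.ipynb")
    return ["papermill", str(path), str(output), "--log-output", "--kernel", "python3"]


def run_all(raw: str) -> int:
    """Run each notebook in order and stop at the first failure.

    Returns 0 on success and 1 if any notebook fails.
    """
    for notebook in parse_notebooks(raw):
        print(f"Running notebook: {notebook}")
        result = subprocess.run(build_command(notebook))
        if result.returncode != 0:
            print(f"Papermill failed for notebook: {notebook}", file=sys.stderr)
            return 1
    return 0
```

Keeping the command construction in its own function makes it possible to test the runner without papermill installed. A main guard that calls run_all(sys.argv[1]) would turn this into a command-line script.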

Author

Hi @andreyvelich, I have now addressed all the previous comments.
One of the notebooks seems to have been rewritten again, even though I edited it with JupyterLab as you suggested.
Anyway, I will fix those notebooks once the decision about using the script is made.
Please keep me updated on that.

Comment on lines +467 to +498
"pycharm": {
"name": "#%% md\n"
}
Member

I would suggest editing these Notebooks directly in JupyterLab.
That way, the JSON format will be rendered correctly in every IDE.
For example, you can just run JupyterLab locally to edit them:

pip install jupyterlab
jupyter lab

@Electronic-Waste
Member

/rerun-all

@andreyvelich
Member

Hi @yehudit1987, do you have time to finalize this PR?
@saileshd1402 implemented tests as part of this PR: kubeflow/training-operator#2274, so we can use the same script for Katib.

@yehudit1987
Author

Hi @andreyvelich, yes, I have been waiting for your decision about the script, as mentioned above. I will finalize this PR.

@yehudit1987 yehudit1987 force-pushed the sdk-tests-with-papermill branch from 2c0ce60 to 59af784 Compare January 23, 2025 07:58
@yehudit1987 yehudit1987 reopened this Jan 23, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign tenzen-y for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot added size/XL and removed size/XS labels Jan 23, 2025
Yehudit Kerido added 2 commits January 23, 2025 10:18
Signed-off-by: Yehudit Kerido <[email protected]>
Signed-off-by: Yehudit Kerido <[email protected]>
@yehudit1987 yehudit1987 force-pushed the sdk-tests-with-papermill branch from 70d149a to 683608f Compare January 23, 2025 08:19
@yehudit1987 yehudit1987 marked this pull request as ready for review January 23, 2025 08:35
Successfully merging this pull request may close these issues.

[Test] E2e Tests for Notebook Examples
4 participants