-
Notifications
You must be signed in to change notification settings - Fork 455
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sdk tests with papermill #2448
base: master
Are you sure you want to change the base?
Sdk tests with papermill #2448
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
/rerun-all |
@yehudit1987 Can you please fix these CI errors? |
@yehudit1987 Can you sign your commits with |
FYI, you can check this reference: https://github.com/kubeflow/katib/pull/2448/checks?check_run_id=32215445282 |
963d367
to
6633aa5
Compare
/rerun-all |
2 similar comments
/rerun-all |
/rerun-all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your great contributions @yehudit1987! 🎉
I left some reviews for you, excluding notebooks. Will soon review other files :)
Btw, @andreyvelich @tenzen-y are busy with other projects now and will be back in the middle of November. Your PR will be merged then.
@@ -0,0 +1,28 @@ | |||
name: Run e2e sdk tests with papermill |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
name: Run e2e sdk tests with papermill | |
name: E2E Tests with Notebooks |
I guess it will be better to make the testcase's name consistent with others :)
cancel-in-progress: true | ||
|
||
jobs: | ||
create-katib-notebooks-test: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
create-katib-notebooks-test: | |
e2e: |
# Loop through each algorithm in the array | ||
for algorithm_name in "${ALGORITHM_ARRAY[@]}"; do | ||
suggestion_image_name="$(algorithm_name=$algorithm_name yq eval '.runtime.suggestions.[] | select(.algorithmName == env(algorithm_name)) | .image' \ | ||
manifests/v1beta1/installs/katib-standalone/katib-config.yaml | cut -d: -f1)" | ||
suggestion_name="$(basename "$suggestion_image_name")" | ||
suggestions+=("$suggestion_name") | ||
done | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this loop is redundant with the loop in front of it:
katib/test/e2e/v1beta1/scripts/gh-actions/build-load.sh
Lines 77 to 89 in 706a6f2
# Search for Suggestion Images required for Trial. | |
for exp_name in "${EXPERIMENT_ARRAY[@]}"; do | |
exp_path=$(find examples/v1beta1 -name "${exp_name}.yaml") | |
algorithm_name="$(yq eval '.spec.algorithm.algorithmName' "$exp_path")" | |
suggestion_image_name="$(algorithm_name=$algorithm_name yq eval '.runtime.suggestions.[] | select(.algorithmName == env(algorithm_name)) | .image' \ | |
manifests/v1beta1/installs/katib-standalone/katib-config.yaml | cut -d: -f1)" | |
suggestion_name="$(basename "$suggestion_image_name")" | |
suggestions+=("$suggestion_name") | |
done |
Can we combine these two loops into a unified one by using the ALGORITHM
parameters with other e2e tests.
WDYT👀 @yehudit1987 @kubeflow/wg-automl-leads
echo "Papermill failed for notebook: $NOTEBOOK" | ||
exit 1 | ||
} | ||
done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done | |
done | |
A missing new line here
@@ -172,4 +182,4 @@ fi | |||
echo -e "\nCleanup Build Cache...\n" | |||
docker buildx prune -f | |||
|
|||
echo -e "\nAll Katib images with ${TAG} tag have been built successfully!\n" | |||
echo -e "\nAll Katib images with ${TAG} tag have been built successfully!\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
echo -e "\nAll Katib images with ${TAG} tag have been built successfully!\n" | |
echo -e "\nAll Katib images with ${TAG} tag have been built successfully!\n" | |
kubectl create namespace kubeflow-user-example-com | ||
fi | ||
|
||
exit 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exit 0 | |
exit 0 | |
|
||
echo "Start to setup Minikube Kubernetes Cluster" | ||
kubectl version | ||
kubectl cluster-info | ||
kubectl get nodes | ||
|
||
echo "Build and Load container images" | ||
./build-load.sh "$DEPLOY_KATIB_UI" "$TUNE_API" "$TRIAL_IMAGES" "$EXPERIMENTS" | ||
./build-load.sh "$DEPLOY_KATIB_UI" "$TUNE_API" "$TRIAL_IMAGES" "$EXPERIMENTS" "$ALGORITHMS" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
./build-load.sh "$DEPLOY_KATIB_UI" "$TUNE_API" "$TRIAL_IMAGES" "$EXPERIMENTS" "$ALGORITHMS" | |
./build-load.sh "$DEPLOY_KATIB_UI" "$TUNE_API" "$TRIAL_IMAGES" "$EXPERIMENTS" "$ALGORITHMS" | |
- name: Setup Minikube Cluster | ||
shell: bash | ||
run: ./test/e2e/v1beta1/scripts/gh-actions/setup-minikube.sh true true "" "" "cmaes" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we reuse template-setup-e2e-test
? I guess it will be better if we make full use of the existing template :)
Wait for you thoughts👀 @yehudit1987 @kubeflow/wg-automl-leads
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That template being use as a pre template to template-e2e-test. We are using the second one for running yaml experiments by calling a shell script that calls a python script. In our case we just need to add to the job a step that run the notebook directly with papermill. I guess we can use template-setup-e2e-test but it will not prevent us from using the new one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late response @yehudit1987. I left a few comments for you.
And I'm busy with my works now. I'll give reviews on Notebooks later:)
if [ -x "$(command -v apt-get)" ]; then | ||
echo "Upgrading Podman using apt-get..." | ||
sudo apt-get update | ||
sudo apt-get install -y podman | ||
elif [ -x "$(command -v dnf)" ]; then | ||
echo "Upgrading Podman using dnf..." | ||
sudo dnf upgrade podman -y | ||
else | ||
echo "Package manager not found. Skipping upgrade." | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please tell me why we need to use podman?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be better to change the dir name from template-notebook-test
to template-e2e-notebook-test
to be consistent with other dirs:)
/rerun-all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lot of effort @yehudit1987 ! Thanks for your contribution.
I left some comments for you. cc👀 @kubeflow/wg-automl-leads
@@ -671,4 +671,4 @@ | |||
}, | |||
"nbformat": 4, | |||
"nbformat_minor": 4 | |||
} | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} | |
} | |
kubectl wait --for=condition=ContainersReady=True --timeout=${TIMEOUT} -l "katib.kubeflow.org/component in ($WITH_DATABASE_TYPE,controller,db-manager,ui)" -n kubeflow pod || | ||
(kubectl get pods -n kubeflow && kubectl describe pods -n kubeflow && exit 1) | ||
kubectl wait --for=condition=ContainersReady=True --timeout=${TIMEOUT} -l "katib.kubeflow.org/component in ($WITH_DATABASE_TYPE,controller,db-manager,ui)" -n kubeflow pod || (kubectl get pods -n kubeflow && kubectl describe pods -n kubeflow && exit 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better if we could adjust the format of this line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Recover its original state)
"metadata": { | ||
"pycharm": { | ||
"name": "#%%\n" | ||
} | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"# Experiment name and namespace.\n", | ||
"namespace = \"kubeflow-user-example-com\"\n", | ||
"namespace = \"kubeflow\"\n", | ||
"experiment_name = \"cmaes-example\"\n", | ||
"\n", | ||
"metadata = V1ObjectMeta(\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add parameters
tag in metadata
and allow args in papermill rewrite them like: kubeflow/training-operator#2274?
@@ -314,7 +342,8 @@ | |||
"\n", | |||
"# Start the Katib Experiment.\n", | |||
"exp_name = \"tune-mnist\"\n", | |||
"katib_client = katib.KatibClient()\n", | |||
"namespace=\"kubeflow\"\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like above
"import time\n", | ||
"time.sleep(120)\n", | ||
"status = katib_client.is_experiment_succeeded(exp_name, namespace=namespace)\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we replace fixed-time sleep with wait_for_experiment_condition()
?
katib/sdk/python/v1beta1/kubeflow/katib/api/katib_client.py
Lines 1002 to 1010 in 2b41ae6
def wait_for_experiment_condition( | |
self, | |
name: str, | |
namespace: Optional[str] = None, | |
expected_condition: str = constants.EXPERIMENT_CONDITION_SUCCEEDED, | |
timeout: int = 600, | |
polling_interval: int = 15, | |
apiserver_timeout: int = constants.DEFAULT_TIMEOUT, | |
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think we can use this API here.
/rerun-all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this effort and updating broken Notebooks in Katib @yehudit1987!
Let's finalize this PR once we design the testing script in the Training Operator.
- name: Run Jupyter Notebook with Papermill | ||
shell: bash | ||
run: | | ||
IFS=',' read -r -a NOTEBOOK_ARRAY <<< "${{ inputs.notebook-input }}" | ||
# Loop through each notebook path | ||
for NOTEBOOK in "${NOTEBOOK_ARRAY[@]}"; do | ||
OUTPUT_FILE="${NOTEBOOK%.ipynb}_output.ipynb" | ||
echo "Running notebook: $NOTEBOOK" | ||
papermill "$NOTEBOOK" "$OUTPUT_FILE" --log-output --kernel python3 || { | ||
echo "Papermill failed for notebook: $NOTEBOOK" | ||
exit 1 | ||
} | ||
done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we discussed with @saileshd1402 in the Training Operator PR: kubeflow/training-operator#2274 (comment), we might want to create script to run those Notebooks with papermill rather than adding the script in the GitHub action directly.
I think, once we finalize it, we can use the same approach for Katib tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @andreyvelich for now I fix all the previous comments.
One of the notebooks is again seems to be rewritten (even though using jupyter lab) as you suggest.
Anyway I will fix those notebooks together with the decision about using the script or not.
Please keep me update on that.
"import time\n", | ||
"time.sleep(120)\n", | ||
"status = katib_client.is_experiment_succeeded(exp_name, namespace=namespace)\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think we can use this API here.
"pycharm": { | ||
"name": "#%% md\n" | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest to edit these Notebooks using JupyterLab directly.
In that case, the JSON format will be correctly rendered for every IDE.
E.g. you can just run JupyterLab locally to edit them:
pip install jupyterlab
jupyter lab
/rerun-all |
Hi @yehudit1987, do you have time to finalize this PR ? |
Hi @andreyvelich, yes I have been waited for your approval regarding the script decisions as mentioned above. I will finalize this PR. |
2c0ce60
to
59af784
Compare
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
1 similar comment
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Signed-off-by: Yehudit Kerido <[email protected]>
Signed-off-by: Yehudit Kerido <[email protected]>
70d149a
to
683608f
Compare
What this PR does / why we need it:
This PR creates E2E tests for katib examples to run with papermill.
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #2417
Checklist: