Improve test coverage in the catboost integration tests #18126
Conversation
@jameslamb you had some thoughts on the last PR, any interest in taking another stab at reviewing this one? ;)
Thanks for the @, and for doing this follow-up PR!
I left a few more suggestions for your consideration. Leaving a non-blocking review... they're all things that shouldn't block a merge and that you can choose whether or not to do based on how much time you want to invest in this.
python/cudf/cudf_pandas_tests/third_party_integration_tests/tests/test_catboost.py
"categorical_feature": rng.choice(["A", "B", "C"], size=100), | ||
"target": rng.integers(0, 2, size=100), | ||
"fea_1": rng.standard_normal(1000), | ||
"fea_2": rng.choice([0, 1, 2, 3], size=1000), |
Choosing these values randomly like this is a risk. The categorical features may be noise to `catboost`, meaning they wouldn't be chosen for any splits. And if they aren't chosen for any splits, then this test wouldn't be able to catch bugs in encoding/decoding that lead to the wrong representation (because `predict()` would effectively ignore the values in these columns).

It's for concerns like this that the `xgboost` example code I shared tries to create categorical features that have some relationship to the target: https://github.com/dmlc/xgboost/blob/105aa4247abb3ce787be2cef2f9beb4c24b30049/demo/guide-python/categorical.py#L40-L45
For these tests, I think the easiest way to do that would be something like this:

- use `sklearn.datasets.make_classification()` to generate the initial dataset (features there are not totally random... they have some relationship to the target)
- to get an `int` categorical feature, replace one of those continuous columns with a discretized version (say, split it into deciles with `np.quantile()` and `np.digitize()` or something)... e.g. values in the lowest 10% coded as `0`, next 10-20% as `1`, etc.
- to get a `category` feature, repeat that procedure (but with a different feature's values, and using a different cardinality), but instead of ints use strings like `"low"`, `"medium"`, `"high"` (or whatever you want) and convert to `category` dtype
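The steps above could be sketched roughly like this (column names such as `fea_int` / `fea_cat` are invented for illustration, not taken from the PR; assumes scikit-learn and pandas are available):

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification

# Features have a real relationship to the target, unlike pure rng draws
X, y = make_classification(
    n_samples=1000, n_features=4, n_informative=4, n_redundant=0, random_state=42
)

# int categorical feature: discretize one continuous column into deciles
decile_edges = np.quantile(X[:, 0], np.linspace(0.1, 0.9, 9))
fea_int = np.digitize(X[:, 0], decile_edges)  # codes 0..9; lowest 10% -> 0

# string categorical feature: same idea on another column, lower cardinality
tercile_edges = np.quantile(X[:, 1], [1 / 3, 2 / 3])
fea_cat = np.array(["low", "medium", "high"])[np.digitize(X[:, 1], tercile_edges)]

df = pd.DataFrame(
    {
        "fea_0": X[:, 2],
        "fea_1": X[:, 3],
        "fea_int": fea_int,
        "fea_cat": pd.Series(fea_cat, dtype="category"),
        "target": y,
    }
)
```

Because the discretized columns are derived from informative features, a model fit on `df` should actually split on them, which is the property the test needs.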
And then in the test, assert that all the features were used at least once. I don't know the exact API for this in `catboost`... in `xgboost` / `lightgbm` it is a property on the estimator called `.feature_importances_`, like in this example: https://stackoverflow.com/a/79435055/3986677
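The assertion pattern itself is library-agnostic. A rough sketch, using scikit-learn's `RandomForestClassifier` as a stand-in since the exact catboost API is not confirmed here (xgboost/lightgbm expose the same `feature_importances_` attribute):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(
    n_samples=500, n_features=4, n_informative=4, n_redundant=0, random_state=0
)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Importances sum to 1.0; a feature never chosen for any split scores 0.0,
# which is exactly the failure mode this assertion guards against.
importances = model.feature_importances_
unused = np.flatnonzero(importances == 0.0)
assert unused.size == 0, f"features never used in any split: {unused}"
```

The same check in the integration test would just swap in the catboost estimator and whatever importance accessor catboost provides.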
I know this seems like a lot, but I think it's important. When given a `pandas`-like input to `.predict()`, `catboost` will try to do the same types of transformation as it did on the training data. There's potential for bugs in this integration there that'd only be caught at `predict()` time by ensuring the model actually uses the categorical features.
> to get a `category` feature, repeat that procedure (but with a different feature's values, and using a different cardinality), but instead of ints use strings like `"low"`, `"medium"`, `"high"` (or whatever you want) and convert to `category` dtype
I was unable to add this feature. While attempting it I ran into this closed catboost issue catboost/catboost#934, which IMO should still be open since several others have run into it.
python/cudf/cudf_pandas_tests/third_party_integration_tests/tests/test_catboost.py
Not familiar with catboost overall, but it would be useful to have tests that check the consistency of the `predict` function as James suggested. For instance: what happens if the test dataset has fewer categories? What happens if there are null values? What happens if the test dataset has wrong categories, etc.? Just to make sure catboost is producing the same results as with pandas inputs.
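For illustration, a pandas-only sketch of the category-mismatch situations described above (column and category names are invented); this shows the input states a `predict()` consistency test would need to cover, not catboost's actual handling of them:

```python
import pandas as pd

train_categories = ["A", "B", "C"]
train = pd.DataFrame(
    {"cat_col": pd.Categorical(["A", "B", "C", "A"], categories=train_categories)}
)

# Test data with fewer categories than training, plus one unseen value "D"
test = pd.DataFrame({"cat_col": ["A", "B", "D"]})

# Aligning to the training categories turns unseen values into NaN,
# combining the "wrong category" and "null value" cases in one input.
aligned = test["cat_col"].astype(pd.CategoricalDtype(categories=train_categories))
print(aligned.isna().tolist())  # the unseen "D" becomes NaN
```

A consistency test would feed inputs like `aligned` to the model under both pandas and cudf.pandas and compare the predictions.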
There appears to be a conda solve conflict between `cuspatial` & `custreamz`/`libcudf_kafka`. I think it ultimately stems from `lz4-c` pinnings. `cuspatial`'s dependencies require `lz4-c >=1.10.0,<1.11.0a0`, but `libcudf_kafka`'s `librdkafka >=2.5.0,<2.6.0a0` requires `lz4-c >=1.9.3,<1.10.0a0`. Bumping `librdkafka` to `>=2.8.0` should resolve this conflict.
…#18337)

Update `test_narwhals.sh` to install pytest-env and hypothesis. In Narwhals these are pulled in by using `--group test`, but that requires uv (which cuDF doesn't use).

Demo of this working: https://www.kaggle.com/code/marcogorelli/narwhals-pre-release-test-cudf/notebook?scriptVersionId=228712396

Co-authored-by: Edoardo Abati <[email protected]>
Co-authored-by: Vyas Ramasubramani <[email protected]>
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
/ok to test
Looks like a solid improvement. I'll trust the other reviewers' expertise in judging whether this is a sufficient test for what we need to verify works with catboost.
/merge
Merged a3d717e into rapidsai:branch-25.06
Description
Follow-up of #17931. This PR tests catboost with categorical features and increases the size of the test data we fit our catboost models on.
Checklist