
enable numeric and ordinal metrics for ordinal outcome models #7

Open
corybrunson opened this issue Nov 4, 2024 · 1 comment

@corybrunson

Recently, Sakai (2021) compared several class, numeric, and proposed "ordinal" performance measures/metrics on ordinal classification tasks. This raises two questions: (1) which performance measures {yardstick} should make available for ordinal classification models, and (2) how to harmonize that decision with package conventions. I don't know what challenges (2) would pose; in any case, they will depend on (1).

I think it's necessary to make measures available that are specifically designed for ordinal classification, in part because there are serious, though separate, theoretical problems with using class and numeric measures. That said, I think there are also good reasons to make both class and numeric measures available:

  1. Commensurability: Compare results to previous work that used class or numeric measures.
  2. Benchmarking: Measure the comparative advantage of using ordinal measures.
  3. Model selection: Assess whether a nominally ordinal outcome can be treated as categorical or integer-valued (for reasons, e.g., of tractability or interpretation).

Because metric_set() (understandably) refuses to mix numeric and class measures, perhaps this would be best achieved by allowing ordinal_reg() (and its engines, along with other ordinal engines) to also operate in 'regression' mode, while the specifically ordinal measures could require (else error) or expect (else warning) that the outcome is ordered, that the model type or engine is ordinal, or that some other check is passed. (The metric_set() behavior is illustrated in the sketch below.)

This would unavoidably enable bad practice, but it's bound to come up, and I think it deserves consideration.
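
For concreteness, a minimal sketch of the metric_set() behavior mentioned above (the exact error wording may differ across yardstick versions):

```r
library(yardstick)

# metric_set() requires all metrics to share a prediction type, so combining a
# class metric (accuracy) with a numeric metric (rmse) errors:
try(metric_set(accuracy, rmse))

# Class metrics combine without issue:
class_metrics <- metric_set(accuracy, kap)
```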

@topepo
Collaborator

topepo commented Nov 8, 2024

These should all be in yardstick. I've made an issue for ranked probability scores, which I favor.
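
(For reference, a minimal hand-rolled sketch of the ranked probability score for an ordinal outcome; the function name and inputs are illustrative, not an existing yardstick API.)

```r
# `probs` is a matrix of predicted class probabilities (rows = observations,
# columns = ordered levels); `truth` is an integer vector of observed level
# indices. The RPS compares cumulative predicted and observed distributions.
rps <- function(probs, truth) {
  K <- ncol(probs)
  # cumulative predicted probabilities, one row per observation
  cum_pred <- t(apply(probs, 1, cumsum))
  # cumulative "observed" distribution: a step function at the true level
  cum_obs <- t(vapply(truth, function(k) as.numeric(seq_len(K) >= k), numeric(K)))
  mean(rowSums((cum_pred - cum_obs)^2) / (K - 1))
}

# toy example with three ordered levels
probs <- rbind(c(0.7, 0.2, 0.1), c(0.1, 0.3, 0.6))
rps(probs, truth = c(1, 3))
```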

I've read the Sakai paper(s), and they seem to think that probabilistic predictions do not exist.

TBH, everything else that I've seen is problematic in a variety of ways. MSE/MAE/RMSE based on predicted class "distances" are things that we can estimate, but I would not want to rely on them. If we use a class-based metric, I would choose Kappa or alpha or one of the others that have been studied and vetted for decades.
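
As one illustration of that kind of vetted class-based option, a minimal sketch using yardstick's weighted kappa on a made-up three-level outcome (the data and level names are purely illustrative):

```r
library(yardstick)

# Linear or quadratic weighting makes kappa penalize predictions by how far
# off they are on the ordered scale, rather than treating all errors equally.
set.seed(1)
lvls <- c("low", "medium", "high")
dat <- data.frame(
  truth    = factor(sample(lvls, 100, replace = TRUE), levels = lvls),
  estimate = factor(sample(lvls, 100, replace = TRUE), levels = lvls)
)
kap(dat, truth = truth, estimate = estimate, weighting = "quadratic")
```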

A lot of the metrics I see in the CS papers seem poorly motivated, and I get the sense that their authors have never looked into the massive amount of prior art on the subject.
