New perf. metrics, stability and other improvements #184

Open: wants to merge 2 commits into base: main

Alexsandruss (Contributor) commented Apr 28, 2025

Description

Changes:

  • Add support for LightGBM daal4py model builders
  • Add garbage collection and result cleanup to the dataset prefetching function to avoid out-of-memory errors
  • Add the SKLBENCH_DATA_CACHE environment variable as the first default location for the dataset cache ($PWD/data_cache is still used if the variable is not set)
  • Change default dtype to float32
  • Adjust the report generator's compatibility mode to work with the latest versions of stock sklearn and RAPIDS, and in other cases
  • Update collected performance metrics:
    • Add cost metrics measured in microdollars (the most readable order of magnitude for typical case computation times)
    • Add CPU load profiling
    • Add RAM and VRAM usage profiling
    • Add coefficient of variation for time
    • Add first-run time
    • Add ratio of first-run time to mean run time
  • Change color scale from RED-YELLOW-GREEN to RED-WHITE-GREEN in perf. report for better readability
  • Add option for cache flushing between case runs
  • Docs:
    • Mention Kaggle dataset download requirements
    • Add a note about the content and meaning of experimental configs
    • Move Benchmarking Config Specification to a separate file
    • Add a short explanation of benchmarking scopes
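
The derived time metrics listed above can be illustrated with a short sketch. This is not the code from this PR: the function name `summarize_runs`, the output keys, and the `dollars_per_hour` parameter are hypothetical. It also assumes that the mean includes the first run and that cost is the mean run time multiplied by an hourly machine price; the PR may define either differently.

```python
import statistics


def summarize_runs(times_s, dollars_per_hour):
    """Derive summary metrics from per-run wall-clock times (seconds).

    Hypothetical helper for illustration only. Assumes times_s[0] is the
    first (possibly cold) run and that the mean includes it.
    """
    if len(times_s) < 2:
        raise ValueError("need at least two runs for a sample standard deviation")
    mean_s = statistics.mean(times_s)
    std_s = statistics.stdev(times_s)  # sample standard deviation
    first_s = times_s[0]
    return {
        "mean_time_s": mean_s,
        # Coefficient of variation: relative spread of the run times.
        "time_cv": std_s / mean_s,
        "first_run_time_s": first_s,
        # Values well above 1 suggest warm-up or cold-cache overhead.
        "first_to_mean_ratio": first_s / mean_s,
        # Assumed cost model: hourly price scaled to one run, in microdollars.
        "cost_microdollars": mean_s * dollars_per_hour / 3600 * 1e6,
    }
```

For example, `summarize_runs([3.0, 1.0, 2.0], dollars_per_hour=3.6)` gives a mean of 2 seconds, a coefficient of variation of 0.5, and a first-to-mean ratio of 1.5. At these scales a per-case cost in dollars would be a small fraction, which is presumably why microdollars read better.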

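The dataset cache lookup order described in the changes (SKLBENCH_DATA_CACHE first, then $PWD/data_cache) could be resolved roughly as follows. This is a minimal sketch, not the PR's implementation: `resolve_data_cache` and its parameters are hypothetical, and it assumes $PWD means the current working directory and that an empty variable counts as unset.

```python
import os
from pathlib import Path


def resolve_data_cache(environ=None, cwd=None):
    """Return the dataset cache directory (hypothetical helper).

    Prefers the SKLBENCH_DATA_CACHE environment variable and falls back
    to <current working directory>/data_cache when it is unset or empty.
    """
    environ = os.environ if environ is None else environ
    cwd = Path.cwd() if cwd is None else Path(cwd)
    env_value = environ.get("SKLBENCH_DATA_CACHE")
    if env_value:
        return Path(env_value)
    return cwd / "data_cache"
```

Passing `environ` and `cwd` explicitly keeps the function easy to test without modifying the real process environment.
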
The PR should start as a draft, then move to the ready-for-review state after CI has passed and all applicable checkboxes are checked.
This approach ensures that reviewers don't spend extra time asking about routine requirements.

You can remove a checkbox as not applicable only if it doesn't relate to this PR in any way.

Checklist to comply with before moving PR from draft:

PR completeness and readability

  • I have reviewed my changes thoroughly before submitting this pull request.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes, or created a separate PR with the update and provided its number in the description, if necessary.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have added the respective label(s) to the PR if I have permission to do so.
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • I have run it locally and tested the changes extensively.
  • All CI jobs are green or I have provided justification why they aren't.
  • I have extended the testing suite if new functionality was introduced in this PR.

Alexsandruss added the enhancement (New feature or request) and docs (documentation and readme update) labels Apr 28, 2025
Alexsandruss mentioned this pull request Apr 28, 2025
Alexsandruss changed the title from "Updates and fixes" to "New perf. metrics, stability improvements and other fixes" Apr 28, 2025
Alexsandruss marked this pull request as ready for review April 28, 2025 11:45
Alexsandruss changed the title from "New perf. metrics, stability improvements and other fixes" to "New perf. metrics, stability and other improvements" Apr 28, 2025