Multiclass classification with ECOC (Error-Correcting Output Codes) and Label Switching. Implements LSEnsemble and evaluates performance using various metrics.

franjgs/MC_Label_Switching

Multiclass Classification via Binary Decomposition and Asymmetric Label Switching Ensemble (ALSE)

This repository implements a multiclass classification framework based on Binary Decomposition and the Asymmetric Label Switching Ensemble (ALSE, also called LSEnsemble). ALSE is applied to each binary classification task produced by the decomposition (OvO, OvR, or ECOC), with the aim of improving accuracy, especially on imbalanced data.

Project Overview

Binarization decomposes a multiclass problem into multiple binary classification problems. Even if the original multiclass data is balanced, binarization (for example, with the One-versus-Rest (OvR) strategy) can introduce severe class imbalance. This imbalance complicates learning, particularly in Bayes-optimal classification, where accurate likelihood ratio estimation is crucial.
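To make the imbalance concrete, the following minimal sketch counts, for each class, how many negatives per positive its OvR binary task would contain. The helper name `ovr_imbalance_ratios` and the toy labels are illustrative assumptions, not part of this repository.

```python
from collections import Counter

import numpy as np


def ovr_imbalance_ratios(y):
    """Return, per class, the negative-to-positive ratio of its OvR binary task."""
    counts = Counter(np.asarray(y).tolist())
    n_total = sum(counts.values())
    # In the OvR task for `label`, every other class becomes the negative class.
    return {label: (n_total - n_pos) / n_pos for label, n_pos in counts.items()}


# A perfectly balanced 5-class toy problem: 100 samples per class.
y = np.repeat(np.arange(5), 100)
ratios = ovr_imbalance_ratios(y)
print(ratios)
```

With K equally sized classes, every OvR task has K - 1 negatives per positive, so the imbalance grows with the number of classes.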

The Asymmetric Label Switching Ensemble (ALSE) offers an alternative approach to handling imbalanced classification. It randomly swaps labels according to predefined, class-dependent switching rates, which introduces diversity within the ensemble and enhances performance. Additionally, label switching rebalances class distributions, improving the estimation of transformed a priori probabilities and, consequently, the likelihood ratio for base learners.
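The switching step can be sketched as follows. This is a minimal illustration that assumes labels in {-1, +1} and two hypothetical rates, `alpha` (for +1 to -1) and `beta` (for -1 to +1); the function name `switch_labels` is invented for this example, and the repository's LSEnsemble may implement the step differently.

```python
import numpy as np


def switch_labels(y, alpha, beta, rng):
    """Flip binary labels in {-1, +1} with class-dependent rates.

    alpha: probability of switching a +1 label to -1.
    beta:  probability of switching a -1 label to +1.
    """
    y = np.asarray(y)
    u = rng.random(y.shape[0])
    # Each sample is flipped with the rate that belongs to its current class.
    flip = np.where(y == 1, u < alpha, u < beta)
    return np.where(flip, -y, y)


rng = np.random.default_rng(0)
y = np.array([1] * 100 + [-1] * 900)  # 10% minority class
y_switched = switch_labels(y, alpha=0.0, beta=0.3, rng=rng)
print("positive fraction before:", np.mean(y == 1))
print("positive fraction after: ", np.mean(y_switched == 1))
```

In an ensemble, each base learner would typically be trained on its own independently switched copy of the labels, which is where the diversity comes from.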

Asymmetric label switching can be combined with other neutral rebalancing strategies, such as cost-sensitive learning and population-based adjustments. This results in a transformed problem where the optimal Bayes threshold can be theoretically derived.
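As a rough illustration of why a threshold can be derived, consider label switching alone, without cost-sensitive or population-based rebalancing. The notation is assumed for this sketch and is not taken from the referenced paper: labels lie in {-1, +1}, a +1 label is switched with probability α, a -1 label with probability β, and switching is independent of x. Writing ỹ for the switched label and η for the threshold on the original posterior:

```latex
\begin{align}
P(\tilde{y}=+1 \mid x) &= (1-\alpha)\,P(y=+1 \mid x) + \beta\,P(y=-1 \mid x) \\
                       &= \beta + (1-\alpha-\beta)\,P(y=+1 \mid x), \\
P(y=+1 \mid x) > \eta \;&\Longleftrightarrow\;
P(\tilde{y}=+1 \mid x) > \beta + (1-\alpha-\beta)\,\eta
\qquad (\alpha+\beta<1).
\end{align}
```

Under these assumptions, the usual threshold η = 1/2 on the original posterior corresponds to a threshold of (1 - α + β)/2 on the posterior learned from switched labels. The full treatment, including the other rebalancing mechanisms, is given in the paper cited below.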

For further details on likelihood ratio estimation and Bayes threshold transformations, see: "Optimum Bayesian thresholds for rebalanced classification problems using class-switching ensembles," Pattern Recognition, 2022. https://doi.org/10.1016/j.patcog.2022.109158

Key Features

  • Hyperparameter Optimization Script (optimize_hyperparameters.py):
    • This script performs the crucial task of optimizing the hyperparameters for the classification models.
    • It systematically explores different configurations of model parameters.
    • It evaluates the performance of each configuration using cross-validation or a similar technique.
    • The script saves the best-performing model configurations (parameters) for later use in the evaluation phase.
    • It is driven by the configurations specified in the config.yaml file, including the models to optimize and the ranges of their hyperparameters.
  • Binarization Decomposition:
    • Implements binarization matrices for effective multiclass to binary transformation.
    • Offers flexible One-versus-One (OvO), OvR, and ECOC encoding options (complete, dense, sparse) configurable via config.yaml.
  • ALS Ensemble Algorithm:
    • A specialized ensemble algorithm designed for label switching correction, enhancing model robustness.
    • Provides configurable optimization parameters for the ALS Ensemble via config.yaml.
  • Comprehensive Evaluation:
    • Includes a thorough evaluation suite using key multiclass metrics: balanced accuracy, Cohen's kappa, geometric mean, and sensitivity.
    • Features detailed logging and output to facilitate performance analysis.
  • Configuration-Driven Design:
    • Leverages config.yaml for easy customization of datasets, models, and evaluation settings.
    • Employs streamlined model selection logic based on peak or average performance, configurable through config.yaml.
  • Model Persistence:
    • Saves models configured with optimized hyperparameters, along with their configurations, using pickle for efficient testing and deployment.
    • Utilizes a naming convention for saved models that includes the parameters used for the ALS Ensemble.
  • Performance Evaluation Script (evaluate_performance.py):
    • This script orchestrates the evaluation of models using pre-optimized hyperparameters.
    • It loads the best model configurations (parameters) saved during the optimization phase by optimize_hyperparameters.py.
    • Subsequently, it instantiates and trains the models using these optimal configurations.
    • The script concludes by performing testing on a separate dataset to obtain the final performance metrics.
    • It relies on the config.yaml file to load necessary parameters and instantiate the model architecture based on the stored configurations.
  • Dataset Flexibility:
    • Engineered to accommodate a wide range of datasets stored within a designated data folder.
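The OvR and OvO encodings listed under Binarization Decomposition can be written as coding matrices with one row per class and one column per binary task. The sketch below is an assumption-laden illustration: the function names (`ovr_matrix`, `ovo_matrix`, `decode`) are hypothetical, and the correlation-based decoding is one simple choice rather than the rule this repository necessarily uses.

```python
from itertools import combinations

import numpy as np


def ovr_matrix(n_classes):
    """One-versus-Rest: one column per class, +1 for that class, -1 otherwise."""
    return 2 * np.eye(n_classes, dtype=int) - 1


def ovo_matrix(n_classes):
    """One-versus-One: one column per class pair; 0 marks classes left out."""
    pairs = list(combinations(range(n_classes), 2))
    M = np.zeros((n_classes, len(pairs)), dtype=int)
    for col, (i, j) in enumerate(pairs):
        M[i, col] = 1
        M[j, col] = -1
    return M


def decode(M, scores):
    """Pick, per sample, the class whose code word correlates best with the scores.

    scores: array of shape (n_samples, n_dichotomies) with values in [-1, 1].
    """
    return np.argmax(scores @ M.T, axis=1)


print(ovr_matrix(3))
print(ovo_matrix(3))
```

Each column defines one binary problem: samples whose class has +1 are positives, those with -1 are negatives, and those with 0 are excluded from that problem.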

Getting Started

  1. Clone the repository:

    git clone https://github.com/franjgs/MC_Label_Switching.git
    cd MC_Label_Switching
  2. Configure config.yaml:

    • Open the config.yaml file and adjust settings for datasets, models, optimization parameters, evaluation metrics, and other project configurations as needed.
  3. Run the hyperparameter optimization script:

    python optimize_hyperparameters.py
    • This script will explore different hyperparameter combinations and save the best configurations in output_folder.
  4. Run the performance evaluation script:

    python evaluate_performance.py
    • This script will load the saved optimal hyperparameters, train the models with these configurations, and evaluate their performance on the test datasets. The results will be saved in output_folder.
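For reference, the multiclass metrics named under Comprehensive Evaluation can all be derived from a confusion matrix. The following NumPy-only sketch is not the repository's evaluation code; the function names are hypothetical, and it assumes integer labels 0..K-1 with every class present in `y_true`. Library equivalents exist, for example `balanced_accuracy_score` and `cohen_kappa_score` in scikit-learn and `geometric_mean_score` in imbalanced-learn.

```python
import numpy as np


def confusion(y_true, y_pred, n_classes):
    """Confusion matrix with true classes on rows and predictions on columns."""
    C = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(C, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return C


def multiclass_metrics(y_true, y_pred, n_classes):
    """Compute the four metrics; assumes every class occurs in y_true."""
    C = confusion(y_true, y_pred, n_classes)
    n = C.sum()
    recalls = np.diag(C) / C.sum(axis=1)               # per-class sensitivity
    p_observed = np.trace(C) / n                       # plain accuracy
    p_chance = (C.sum(axis=0) @ C.sum(axis=1)) / n**2  # agreement expected by chance
    return {
        "sensitivity_per_class": recalls,
        "balanced_accuracy": recalls.mean(),
        "geometric_mean": recalls.prod() ** (1.0 / n_classes),
        "cohen_kappa": (p_observed - p_chance) / (1.0 - p_chance),
    }


y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]
for name, value in multiclass_metrics(y_true, y_pred, n_classes=3).items():
    print(name, value)
```

Balanced accuracy and the geometric mean both average per-class sensitivity, so a classifier that ignores a minority class is penalized even when plain accuracy looks high.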

Dependencies

  • Python 3.x
  • imbalanced-learn == 0.13.0
  • Keras == 2.15.0
  • NumPy == 1.26.4
  • Pandas == 2.2.3
  • PyYAML == 6.0.2
  • scikit-learn == 1.5.2
  • SciPy == 1.15.1
  • Torch == 2.6.0
  • TorchEnsemble == 0.2.0

Datasets

  • Place your datasets in the datasets folder (or specify the path in config.yaml).

Output

  • Trained models and evaluation metrics are saved in the results folder.

Contributing

  • Contributions are welcome! Please feel free to submit pull requests or open issues.

License

  • Copyright (c) 2025 UC3M
