[BUG] SVMSMOTE sample_strategy not equalizing samples of different classes #1130

Open
coughlin-devin opened this issue Mar 13, 2025 · 0 comments

Describe the bug

When using SVMSMOTE, the resampled classes do not end up with equal sample counts when sampling_strategy is set to 'not majority' or to a dict of per-class target counts.

Steps/Code to Reproduce

import numpy as np
from sklearn.svm import SVC
from imblearn.over_sampling import SVMSMOTE
from collections import Counter

# create data
x = np.random.normal(0, 0.5, 1000)
y = np.random.normal(0, 0.5, 1000)
clss = np.minimum(np.random.geometric(0.5, 1000), 7)

# check original class distribution
Counter(clss)
num_majority = Counter(clss).get(1)

arr = np.array((x,y)).T
svc = SVC(C=10, kernel='rbf', gamma='scale', class_weight='balanced', random_state=2024)
svc.fit(arr, clss)
not_majority = SVMSMOTE(sampling_strategy='not majority', k_neighbors=7, m_neighbors=14, svm_estimator=svc, out_step=0.25, random_state=2024)

# check resampled class distribution with sampling_strategy='not majority'
a, b = not_majority.fit_resample(arr, clss)
Counter(b)

# request the majority count for every class label (1 through 7)
sampling_strategy = {label: num_majority for label in range(1, 8)}
dict_strat = SVMSMOTE(sampling_strategy=sampling_strategy, k_neighbors=7, m_neighbors=14, svm_estimator=svc, out_step=0.25, random_state=2024)

# check resampled class distribution with a dict sampling_strategy
c, d = dict_strat.fit_resample(arr, clss)
Counter(d)

Expected Results

I would expect all resampled classes to have the same number of samples, as below:

Counter(b)
Counter({1: 497, 2: 497, 4: 497, 7: 497, 6: 497, 3: 497, 5: 497})

Counter(d)
Counter({1: 497, 2: 497, 4: 497, 7: 497, 6: 497, 3: 497, 5: 497})
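
To spell out the expectation: as I read it, 'not majority' asks for every class except the largest to be brought up to the majority count. Below is a minimal sketch of that arithmetic using only the standard library. The helper name `not_majority_targets` and the toy counts are made up for illustration; this is not imbalanced-learn's internal code.

```python
from collections import Counter


def not_majority_targets(counts):
    """Return {class: synthetic samples needed} for every class except the largest."""
    majority_class, majority_count = max(counts.items(), key=lambda kv: kv[1])
    return {
        label: majority_count - n
        for label, n in counts.items()
        if label != majority_class
    }


# Made-up toy distribution, not the data from this report.
toy_counts = Counter({"a": 5, "b": 3, "c": 1})
needed = not_majority_targets(toy_counts)
print(needed)  # {'b': 2, 'c': 4}
```

If the sampler generated exactly these amounts, every class would end at the majority count, which is the distribution shown above.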

Actual Results

After resampling, however, the minority classes still have fewer samples than the majority class.

Counter(b)
Counter({1: 497, 2: 391, 4: 228, 7: 249, 6: 209, 3: 305, 5: 275})

Counter(d)
Counter({1: 497, 2: 391, 4: 228, 7: 249, 6: 209, 3: 305, 5: 275})
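
To quantify the gap, the per-class shortfall can be computed directly from the counts reported above. This is a small sketch using only the standard library; the helper name `shortfall` is hypothetical, and the target of 497 is the majority count from this run.

```python
from collections import Counter


def shortfall(actual, target):
    """Return {class: samples missing} for every class below the target count."""
    return {label: target - n for label, n in sorted(actual.items()) if n < target}


# Counts copied from the actual results in this report.
reported = Counter({1: 497, 2: 391, 4: 228, 7: 249, 6: 209, 3: 305, 5: 275})
missing = shortfall(reported, target=497)
print(missing)  # {2: 106, 3: 192, 4: 269, 5: 222, 6: 288, 7: 248}
```

An empty dict would mean every class reached the target; here every minority class falls short.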

Versions

System:
    python: 3.9.5 (tags/v3.9.5:0a7dcbd, May  3 2021, 17:27:52) [MSC v.1928 64 bit (AMD64)]
executable: C:\Users\Devin Coughlin\AppData\Local\Programs\Python\Python39\python.exe
   machine: Windows-10-10.0.26100-SP0

Python dependencies:
      sklearn: 1.5.1
          pip: 24.0
   setuptools: 65.6.0
        numpy: 1.26.4
        scipy: 1.13.1
       Cython: None
       pandas: 2.2.3
   matplotlib: 3.8.0
       joblib: 1.3.1
threadpoolctl: 3.1.0

Built with OpenMP: True

threadpoolctl info:
       user_api: openmp
   internal_api: openmp
         prefix: vcomp
       filepath: C:\Users\Devin Coughlin\AppData\Local\Programs\Python\Python39\Lib\site-packages\sklearn\.libs\vcomp140.dll
        version: None
    num_threads: 12

       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: C:\Users\Devin Coughlin\AppData\Local\Programs\Python\Python39\Lib\site-packages\numpy.libs\libopenblas64__v0.3.23-293-gc2f4bdbb-gcc_10_3_0-2bde3a66a51006b2b53eb373ff767a3f.dll
        version: 0.3.23.dev
threading_layer: pthreads
   architecture: Haswell
    num_threads: 12

       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: C:\Users\Devin Coughlin\AppData\Local\Programs\Python\Python39\Lib\site-packages\scipy.libs\libopenblas_v0.3.27--3aa239bc726cfb0bd8e5330d8d4c15c6.dll
        version: 0.3.27
threading_layer: pthreads
   architecture: Haswell
    num_threads: 12

       user_api: openmp
   internal_api: openmp
         prefix: libiomp
       filepath: C:\Users\Devin Coughlin\AppData\Local\Programs\Python\Python39\Lib\site-packages\torch\lib\libiomp5md.dll
        version: None
    num_threads: 6

       user_api: openmp
   internal_api: openmp
         prefix: libiomp
       filepath: C:\Users\Devin Coughlin\AppData\Local\Programs\Python\Python39\Lib\site-packages\torch\lib\libiompstubs5md.dll
        version: None
    num_threads: 1