Imbalanced Data Transform

PHOTON offers a simple solution for classification data with imbalanced classes. PHOTON's approach is based on imblearn and covers over-sampling, under-sampling, and combined sampling. You can choose any of the imblearn methods listed below and include that choice in the hyperparameter optimization. See the Developer Website for details on how each balancing algorithm works.
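To make the difference between over- and under-sampling concrete, here is a minimal standard-library sketch of the two simplest strategies. It is not imblearn's or PHOTON's implementation; the helper names `random_under_sample` and `random_over_sample` and the toy data are made up for illustration only.

```python
import random
from collections import Counter


def _group_by_class(X, y):
    """Collect the samples of each class label in a dict."""
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    return by_class


def random_under_sample(X, y, seed=0):
    """Randomly drop samples until every class is as small as the rarest one."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = min(len(rows) for rows in by_class.values())
    X_res, y_res = [], []
    for label, rows in by_class.items():
        for features in rng.sample(rows, target):
            X_res.append(features)
            y_res.append(label)
    return X_res, y_res


def random_over_sample(X, y, seed=0):
    """Randomly duplicate samples until every class is as large as the biggest one."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = max(len(rows) for rows in by_class.values())
    X_res, y_res = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for features in rows + extra:
            X_res.append(features)
            y_res.append(label)
    return X_res, y_res


# Toy data (hypothetical): 94 majority samples (label 0), 6 minority samples (label 1)
X = [[float(i)] for i in range(100)]
y = [0] * 94 + [1] * 6

X_under, y_under = random_under_sample(X, y)
X_over, y_over = random_over_sample(X, y)

print(Counter(y_under))  # both classes reduced to 6 samples
print(Counter(y_over))   # both classes raised to 94 samples
```

Under-sampling discards most of the majority class, while over-sampling keeps everything but repeats minority samples. Methods such as SMOTE synthesize new minority samples instead of duplicating existing ones.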

           
# imbalance_type = OVERSAMPLING:
#     - RandomOverSampler
#     - SMOTE
#     - ADASYN
#
# imbalance_type = UNDERSAMPLING:
#     - ClusterCentroids
#     - RandomUnderSampler
#     - NearMiss
#     - InstanceHardnessThreshold
#     - CondensedNearestNeighbour
#     - EditedNearestNeighbours
#     - RepeatedEditedNearestNeighbours
#     - AllKNN
#     - NeighbourhoodCleaningRule
#     - OneSidedSelection
#
# imbalance_type = COMBINE:
#     - SMOTEENN
#     - SMOTETomek

from photonai.base.PhotonBase import Hyperpipe, PipelineElement, OutputSettings
from photonai.optimization.Hyperparameters import FloatRange, Categorical
from photonai.configuration.Register import PhotonRegister
from sklearn.model_selection import KFold
from imblearn.datasets import fetch_datasets

# example of an imbalanced dataset
dataset = fetch_datasets()['coil_2000']
# imbalance ratio is about 16:1 (roughly 94% majority vs. 6% minority class)
X, y = dataset.data, dataset.target

# DESIGN YOUR PIPELINE
# here we use best_config_metric='f1_score' because it is a more robust choice
# than accuracy for imbalanced data
my_pipe = Hyperpipe('basic_imbalanced_pipe',
                    optimizer='grid_search',
                    metrics=['accuracy', 'precision', 'recall', 'f1_score'],
                    best_config_metric='f1_score',
                    outer_cv=KFold(n_splits=3),  # repeat hyperparameter search three times
                    inner_cv=KFold(n_splits=5),  # test each configuration five times
                    verbosity=1)


# NOW FIND OUT MORE ABOUT THE ImbalancedDataTransform ELEMENT
PhotonRegister.info('ImbalancedDataTransform')

# ADD ELEMENTS TO YOUR PIPELINE
# first normalize all features
my_pipe += PipelineElement('StandardScaler')

# rebalance the data; test_disabled=True also evaluates the pipeline without this element
my_pipe += PipelineElement('ImbalancedDataTransform',
                           hyperparameters={'method_name': ['RandomUnderSampler', 'SMOTE']},
                           test_disabled=True)

# engage and optimize the good old SVM for classification
my_pipe += PipelineElement('SVC',
                           hyperparameters={'kernel': Categorical(['rbf', 'linear']),
                                            'C': FloatRange(0.5, 2)},
                           gamma='auto')

# NOW TRAIN YOUR PIPELINE
my_pipe.fit(X, y)
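
The pipeline above selects the best configuration by F1 score rather than accuracy. The following standard-library sketch illustrates why, using made-up predictions on a 94/6 class split; the helper functions and both sets of predictions are hypothetical and not part of PHOTON.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1(y_true, y_pred, positive=1):
    """F1 score for the positive class; defined as 0.0 if nothing is found."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# 94 majority samples (label 0) followed by 6 minority samples (label 1)
y_true = [0] * 94 + [1] * 6

# Classifier A always predicts the majority class
pred_majority = [0] * 100

# Classifier B raises 8 false alarms but finds 4 of the 6 minority samples
pred_balanced = [1] * 8 + [0] * 86 + [1] * 4 + [0] * 2

print(accuracy(y_true, pred_majority), f1(y_true, pred_majority))
print(accuracy(y_true, pred_balanced), f1(y_true, pred_balanced))
```

Classifier A reaches 94% accuracy without ever detecting the minority class, so its F1 score is 0. Classifier B has lower accuracy (90%) but a clearly higher F1 score (about 0.44), which is the behavior the optimizer should reward on imbalanced data.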