Hyperpipe

The Hyperpipe is the basic construct in PHOTON with which everything starts. It acts as the designer for your machine learning pipeline: you choose your strategies, such as the cross-validation split methods, the performance metrics and the hyperparameter optimization algorithm, and then you add your pipeline elements.
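As a rough orientation, a typical workflow creates a Hyperpipe, adds pipeline elements and then fits it to the data. The following is a minimal sketch; the element names ('StandardScaler', 'SVC'), the PipelineElement class and the import path are assumptions about the PHOTON API and may differ in your version.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold
from photonai.base import Hyperpipe, PipelineElement  # assumed import path

X, y = load_breast_cancer(return_X_y=True)

# choose strategies: cross-validation, metrics, optimizer
pipe = Hyperpipe('basic_pipe',
                 inner_cv=KFold(n_splits=5, shuffle=True),
                 metrics=['accuracy'],
                 best_config_metric='accuracy',
                 optimizer='grid_search')

# add pipeline elements (transformers and a final estimator)
pipe += PipelineElement('StandardScaler')
pipe += PipelineElement('SVC', hyperparameters={'C': [0.1, 1, 10]})

# run the hyperparameter search and train the optimum pipeline
pipe.fit(X, y)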

class Hyperpipe

Wrapper class for a machine learning pipeline, holding all pipeline elements and managing the optimization of the hyperparameters.

Parameters

  • name [str]: Name of hyperpipe instance

  • inner_cv [BaseCrossValidator]: Cross validation strategy to test hyperparameter configurations, generates the validation set

  • outer_cv [BaseCrossValidator]: Cross validation strategy to use for the hyperparameter search itself, generates the test set

  • optimizer [str or object, default="grid_search"]: Hyperparameter optimization algorithm

    • In case a string literal is given:

      • "grid_search": optimizer that iteratively tests all possible hyperparameter combinations
      • "random_grid_search": a variation of the grid search optimization that randomly picks hyperparameter combinations from all possible hyperparameter combinations
      • "timeboxed_random_grid_search": randomly chooses hyperparameter combinations from the set of all possible hyperparameter combinations and tests until the given time limit is reached
      • limit_in_minutes: int
    • In case an object is given: expects the object to have the following methods:

    • next_config_generator: returns a hyperparameter configuration in form of an dictionary containing key->value pairs in the sklearn parameter encoding model_name__parameter_name: parameter_value
    • prepare: takes a list of pipeline elements and their particular hyperparameters to test
    • evaluate_recent_performance: gets a tested config and the respective performance in order to calculate a smart next configuration to process
  • metrics [list of metric names as str]: Metrics that should be calculated for the training, validation and test set. Use the pre-imported metrics from sklearn and photonai, or register your own

    • Metrics for classification:
      • accuracy: sklearn.metrics.accuracy_score
      • matthews_corrcoef: sklearn.metrics.matthews_corrcoef
      • confusion_matrix: sklearn.metrics.confusion_matrix
      • f1_score: sklearn.metrics.f1_score
      • hamming_loss: sklearn.metrics.hamming_loss
      • log_loss: sklearn.metrics.log_loss
      • precision: sklearn.metrics.precision_score
      • recall: sklearn.metrics.recall_score
    • Metrics for regression:
      • mean_squared_error: sklearn.metrics.mean_squared_error
      • mean_absolute_error: sklearn.metrics.mean_absolute_error
      • explained_variance: sklearn.metrics.explained_variance_score
      • r2: sklearn.metrics.r2_score
    • Other metrics:
      • pearson_correlation: photon_core.framework.Metrics.pearson_correlation
      • variance_explained: photon_core.framework.Metrics.variance_explained_score
      • categorical_accuracy: photon_core.framework.Metrics.categorical_accuracy_score
  • best_config_metric [str]: The metric that should be maximized or minimized in order to choose the best hyperparameter configuration

  • eval_final_performance [bool, default=True]: If True, the metrics are calculated for the test set; otherwise the test set is separated but not used

  • test_size [float, default=0.2]: the fraction of the data that is left out as a test set if no outer_cv is given and eval_final_performance is set to True

  • set_random_seed [bool, default=False]: If True, sets the random seed to 42

  • verbosity [int, default=0]: The level of verbosity: 0 is the least talkative and gives only warnings and errors, 1 adds info and 2 adds debug output

  • groups [array-like, default=None]: Information needed by advanced cross-validation strategies, such as LeaveOneSiteOut-CV, about the group affiliation of the rows in the data
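For illustration, a minimal skeleton of such a custom optimizer object (referenced in the optimizer parameter above) could look as follows. This is purely a sketch of the three-method interface described above; the exact call signatures expected by your PHOTON version may differ.

class MyOptimizer:
    # illustrative skeleton of a custom hyperparameter optimizer, not part of PHOTON itself

    def prepare(self, pipeline_elements):
        # receives the pipeline elements and their hyperparameter spaces to test
        self.pipeline_elements = pipeline_elements
        self.tested_configs = []

    def next_config_generator(self):
        # yields hyperparameter configurations as dictionaries in the sklearn
        # parameter encoding, e.g. {'SVC__C': 1, 'SVC__kernel': 'rbf'}
        yield {'SVC__C': 1, 'SVC__kernel': 'rbf'}

    def evaluate_recent_performance(self, config, performance):
        # records the performance of the last tested config so that a smarter
        # next configuration can be proposed
        self.tested_configs.append((config, performance))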

Attributes

  • optimum_pipe [Pipeline]: An sklearn pipeline object that is fitted to the training data according to the best hyperparameter configuration found. Currently, we don't create an ensemble of all best hyperparameter configs over all folds. We find the best config by comparing the test error across outer folds. The hyperparameter config of the best fold is used as the optimal model and is then trained on the complete set.

  • best_config [dict]: Dictionary containing the hyperparameters of the best configuration. Contains the parameters in the sklearn encoding model_name__parameter_name: parameter_value

  • result_tree [MDBHyperpipe]: Object containing all information about the performed hyperparameter search. Holds the training and test metrics for all outer folds, inner folds and configurations, as well as additional information.

  • pipeline_elements [list]: Contains all PipelineElement or Hyperpipe objects that are added to the pipeline.

Example

from sklearn.model_selection import KFold, ShuffleSplit
from photonai.base import Hyperpipe  # import path may differ depending on your PHOTON version

manager = Hyperpipe('test_manager',
                    optimizer='timeboxed_random_grid_search', optimizer_params={'limit_in_minutes': 1},
                    outer_cv=ShuffleSplit(test_size=0.2, n_splits=1),
                    inner_cv=KFold(n_splits=10, shuffle=True),
                    metrics=['accuracy', 'precision', 'recall', 'f1_score'],
                    best_config_metric='accuracy', eval_final_performance=True,
                    verbosity=2)
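
After adding pipeline elements and calling fit on your data, the attributes documented above give access to the results. A short sketch, assuming X and y hold your features and targets and that fit follows the sklearn convention:

manager.fit(X, y)

print(manager.best_config)                     # best hyperparameter configuration found
predictions = manager.optimum_pipe.predict(X)  # sklearn pipeline trained with the best config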