Hyperpipe

The PHOTONAI Hyperpipe class enables you to create a custom pipeline. In addition, it defines the parameters of the analysis, such as the cross-validation scheme, the hyperparameter optimization strategy, and the performance metrics of interest.

So-called PHOTONAI PipelineElements can be added to the Hyperpipe, each of them being a data-processing method or a learning algorithm. By choosing, combining, and arranging data-processing methods and algorithms with the PHOTONAI classes, both simple and complex pipeline architectures can be designed rapidly.

The PHOTONAI Hyperpipe automates the nested training, test, and hyperparameter optimization procedures.

  • The Hyperpipe monitors the nested cross-validated training and test procedure,
  • communicates with the hyperparameter optimization strategy,
  • streams information between the pipeline elements,
  • logs all obtained results and evaluates the performance,
  • guides the hyperparameter optimization process via a so-called best config metric, which is used to select the best-performing hyperparameter configuration.

Example

           
# imports used by this example
from sklearn.model_selection import KFold

from photonai.base import Hyperpipe, OutputSettings
from photonai.optimization import MinimumPerformance

my_pipe = Hyperpipe(name='basic_svm_pipe_no_performance',
                    # hyperparameter optimization strategy and its settings
                    optimizer='sk_opt',
                    optimizer_params={'n_configurations': 25},
                    # metrics to compute; best_config_metric selects the winning configuration
                    metrics=['mean_squared_error', 'pearson_correlation'],
                    best_config_metric='mean_squared_error',
                    # nested cross-validation scheme
                    outer_cv=KFold(n_splits=3, shuffle=True),
                    inner_cv=KFold(n_splits=3),
                    eval_final_performance=True,
                    # where and how results are persisted
                    output_settings=OutputSettings(project_folder='./result_folder',
                                                   mongodb_connect_url="mongodb://localhost:27017/photon_results",
                                                   save_output=True,
                                                   plots=True),
                    cache_folder='./tmp',
                    verbosity=1,
                    random_seed=42,
                    # stop evaluating a configuration early if it misses these thresholds
                    performance_constraints=[MinimumPerformance('mean_squared_error', 35, 'first'),
                                             MinimumPerformance('pearson_correlation', 0.7, 'all')])
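As noted above, PipelineElements are then added to this Hyperpipe before it is fitted. Below is a minimal usage sketch, assuming a regression setting and that 'StandardScaler' and 'SVR' are available under these names in the PHOTONAI element registry; the hyperparameter ranges and the synthetic data are illustrative only.

from sklearn.datasets import make_regression
from photonai.base import PipelineElement
from photonai.optimization import FloatRange, Categorical

# add a data-processing step and a learning algorithm to the pipeline
my_pipe += PipelineElement('StandardScaler')
my_pipe += PipelineElement('SVR', hyperparameters={'C': FloatRange(0.1, 100),
                                                   'kernel': Categorical(['linear', 'rbf'])})

# fitting runs the nested cross-validation and hyperparameter optimization
X, y = make_regression(n_samples=100, n_features=20, random_state=42)
my_pipe.fit(X, y)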

Parameters

name (str): Name of the object instance.

inner_cv (BaseCrossValidator): Cross-validation strategy used to evaluate hyperparameter configurations; generates the validation set.

outer_cv (BaseCrossValidator): Cross-validation strategy for the hyperparameter search itself; generates the test set.

eval_final_performance (bool, default=True): Whether the metrics should be calculated on the test set. If False, the test set is separated but not used.

optimizer (str or object, default="grid_search"): Hyperparameter optimization strategy. Built-in strategies:
  • grid_search
  • random_grid_search
  • timeboxed_random_grid_search
  • sk_opt
  • smac

metrics (list of metric names as str): Performance metrics to evaluate. Available metrics:
  • accuracy: sklearn.metrics.accuracy_score
  • matthews_corrcoef: sklearn.metrics.matthews_corrcoef
  • confusion_matrix: sklearn.metrics.confusion_matrix
  • f1_score: sklearn.metrics.f1_score
  • hamming_loss: sklearn.metrics.hamming_loss
  • log_loss: sklearn.metrics.log_loss
  • precision: sklearn.metrics.precision_score
  • recall: sklearn.metrics.recall_score
  • mean_squared_error: sklearn.metrics.mean_squared_error
  • mean_absolute_error: sklearn.metrics.mean_absolute_error
  • explained_variance: sklearn.metrics.explained_variance_score
  • r2: sklearn.metrics.r2_score
  • pearson_correlation: photonai.processing.metrics.pearson_correlation
  • variance_explained: photonai.processing.metrics.variance_explained_score
  • categorical_accuracy: photonai.processing.metrics.categorical_accuracy_score

best_config_metric (str): The metric that should be maximized or minimized; used to choose the best hyperparameter configuration.

output_settings (OutputSettings): Object defining persistence parameters, such as the project folder, a MongoDB connection URL, and whether output files and plots are saved.

verbosity (int): Level of output quantity:
  0: only photonai system logs
  1: normal photonai output
  2: extensive photonai output

random_seed (int): Controls the random seed, which is passed to numpy and to each PipelineElement.

performance_constraints (list of threshold objects): In order to save computational resources and time, the evaluation of a hyperparameter configuration can be cut short. If a configuration performs worse than an expected minimum, all further inner cross-validation folds are skipped. The threshold can either be fixed and given by the user, or dynamic, expecting the configuration to be a certain percentage better than the dummy estimator's performance (see the sketch after this list).
  • MinimumPerformance('metric_name', expected_minimum)
  • DummyPerformance('metric_name', expected_offset)

cache_folder (str): URL/path to a directory in which PHOTONAI is allowed to write cache files. If this parameter is given, PHOTONAI caches fold- and config-wise.
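The main example above uses fixed MinimumPerformance thresholds; a dynamic threshold relative to the dummy estimator can be expressed with DummyPerformance, as referenced in the performance_constraints entry. A minimal sketch, assuming DummyPerformance is importable from photonai.optimization alongside MinimumPerformance; the threshold values are illustrative only.

from photonai.optimization import DummyPerformance, MinimumPerformance

# skip the remaining inner folds if a configuration's pearson correlation is not
# at least 0.1 better than the dummy estimator's, or if the mean squared error
# of its first fold exceeds 35
constraints = [DummyPerformance('pearson_correlation', 0.1),
               MinimumPerformance('mean_squared_error', 35, 'first')]

# pass to the Hyperpipe constructor via performance_constraints=constraints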

Attributes

elements (list of PipelineElement objects): The sequence of preprocessing methods and learning algorithms that defines the pipeline.

best_config (dict): Dictionary containing the hyperparameters of the best configuration, given in the sklearn convention model_name__parameter_name: parameter_value.

results (MDBHyperpipe): Object containing all information about the performed hyperparameter search. Holds the training and test metrics for all outer folds, inner folds and configurations, as well as additional information.

optimum_pipe (PhotonPipeline): A pipeline object fitted on the complete feature matrix with the best hyperparameter configuration found.
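Once fit has finished, these attributes can be read directly from the Hyperpipe instance. A minimal sketch, assuming my_pipe was configured and fitted as in the examples above; the printed configuration is illustrative only.

# winning hyperparameters, e.g. {'SVR__C': 1.0, 'SVR__kernel': 'rbf'}
print(my_pipe.best_config)

# the pipeline refitted on the complete data with the best configuration
predictions = my_pipe.optimum_pipe.predict(X)

# full result tree of the hyperparameter search (training/test metrics per fold and configuration)
results = my_pipe.results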