Getting Started

PHOTON is a high-level Python API for designing and optimizing machine learning pipelines.

We developed a framework that pre-structures and automates the repetitive parts of the model development process, so that the user can focus on the important design decisions regarding pipeline architecture and the choice of parameters.

PHOTON is designed to give any user easy access to state-of-the-art machine learning by integrating the power of various machine learning toolboxes and algorithms.

Installation

To get started, you only need two things: Python 3 and your favourite Python IDE. Then simply install PHOTON via pip.

pip install photonai

Example classification pipeline

This example shows how to set up a basic classification pipeline with normalization, dimensionality reduction, and a support vector machine as implemented in scikit-learn. The hyperparameters are optimized with scikit-optimize in a nested k-fold cross-validation scheme, and performance is measured with accuracy, precision, recall, and balanced accuracy. The best configuration is picked by maximizing accuracy.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold

from photonai.base import Hyperpipe, PipelineElement, OutputSettings
from photonai.optimization import FloatRange, Categorical, IntegerRange
from photonai.investigator import Investigator

# DESIGN YOUR PIPELINE
my_pipe = Hyperpipe('basic_svm_pipe',  # the name of your pipeline
                    # which optimizer PHOTON shall use
                    optimizer='sk_opt',
                    optimizer_params={'n_configurations': 10},
                    # the performance metrics of your interest
                    metrics=['accuracy', 'precision', 'recall', 'balanced_accuracy'],
                    # after hyperparameter optimization, this metric declares the winner config
                    best_config_metric='accuracy',
                    # repeat hyperparameter optimization in three outer folds
                    outer_cv=KFold(n_splits=3),
                    # evaluate each configuration on five inner folds
                    inner_cv=KFold(n_splits=5),
                    verbosity=1,
                    output_settings=OutputSettings(project_folder='./tmp/'))


# first normalize all features
my_pipe.add(PipelineElement('StandardScaler'))

# then reduce dimensionality with a PCA; test_disabled lets the optimizer also try skipping this step
my_pipe += PipelineElement('PCA', hyperparameters={'n_components': IntegerRange(5, 20)}, test_disabled=True)

# engage and optimize the good old SVM for Classification
my_pipe += PipelineElement('SVC', hyperparameters={'kernel': Categorical(['rbf', 'linear']),
                                                   'C': FloatRange(0.5, 2)}, gamma='scale')

# train pipeline
X, y = load_breast_cancer(return_X_y=True)
my_pipe.fit(X, y)

# visualize results
Investigator.show(my_pipe)


PHOTON Contributions

Automated Train & Test Workflow

PHOTON provides a pre-structured and automated training and test procedure that integrates nested cross-validation and hyperparameter optimization, so that the focus remains on the design decisions of model development.

We consider the cross-validation scheme, the hyperparameter optimization strategy, and the performance metrics to be parameters that can be selected from a library of choices, as sketched below.

see Hyperpipe
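
For illustration, here is a minimal sketch of the same workflow with different choices from that library: a different optimizer and other scikit-learn cross-validation splitters. The optimizer name 'random_grid_search' and the metric 'f1_score' are assumptions based on PHOTON's conventions, not a complete reference.

from sklearn.model_selection import ShuffleSplit, StratifiedKFold

from photonai.base import Hyperpipe, OutputSettings

# the same automated train & test workflow, reconfigured:
# a random grid search optimizer and other cross-validation splitters
pipe = Hyperpipe('alternative_pipe',
                 optimizer='random_grid_search',  # assumed optimizer name
                 optimizer_params={'n_configurations': 25},
                 metrics=['accuracy', 'f1_score'],
                 best_config_metric='f1_score',
                 outer_cv=ShuffleSplit(n_splits=3, test_size=0.2),
                 inner_cv=StratifiedKFold(n_splits=5),
                 output_settings=OutputSettings(project_folder='./tmp/'))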

Select and Combine Algorithms

By treating each pipeline element as a building block, we create a system in which the user can select and combine processing steps, adapt their arrangement or stack them in more advanced pipeline layouts.

Furthermore, PHOTON provides out-of-the-box access to a multitude of algorithms from diverse machine learning Python toolboxes, which can be used without having to learn each toolbox's specific syntax.

see PipelineElement
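
As a minimal sketch, exchanging an algorithm amounts to replacing one building block. The element name below assumes PHOTON's convention of registering scikit-learn classes under their original names:

from photonai.base import PipelineElement
from photonai.optimization import IntegerRange

# swap the SVM from the example above for a random forest;
# the hyperparameter space is declared alongside the element
my_pipe += PipelineElement('RandomForestClassifier',
                           hyperparameters={'n_estimators': IntegerRange(10, 100)})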

Advanced Pipeline Functionalities

The PHOTON pipeline can handle parallel data streams encapsulated in AND- and OR-elements, and it can branch off several parallel sub-pipelines, each containing its own sequence of data transformations.

see Stack, Switch, and Branch
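
A minimal sketch of these constructs, again assuming that the elements are registered under their scikit-learn names: a Switch (OR-element) lets the optimizer decide between algorithms, a Stack (AND-element) runs elements in parallel and concatenates their outputs, and a Branch encapsulates a sub-pipeline.

from photonai.base import PipelineElement, Switch, Stack, Branch

# Branch: a self-contained sequence of data transformations
preprocessing = Branch('preprocessing')
preprocessing += PipelineElement('StandardScaler')

# AND-element: both transformations run in parallel on the same data
feature_stack = Stack('feature_stack', [PipelineElement('PCA', n_components=10),
                                        PipelineElement('FastICA', n_components=10)])

# OR-element: the optimizer chooses either the SVM or the random forest
estimator_switch = Switch('estimator_switch', [PipelineElement('SVC'),
                                               PipelineElement('RandomForestClassifier')])

my_pipe += preprocessing
my_pipe += feature_stack
my_pipe += estimator_switch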

In addition, advanced methods such as data augmentation techniques, which transform not only the feature matrix but also the target vector, can easily be integrated at any position in the pipeline. Finally, supplementary data, i.e. information relevant for a (pre-)processing algorithm that is not included in the feature matrix, can conveniently be accessed and is automatically matched to the cross-validation splits.

see Imbalanced Data and Confounder Removal
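
A minimal sketch of the imbalanced-data case, assuming PHOTON's 'ImbalancedDataTransformer' element (a wrapper around the imbalanced-learn package) and its 'method_name' parameter:

from photonai.base import PipelineElement
from photonai.optimization import Categorical

# resampling manipulates the target vector y as well as the feature matrix X
# and is applied inside each cross-validation fold; the optimizer may also
# choose between sampling strategies
my_pipe += PipelineElement('ImbalancedDataTransformer',
                           hyperparameters={'method_name': Categorical(['RandomUnderSampler',
                                                                        'SMOTE'])},
                           test_disabled=True)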

Fully Customizable and Extensible

PHOTON is a fully customizable and extensible framework that is compatible with current state-of-the-art toolboxes. By implementing the field's de-facto standard interface introduced by scikit-learn, any custom code can seamlessly integrate with all PHOTON functionalities.

see Custom Transformer and Custom Estimator
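
Since the required interface is scikit-learn's, a custom transformer only needs fit and transform methods. The class below is a hypothetical example of such an element; once written, it can be registered with PHOTON and then used by name like any built-in element.

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class LogTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical custom element implementing the scikit-learn interface."""

    def __init__(self, offset=1.0):
        # the constructor only stores parameters, as scikit-learn expects
        self.offset = offset

    def fit(self, X, y=None):
        # nothing is learned here; a stateful transformer would estimate
        # its parameters from the training fold in this method
        return self

    def transform(self, X):
        # element-wise log transform of the feature matrix
        return np.log(np.asarray(X) + self.offset)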

Saving and Distributing Models

PHOTON offers a standardized format for saving, loading and distributing optimized and fully trained pipeline architectures with only one line of code.

see .PHOTON format
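
A minimal sketch of what this looks like; the method names save_optimum_pipe and load_optimum_pipe and the file name are assumptions for illustration:

from photonai.base import Hyperpipe

# persist the best, fully trained pipeline in the .photon format
my_pipe.save_optimum_pipe('my_model.photon')

# later, or on another machine: restore the pipeline and predict
loaded_pipe = Hyperpipe.load_optimum_pipe('my_model.photon')
predictions = loaded_pipe.predict(X)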