Pipeline Switch

If you cannot decide which transformer or estimator you want at a particular step of your pipeline, the PipelineSwitch is for you. Add your candidate elements and PHOTON tests them interchangeably to tell you which one performs best. But be careful: it currently only works with the grid search optimizer.

See an example

class PipelineSwitch

This class encapsulates several pipeline elements that belong at the same step of the pipeline, competing for being the best choice.

For example, you may want to find out whether preprocessing A or preprocessing B is better at this position in the pipe, or whether a tree outperforms the good old SVM.

ATTENTION: This class is a construct that may be convenient but is not suitable for any complex optimizations. Currently it only works for grid_search and the derived optimization strategies. USE THIS ONLY FOR RAPID PROTOTYPING AND PRELIMINARY RESULTS

The class acts as if it is a single entity. It joins the hyperparameter combinations of each encapsulated element into a single, big combination grid. Each hyperparameter combination from that grid gets a number. The PipelineSwitch object then publishes these numbers as the object's hyperparameter. When the optimizer chooses a new number, the switch internally activates the corresponding element and sets that element's parameters to the matching hyperparameter combination. In that way, each of the elements is tested in all its configurations at the same position in the pipeline. From the outside, the process and the optimizer see only one parameter of the PipelineSwitch: an integer indicating which item of the hyperparameter combination grid is currently active.
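The grid-joining and numbering described above can be sketched in plain Python. This is a simplified stand-in, not PHOTON's implementation; the element names and hyperparameter values are made up for illustration:

```python
from itertools import product

# Hypothetical hyperparameter grids of two competing elements
element_grids = {
    'SVC': {'C': [0.1, 1], 'kernel': ['linear', 'rbf']},
    'DecisionTree': {'max_depth': [3, 5]},
}

# Expand each element's grid separately -- hyperparameters are never mixed across elements
switch_grid = []          # list of (element_index, config_index) tuples
configurations = []       # parallel list of concrete configs
for i, (name, grid) in enumerate(element_grids.items()):
    keys = list(grid)
    element_configs = [dict(zip(keys, values)) for values in product(*grid.values())]
    for nr, config in enumerate(element_configs):
        switch_grid.append((i, nr))
        configurations.append((name, config))

# The optimizer only ever sees the numbered tuples in switch_grid
print(switch_grid)  # → [(0, 0), (0, 1), (0, 2), (0, 3), (1, 0), (1, 1)]
```

Picking, say, `(1, 0)` would activate the second element with its first configuration, which is exactly the lookup the real class performs.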

class PipelineSwitch(PipelineElement):
    """
    This class encapsulates several pipeline elements that belong at the same step of the pipeline,
    competing for being the best choice.

    For example, you may want to find out whether preprocessing A or preprocessing B is better at this position in the
    pipe, or whether a tree outperforms the good old SVM.

    ATTENTION: This class is a construct that may be convenient but is not suitable for any complex optimizations.
    Currently it only works for grid_search and the derived optimization strategies.
    USE THIS ONLY FOR RAPID PROTOTYPING AND PRELIMINARY RESULTS

    The class acts as if it is a single entity. It joins the hyperparameter combinations of each encapsulated element
    into a single, big combination grid. Each hyperparameter combination from that grid gets a number. The
    PipelineSwitch object then publishes these numbers as the object's hyperparameter. When the optimizer chooses a
    new number, the switch internally activates the corresponding element and sets that element's parameters to the
    matching hyperparameter combination. In that way, each of the elements is tested in all its configurations at the
    same position in the pipeline. From the outside, the process and the optimizer see only one parameter of the
    PipelineSwitch: an integer indicating which item of the hyperparameter combination grid is currently active.

    """

    def __init__(self, name: str, pipeline_element_list: list = None, _estimator_type='regressor'):
        """
        Creates a new PipelineSwitch object and generates the hyperparameter combination grid

        Parameters
        ----------
        * `name` [str]:
            How the element is called in the pipeline
        * `pipeline_element_list` [list, optional]:
            The competing pipeline elements
        * `_estimator_type` [str]:
            Used for validation purposes, either classifier or regressor

        """
        self.name = name
        self.sklearn_name = self.name + "__current_element"
        self._hyperparameters = {}
        self._current_element = (1, 1)
        self.disabled = False
        self.test_disabled = False
        self.pipeline_element_configurations = []
        self._estimator_type = _estimator_type

        if pipeline_element_list:
            self.pipeline_element_list = pipeline_element_list
            self.generate_private_config_grid()
        else:
            self.pipeline_element_list = []

    def __iadd__(self, pipeline_element):
        """
        Add a new estimator or transformer object to the switch container. Each item takes this position in the pipeline in turn during testing.

        Parameters
        ----------
        * `pipeline_element` [PipelineElement]:
            Item that should be tested against other competing elements at that position in the pipeline.
        """
        self.pipeline_element_list.append(pipeline_element)
        self.generate_private_config_grid()
        return self

    def add(self, pipeline_element):
        """
        Add a new estimator or transformer object to the switch container. Each item takes this position in the pipeline in turn during testing.

        Parameters
        ----------
        * `pipeline_element` [PipelineElement]:
            Item that should be tested against other competing elements at that position in the pipeline.
        """
        self.__iadd__(pipeline_element)

    @property
    def hyperparameters(self):
        # Todo: return actual hyperparameters of all pipeline elements??
        return self._hyperparameters

    @hyperparameters.setter
    def hyperparameters(self, value):
        pass

    def generate_private_config_grid(self):
        # reset
        self.pipeline_element_configurations = []

        # calculate anew
        hyperparameters = []
        # generate possible combinations for each item respectively - do not mix hyperparameters across items
        for i, pipe_element in enumerate(self.pipeline_element_list):
            # distinct_values_config = create_global_config([pipe_element])
            # add pipeline switch name in the config so that the hyperparameters can be set from other classes
            # pipeline switch will give the hyperparameters to the respective child
            # distinct_values_config_copy = {}
            # for config_key, config_value in distinct_values_config.items():
            #     distinct_values_config_copy[self.name + "__" + config_key] = config_value

            element_configurations = pipe_element.generate_config_grid()
            final_configuration_list = []
            for dict_item in element_configurations:
                copy_of_dict_item = {}
                for key, value in dict_item.items():
                    copy_of_dict_item[self.name + '__' + key] = value
                final_configuration_list.append(copy_of_dict_item)

            self.pipeline_element_configurations.append(final_configuration_list)
            hyperparameters += [(i, nr) for nr in range(len(final_configuration_list))]

        self._hyperparameters = {self.sklearn_name: hyperparameters}

    @property
    def current_element(self):
        return self._current_element

    @current_element.setter
    def current_element(self, value):
        self._current_element = value
        # pass the right config to the element
        # config = self.pipeline_element_configurations[value[0]][value[1]]
        # self.base_element.set_params(config)

    @property
    def base_element(self):
        """
        Returns the currently active element
        """
        obj = self.pipeline_element_list[self.current_element[0]]
        return obj

    def set_params(self, **kwargs):

        """
        The optimization process sees the number of possible combinations and chooses one of them.
        Then this class activates the corresponding element and prepares it with the chosen configuration.

        """

        config_nr = None
        if self.sklearn_name in kwargs:
            config_nr = kwargs[self.sklearn_name]
        elif 'current_element' in kwargs:
            config_nr = kwargs['current_element']

        if config_nr is None or not isinstance(config_nr, (tuple, list)):
            Logger().error('ValueError: current_element must be of type Tuple')
            raise ValueError('current_element must be of type Tuple')
        else:
            self.current_element = config_nr
            config = self.pipeline_element_configurations[config_nr[0]][config_nr[1]]
            # remove name
            unnamed_config = {}
            for config_key, config_value in config.items():
                key_split = config_key.split('__')
                unnamed_config['__'.join(key_split[2::])] = config_value
            self.base_element.set_params(**unnamed_config)
        return self

    def prettify_config_output(self, config_name, config_value, return_dict=False):

        """
        Makes the sklearn configuration dictionary human readable

        Returns
        -------
        * `prettified_configuration_string` [str]:
            configuration as prettified string or configuration as dict with prettified keys
        """

        if isinstance(config_value, tuple):
            output = self.pipeline_element_configurations[config_value[0]][config_value[1]]
            if not output:
                if return_dict:
                    return {self.pipeline_element_list[config_value[0]].name:None}
                else:
                    return self.pipeline_element_list[config_value[0]].name
            else:
                if return_dict:
                    return output
                return str(output)
        else:
            return super(PipelineSwitch, self).prettify_config_output(config_name, config_value)
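The activation mechanism, in which the first entry of the `current_element` tuple selects which encapsulated object receives all calls, can be illustrated with a minimal stand-in. The `MiniSwitch` and `Echo` classes below are hypothetical toys, not PHOTON code:

```python
class MiniSwitch:
    """Toy stand-in: routes all calls to the currently active element."""

    def __init__(self, elements):
        self.pipeline_element_list = elements
        self.current_element = (0, 0)  # (element index, config index)

    @property
    def base_element(self):
        # the first entry of the tuple picks the active element
        return self.pipeline_element_list[self.current_element[0]]


class Echo:
    def __init__(self, name):
        self.name = name


switch = MiniSwitch([Echo('pca'), Echo('svm')])
print(switch.base_element.name)   # → pca
switch.current_element = (1, 0)   # optimizer picks a new combination
print(switch.base_element.name)   # → svm
```

From the outside nothing changes: callers keep talking to the switch, and only the routing behind `base_element` is updated.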

Ancestors (in MRO)

Class variables

var ELEMENT_DICTIONARY

Static methods

def __init__(self, name, pipeline_element_list=None, _estimator_type='regressor')

Creates a new PipelineSwitch object and generates the hyperparameter combination grid

Parameters

  • name [str]: How the element is called in the pipeline
  • pipeline_element_list [list, optional]: The competing pipeline elements
  • _estimator_type [str]: Used for validation purposes, either classifier or regressor
def __init__(self, name: str, pipeline_element_list: list = None, _estimator_type='regressor'):
    """
    Creates a new PipelineSwitch object and generates the hyperparameter combination grid
    Parameters
    ----------
    * `name` [str]:
        How the element is called in the pipeline
    * `pipeline_element_list` [list, optional]:
        The competing pipeline elements
    * `_estimator_type` [str]:
        Used for validation purposes, either classifier or regressor
    """
    self.name = name
    self.sklearn_name = self.name + "__current_element"
    self._hyperparameters = {}
    self._current_element = (1, 1)
    self.disabled = False
    self.test_disabled = False
    self.pipeline_element_configurations = []
    self._estimator_type = _estimator_type
    if pipeline_element_list:
        self.pipeline_element_list = pipeline_element_list
        self.generate_private_config_grid()
    else:
        self.pipeline_element_list = []

def add(self, pipeline_element)

Add a new estimator or transformer object to the switch container. Each item takes this position in the pipeline in turn during testing.

Parameters

  • pipeline_element [PipelineElement]: Item that should be tested against other competing elements at that position in the pipeline.
def add(self, pipeline_element):
    """
    Add a new estimator or transformer object to the switch container. Each item takes this position in the pipeline in turn during testing.
    Parameters
    ----------
    * `pipeline_element` [PipelineElement]:
        Item that should be tested against other competing elements at that position in the pipeline.
    """
    self.__iadd__(pipeline_element)

def copy_me(self)

def copy_me(self):
    return deepcopy(self)

def fit(self, data, targets=None)

Calls the fit function of the base element

Returns

self

def fit(self, data, targets=None):
    """
    Calls the fit function of the base element
    Returns
    ------
    self
    """
    if not self.disabled:
        obj = self.base_element
        obj.fit(data, targets)
        # self.base_element.fit(data, targets)
    return self

def generate_config_grid(self)

def generate_config_grid(self):
    config_dict = create_global_config_dict([self])
    if len(config_dict) > 0:
        if self.test_disabled:
            config_dict.pop(self._sklearn_disabled)
        config_list = list(ParameterGrid(config_dict))
        if self.test_disabled:
            config_list.append({self._sklearn_disabled: True})
        return config_list
    else:
        return []

def generate_private_config_grid(self)

def generate_private_config_grid(self):
    # reset
    self.pipeline_element_configurations = []
    # calculate anew
    hyperparameters = []
    # generate possible combinations for each item respectively - do not mix hyperparameters across items
    for i, pipe_element in enumerate(self.pipeline_element_list):
        # distinct_values_config = create_global_config([pipe_element])
        # add pipeline switch name in the config so that the hyperparameters can be set from other classes
        # pipeline switch will give the hyperparameters to the respective child
        # distinct_values_config_copy = {}
        # for config_key, config_value in distinct_values_config.items():
        #     distinct_values_config_copy[self.name + "__" + config_key] = config_value
        element_configurations = pipe_element.generate_config_grid()
        final_configuration_list = []
        for dict_item in element_configurations:
            copy_of_dict_item = {}
            for key, value in dict_item.items():
                copy_of_dict_item[self.name + '__' + key] = value
            final_configuration_list.append(copy_of_dict_item)
        self.pipeline_element_configurations.append(final_configuration_list)
        hyperparameters += [(i, nr) for nr in range(len(final_configuration_list))]
    self._hyperparameters = {self.sklearn_name: hyperparameters}

def generate_sklearn_hyperparameters(self, value)

Generates a dictionary according to the sklearn convention of element_name__parameter_name: parameter_value

def generate_sklearn_hyperparameters(self, value: dict):
    """
    Generates a dictionary according to the sklearn convention of element_name__parameter_name: parameter_value
    """
    self._hyperparameters = {}
    for attribute, value_list in value.items():
        self._hyperparameters[self.name + '__' + attribute] = value_list
    if self.test_disabled:
        self._hyperparameters[self._sklearn_disabled] = [False, True]
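The sklearn naming convention that `generate_sklearn_hyperparameters` applies boils down to a prefixing step, sketched here in isolation (the element name and parameter values are made-up examples):

```python
def to_sklearn_names(element_name, hyperparameters):
    """Prefix every parameter with the element name, sklearn-style."""
    return {element_name + '__' + attr: values
            for attr, values in hyperparameters.items()}

params = to_sklearn_names('svc', {'C': [1, 10], 'kernel': ['rbf']})
print(params)  # → {'svc__C': [1, 10], 'svc__kernel': ['rbf']}
```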

def get_params(self, deep=True)

Forwards the get_params request to the wrapped base element

def get_params(self, deep: bool=True):
    """
    Forwards the get_params request to the wrapped base element
    """
    return self.base_element.get_params(deep)

def inverse_transform(self, data)

Calls inverse_transform on the base element

def inverse_transform(self, data):
    """
    Calls inverse_transform on the base element
    """
    if hasattr(self.base_element, 'inverse_transform'):
        return self.base_element.inverse_transform(data)
    else:
        # raise Warning('Element ' + self.name + ' has no method inverse_transform')
        return data

def predict(self, data)

Calls predict function on the base element.

IF PREDICT IS NOT AVAILABLE, CALLS TRANSFORM. This covers the case in which the encapsulated hyperpipe is only part of another hyperpipe and works as a transformer. Sklearn usually expects the last element to predict. It is also needed when using an autoencoder, which is first trained via predict and after training is used only for transforming.

def predict(self, data):
    """
    Calls predict function on the base element.
    IF PREDICT IS NOT AVAILABLE, CALLS TRANSFORM.
    This covers the case in which the encapsulated hyperpipe is only part of another hyperpipe and works as a
    transformer. Sklearn usually expects the last element to predict.
    It is also needed when using an autoencoder, which is first trained via predict and after training is used
    only for transforming.
    """
    if not self.disabled:
        if hasattr(self.base_element, 'predict'):
            return self.base_element.predict(data)
        elif hasattr(self.base_element, 'transform'):
            return self.base_element.transform(data)
        else:
            Logger().error('BaseException. base Element should have function ' +
                           'predict, or at least transform.')
            raise BaseException('base Element should have function predict, or at least transform.')
    else:
        return data
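The fallback described above amounts to a `hasattr` dispatch, sketched here with a dummy element (illustrative only, not PHOTON's code):

```python
def predict_or_transform(element, data):
    """Prefer predict; fall back to transform, as the switch does."""
    if hasattr(element, 'predict'):
        return element.predict(data)
    if hasattr(element, 'transform'):
        return element.transform(data)
    raise TypeError('element needs predict or at least transform')


class OnlyTransform:
    """Dummy element that, like many transformers, has no predict."""
    def transform(self, data):
        return [x * 2 for x in data]


print(predict_or_transform(OnlyTransform(), [1, 2]))  # → [2, 4]
```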

def predict_proba(self, data)

Predicts probabilities. The base element needs a predict_proba() function; otherwise a base exception is thrown.

def predict_proba(self, data):
    """
    Predicts probabilities.
    The base element needs a predict_proba() function; otherwise a
    base exception is thrown.
    """
    if not self.disabled:
        if hasattr(self.base_element, 'predict_proba'):
            return self.base_element.predict_proba(data)
        else:
            Logger().error('BaseException. base Element should have "predict_proba" function.')
            raise BaseException('base Element should have predict_proba function.')
    return data

def prettify_config_output(self, config_name, config_value, return_dict=False)

Makes the sklearn configuration dictionary human readable

Returns

  • prettified_configuration_string [str]: configuration as prettified string or configuration as dict with prettified keys
def prettify_config_output(self, config_name, config_value, return_dict=False):
    """
    Makes the sklearn configuration dictionary human readable
    Returns
    -------
    * `prettified_configuration_string` [str]:
        configuration as prettified string or configuration as dict with prettified keys
    """
    if isinstance(config_value, tuple):
        output = self.pipeline_element_configurations[config_value[0]][config_value[1]]
        if not output:
            if return_dict:
                return {self.pipeline_element_list[config_value[0]].name:None}
            else:
                return self.pipeline_element_list[config_value[0]].name
        else:
            if return_dict:
                return output
            return str(output)
    else:
        return super(PipelineSwitch, self).prettify_config_output(config_name, config_value)

def score(self, X_test, y_test)

Calls the score function on the base element. Returns a goodness-of-fit measure or a likelihood of unseen data.

def score(self, X_test, y_test):
    """
    Calls the score function on the base element.
    Returns a goodness-of-fit measure or a likelihood of unseen data.
    """
    return self.base_element.score(X_test, y_test)

def set_params(self, **kwargs)

The optimization process sees the number of possible combinations and chooses one of them. Then this class activates the corresponding element and prepares it with the chosen configuration.

def set_params(self, **kwargs):
    """
    The optimization process sees the number of possible combinations and chooses one of them.
    Then this class activates the corresponding element and prepares it with the chosen configuration.
    """
    config_nr = None
    if self.sklearn_name in kwargs:
        config_nr = kwargs[self.sklearn_name]
    elif 'current_element' in kwargs:
        config_nr = kwargs['current_element']
    if config_nr is None or not isinstance(config_nr, (tuple, list)):
        Logger().error('ValueError: current_element must be of type Tuple')
        raise ValueError('current_element must be of type Tuple')
    else:
        self.current_element = config_nr
        config = self.pipeline_element_configurations[config_nr[0]][config_nr[1]]
        # remove name
        unnamed_config = {}
        for config_key, config_value in config.items():
            key_split = config_key.split('__')
            unnamed_config['__'.join(key_split[2::])] = config_value
        self.base_element.set_params(**unnamed_config)
    return self
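The key stripping done inside set_params can be reproduced in isolation: the switch name and the element name are dropped so that the inner element receives its own parameter names. The switch and element names below are hypothetical:

```python
# A config as published by the switch: switch_name__element_name__parameter
config = {'my_switch__svc__C': 1, 'my_switch__svc__kernel': 'rbf'}

unnamed_config = {}
for config_key, config_value in config.items():
    key_split = config_key.split('__')
    # drop the first two name levels: switch name and element name
    unnamed_config['__'.join(key_split[2:])] = config_value

print(unnamed_config)  # → {'C': 1, 'kernel': 'rbf'}
```

Joining the remaining parts with `'__'` again keeps nested parameter names (e.g. of wrapped sub-elements) intact.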

def transform(self, data)

Calls transform on the base element.

IN CASE THERE IS NO TRANSFORM METHOD, CALLS PREDICT. This is used if we are using an estimator as a preprocessing step.

def transform(self, data):
    """
    Calls transform on the base element.
    IN CASE THERE IS NO TRANSFORM METHOD, CALLS PREDICT.
    This is used if we are using an estimator as a preprocessing step.
    """
    if not self.disabled:
        if hasattr(self.base_element, 'transform'):
            return self.base_element.transform(data)
        elif hasattr(self.base_element, 'predict'):
            return self.base_element.predict(data)
        else:
            Logger().error('BaseException: transform-predict-mess')
            raise BaseException('transform-predict-mess')
    else:
        return data

Instance variables

var base_element

Returns the currently active element

var current_element

var disabled

Inheritance: PipelineElement.disabled

var hyperparameters

var name

var pipeline_element_configurations

var sklearn_name

var test_disabled

Methods

def create(cls, name, base_element, hyperparameters, test_disabled=False, disabled=False, **kwargs)

Takes an instantiated object and encapsulates it into the PHOTON structure, adds the disabled function, and attaches information about the hyperparameters that should be tested

@classmethod
def create(cls, name, base_element, hyperparameters: dict, test_disabled=False, disabled=False, **kwargs):
    """
    Takes an instantiated object and encapsulates it into the PHOTON structure,
    adds the disabled function and attaches information about the hyperparameters that should be tested
    """
    return PipelineElement(name, hyperparameters, test_disabled, disabled, base_element=base_element, **kwargs)