Regression API

The physlearn.supervised.regression module provides machine learning utilities for single-target and multi-target regression tasks. It includes the physlearn.BaseRegressor and physlearn.Regressor classes.

class physlearn.supervised.regression.BaseRegressor(regressor_choice='ridge', cv=5, random_state=0, verbose=0, n_jobs=-1, score_multioutput='raw_values', scoring='neg_mean_absolute_error', return_train_score=True, auto_target=True, pipeline_transform=None, pipeline_memory=None, params=None, target_index=None, chain_order=None, stacking_options=None, base_boosting_options=None)

Bases: BaseEstimator, RegressorMixin, AdditionalRegressorMixin

Base class for regressor amalgamation.

The object is designed to amalgamate regressors from Scikit-learn, LightGBM, XGBoost, CatBoost, and Mlxtend into a unified framework, which follows the Scikit-learn API. Important methods include fit, predict, score, dump, load, cross_validate, and cross_val_score.

Parameters
  • regressor_choice (str, optional (default='ridge')) – Specifies the case-insensitive regressor choice.

  • cv (int, cross-validation generator, an iterable, or None, optional (default=5)) – Determines the cross-validation strategy if the regressor choice is stacking, if the task is multi-target regression and the single-targets are chained, and as the default in the k-fold cross-validation methods.

  • random_state (int, RandomState instance, or None, optional (default=0)) – Determines the random number generation in the regressor choice mlxtend.regressor.StackingCVRegressor and in the modified pipeline construction.

  • verbose (int, optional (default=0)) – Determines the verbosity of the regressor choices mlxtend.regressor.StackingRegressor and mlxtend.regressor.StackingCVRegressor, of the modified pipeline construction, and of the k-fold cross-validation methods.

  • n_jobs (int or None, optional (default=-1)) – The number of jobs to run in parallel if the regressor choice is stacking or voting, in the modified pipeline construction, and in the k-fold cross-validation methods.

  • score_multioutput (str, optional (default='raw_values')) – Defines how multiple output values are aggregated in the score method. The string must be 'raw_values', 'uniform_average', or 'variance_weighted'.

  • scoring (str, callable, list/tuple, or dict, optional (default='neg_mean_absolute_error')) – Determines scoring in the k-fold cross-validation methods.

  • return_train_score (bool, optional (default=True)) – Determines whether to return the training scores from the k-fold cross-validation methods.

  • auto_target (bool, optional (default=True)) – Determines whether to automatically handle the pipeline steps or let the user specify the steps.

  • pipeline_transform (str, list, tuple, or None, optional (default=None)) – Choice of transform(s) used in the modified pipeline construction. If the specified choice is a string, then it must be a default option, where 'standardscaler', 'boxcox', 'yeojohnson', 'quantileuniform', and 'quantilenormal' denote sklearn.preprocessing.StandardScaler, sklearn.preprocessing.PowerTransformer with method='box-cox' or method='yeo-johnson', and sklearn.preprocessing.QuantileTransformer with output_distribution='uniform' or output_distribution='normal', respectively.

  • pipeline_memory (str or object with the joblib.Memory interface, optional (default=None)) – Enables fitted transform caching in the modified pipeline construction.

  • params (dict, list, or None, optional (default=None)) – The choice of (hyper)parameters for the regressor choice. If None, then the default (hyper)parameters are utilized.

  • target_index (int or None, optional (default=None)) – Specifies the single-target regression subtask in the multi-target regression task.

  • chain_order (list or None, optional (default=None)) – Determines the target order in sklearn.multioutput.RegressorChain during the modified pipeline construction.

  • stacking_options (dict or None, optional (default=None)) –

    A dictionary of stacking options, whereby layers must be specified:

    layers dict

    A dictionary of stacking layer(s).

    shuffle bool or None, (default=True)

    Determines whether to shuffle the training data in mlxtend.regressor.StackingCVRegressor.

    refit bool or None, (default=True)

    Determines whether to clone and refit the regressors in mlxtend.regressor.StackingCVRegressor.

    passthrough bool or None, (default=True)

    Determines whether to concatenate the original features with the first stacking layer predictions in sklearn.ensemble.StackingRegressor, mlxtend.regressor.StackingRegressor, or mlxtend.regressor.StackingCVRegressor.

    meta_features bool or None, (default=True)

    Determines whether to make the concatenated features accessible through the attribute train_meta_features_ in mlxtend.regressor.StackingRegressor and mlxtend.regressor.StackingCVRegressor.

    voting_weights ndarray of shape (n_regressors,) or None, (default=None)

    Sequence of weights for sklearn.ensemble.VotingRegressor.

  • base_boosting_options (dict or None, optional (default=None)) –

    A dictionary of base boosting options used in the modified pipeline construction, wherein the following options must be specified:

    n_estimators int

    The number of basis functions in the noise term of the additive expansion. Note that this option may also be specified as n_regressors.

    boosting_loss str

    The loss function utilized in the pseudo-residual computation, where ‘ls’ denotes the squared error loss function, ‘lad’ denotes the absolute error loss function, ‘huber’ denotes the Huber loss function, and ‘quantile’ denotes the quantile loss function.

    line_search_options dict

    A dictionary of line search options, with the following entries:

    init_guess int, float, or ndarray

    The initial guess for the expansion coefficient.

    opt_method str

    Choice of optimization method. If 'minimize', then scipy.optimize.minimize, else if 'basinhopping', then scipy.optimize.basinhopping.

    method str or None

    The type of solver utilized in the optimization method.

    tol float or None

    The epsilon tolerance for terminating the optimization method.

    options dict or None

    A dictionary of solver options.

    niter int or None

    The number of iterations in basin-hopping.

    T float or None

    The temperature parameter utilized in basin-hopping, which determines the accept or reject criterion.

    loss str

    The loss function utilized in the line search computation, where ‘ls’ denotes the squared error loss function, ‘lad’ denotes the absolute error loss function, ‘huber’ denotes the Huber loss function, and ‘quantile’ denotes the quantile loss function.

    regularization int or float

    The regularization strength in the line search computation.
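As a rough illustration of how the two option dictionaries might be laid out, the sketch below uses only the key names documented above. The layer contents mirror the Regressor example in this documentation; every other value is a placeholder chosen for illustration, not a default or a recommended setting.

```python
# Illustrative values only; the key names follow the option descriptions above.
line_search_options = dict(
    init_guess=1.0,          # initial guess for the expansion coefficient
    opt_method='minimize',   # alternatively 'basinhopping'
    method='Nelder-Mead',    # a solver name accepted by scipy.optimize.minimize
    tol=1e-7,                # placeholder tolerance
    options=None,            # no extra solver options
    niter=None,              # only relevant to basin-hopping
    T=None,                  # only relevant to basin-hopping
    loss='lad',              # absolute error loss in the line search
    regularization=0.1,      # placeholder regularization strength
)

base_boosting_options = dict(
    n_estimators=3,
    boosting_loss='ls',
    line_search_options=line_search_options,
)

stacking_options = dict(
    layers=dict(regressors=['kneighborsregressor', 'bayesianridge'],
                final_regressor='lasso'),
    shuffle=True,
    refit=True,
    passthrough=True,
    meta_features=True,
    voting_weights=None,
)
```

These dictionaries would then be passed as the base_boosting_options and stacking_options arguments, respectively.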

Notes

The score method differs from the Scikit-learn usage, as the method is designed to abstract the regressor metrics, e.g., sklearn.metrics.mean_absolute_error.

See also

physlearn.pipeline.ModifiedPipeline

Class for creating a pipeline.

physlearn.supervised.regression.Regressor

Main class for regressor amalgamation.

Examples

>>> import pandas as pd
>>> from sklearn.datasets import load_boston
>>> from sklearn.model_selection import train_test_split
>>> from physlearn import BaseRegressor
>>> X, y = load_boston(return_X_y=True)
>>> X, y = pd.DataFrame(X), pd.Series(y)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y,
...                                                     random_state=42)
>>> reg = BaseRegressor(regressor_choice='lgbmregressor',
...                     pipeline_transform='standardscaler')
>>> y_pred = reg.fit(X_train, y_train).predict(X_test)
>>> reg.score(y_test, y_pred)
array([11.63706835])

_get_regressor()

Helper method which instantiates the regressor choice.

property check_regressor

Checks if regressor adheres to scikit-learn conventions.

Namely, it runs sklearn.utils.estimator_checks.check_estimator.

get_params(deep=True)

Retrieves the (hyper)parameters.

Parameters

deep (bool, optional (default=True)) – This parameter is not used; it is accepted because various Scikit-learn utilities require it.

Returns

self.params – (Hyper)parameter names mapped to their values.

Return type

dict

set_params(**params)

Sets the regressor’s (hyper)parameters.

Parameters

**params (dict) – The regressor’s (hyper)parameters.

Returns

self – The base regressor object.

Return type

BaseRegressor
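The get_params and set_params contract described above can be sketched with a small stand-in class. ToyRegressor and its two hyperparameters are hypothetical and are not part of physlearn; the sketch only shows the shape of the interface: a dict of names mapped to values, and a setter that returns the object itself.

```python
class ToyRegressor:
    """Hypothetical stand-in used only to illustrate the interface."""

    def __init__(self, alpha=1.0, fit_intercept=True):
        self.alpha = alpha
        self.fit_intercept = fit_intercept

    def get_params(self, deep=True):
        # `deep` is accepted for API compatibility but is not used here.
        return {'alpha': self.alpha, 'fit_intercept': self.fit_intercept}

    def set_params(self, **params):
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f'Invalid parameter: {name!r}')
            setattr(self, name, value)
        # Returning the object itself allows calls to be chained.
        return self


reg = ToyRegressor().set_params(alpha=0.5)
params = reg.get_params()
```

Because set_params returns the object, it can be chained directly with other method calls, as in the last two lines.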

_validate_data(X=None, y=None)

Checks the validity of the data representation(s).

Parameters
  • X (array-like of shape = [n_samples, n_features]) – The design matrix, where each row corresponds to an example and the column(s) correspond to the feature(s).

  • y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s).

Returns

out

Return type

validated data

dump(value, filename)

Serializes the value with joblib.

Parameters
  • value (any Python object) – The object to store to disk.

  • filename (str, joblib.pathlib.Path, or file object) – The file object or path of the file.

Returns

filenames – The list of file names in which the data is stored.

Return type

list of str

load(filename)

Deserializes the file object.

Parameters

filename (str, joblib.pathlib.Path, or file object) – The file object or path of the file.

Returns

joblib.load – The object stored in the file.

Return type

any Python object

get_pipeline(y, n_quantiles=None)

Creates pipe attribute for downstream tasks.

This method constructs a ModifiedPipeline from the given base regressor.

Parameters
  • y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s). The targets are used to determine the type of the target, and the number of samples if the pipeline_transform involves quantile transformers.

  • n_quantiles (int or None, optional (default=None)) – Number of quantiles in sklearn.preprocessing.QuantileTransformer, if pipeline_transform is either 'quantileuniform' or 'quantilenormal'.

pipe

A ModifiedPipeline object.

Type

physlearn.pipeline.ModifiedPipeline

regattr(attr)

Gets a regressor’s attribute from the ModifiedPipeline object.

The pipe attribute must exist in order to use this method.

Parameters

attr (str) – The name of the regressor’s attribute.

Returns

attr

Return type

type of attribute

_check_target_index(y)

Automates subtask slicing in multi-target regression.

Parameters

y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s). The targets are used to determine the type of the target, and the number of samples if the pipeline_transform involves quantile transformers.

Returns

y

Return type

array-like of shape = [n_samples] or shape = [n_samples, n_targets]
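The idea of slicing a single-target subtask out of a multi-target task can be sketched as follows. This is a minimal illustration on nested lists, assuming target_index selects one column of the target matrix; the helper name select_target is hypothetical, and the library's own handling of array-like inputs may differ.

```python
def select_target(y, target_index=None):
    """Return column `target_index` of a row-major target matrix.

    If `target_index` is None, the targets are returned unchanged.
    """
    if target_index is None:
        return y
    return [row[target_index] for row in y]


# Three examples, two single-targets per example.
y = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
y_sub = select_target(y, target_index=1)
```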

static _fit(regressor, X, y, sample_weight=None, **fit_params)

Helper fit method.

Parameters
  • X (array-like of shape = [n_samples, n_features]) – The design matrix, where each row corresponds to an example and the column(s) correspond to the feature(s).

  • y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s).

  • sample_weight (float, ndarray, or None, optional (default=None)) – Individual weights for each example. If the weight is a float, then every example will have the same weight.

  • **fit_params (dict of string -> object) – If base boosting, then these parameters are passed to the stagewise _fit_stages method.

fit(X, y, sample_weight=None)

Fits the ModifiedPipeline object.

Parameters
  • X (array-like of shape = [n_samples, n_features]) – The design matrix, where each row corresponds to an example and the column(s) correspond to the feature(s).

  • y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s).

  • sample_weight (float, ndarray, or None, optional (default=None)) – Individual weights for each example. If the weight is a float, then every example will have the same weight.

Returns

self.pipe – The induced pipeline object.

Return type

ModifiedPipeline

predict(X)

Generates predictions with the ModifiedPipeline object.

Parameters

X (array-like of shape = [n_samples, n_features]) – The design matrix, where each row corresponds to an example and the column(s) correspond to the feature(s).

Returns

y_pred – The predictions generated by the induced ModifiedPipeline object.

Return type

array-like of shape = [n_samples] or shape = [n_samples, n_targets]

score(y_true, y_pred, scoring='mse', multioutput='raw_values')

Computes the supervised score.

Parameters
  • y_true (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The observed target matrix, where each row corresponds to an example and the column(s) correspond to the observed single-target(s).

  • y_pred (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The predicted target matrix, where each row corresponds to an example and the column(s) correspond to the predicted single-target(s).

  • scoring (str, optional (default='mse')) – The scoring name, which may be mae, mse, rmse, r2, ev, or msle.

  • multioutput (str, optional (default='raw_values')) – Defines how multiple output values are aggregated. The string must be 'raw_values', 'uniform_average', or 'variance_weighted'.

Returns

score – The computed score.

Return type

float or ndarray of floats
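To make the scoring and multioutput options concrete, here is a small hand-rolled sketch of the 'mse' and 'rmse' scores with the 'raw_values' and 'uniform_average' aggregations. It is not the library's implementation, and the helper names are hypothetical. The 'variance_weighted' option is left out because, in Scikit-learn, it applies to r2 and explained variance rather than to error metrics.

```python
import math


def mse_per_target(y_true, y_pred):
    """Mean squared error of each target column (row-major inputs)."""
    n_samples = len(y_true)
    n_targets = len(y_true[0])
    return [
        sum((t[j] - p[j]) ** 2 for t, p in zip(y_true, y_pred)) / n_samples
        for j in range(n_targets)
    ]


def aggregate(raw_values, multioutput='raw_values'):
    """Aggregate per-target scores according to `multioutput`."""
    if multioutput == 'raw_values':
        return raw_values
    if multioutput == 'uniform_average':
        return sum(raw_values) / len(raw_values)
    raise ValueError(f'Unsupported multioutput: {multioutput!r}')


y_true = [[1.0, 2.0], [3.0, 4.0]]
y_pred = [[1.0, 4.0], [5.0, 5.0]]

raw = mse_per_target(y_true, y_pred)
mse_raw = aggregate(raw, multioutput='raw_values')
mse_avg = aggregate(raw, multioutput='uniform_average')
rmse_raw = [math.sqrt(value) for value in raw]
```

With 'raw_values' the result has one entry per target, whereas 'uniform_average' collapses the entries into a single float.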

_estimate_fold_size(y, cv)

Helper method to estimate cross-validation fold size.

Parameters
  • y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s).

  • cv (int, cross-validation generator, or an iterable) – Used in order to determine the fold size.

Returns

estimate

Return type

int
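How the estimate is computed is not stated here. As a point of reference, when cv is an integer, Scikit-learn's KFold gives the first n_samples % n_splits folds n_samples // n_splits + 1 samples each, and the remaining folds n_samples // n_splits samples each. The sketch below reproduces that rule; kfold_sizes is a hypothetical helper, not the method itself.

```python
def kfold_sizes(n_samples, n_splits):
    """Sizes of the withheld folds under the k-fold splitting rule."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError('n_splits must lie between 2 and n_samples.')
    base, extra = divmod(n_samples, n_splits)
    # The first `extra` folds receive one additional sample.
    return [base + 1 if i < extra else base for i in range(n_splits)]


sizes = kfold_sizes(n_samples=103, n_splits=5)
```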

_modified_cross_validate(X, y, return_regressor=False, error_score=nan, return_incumbent_score=False, cv=None, fit_params=None)

Performs (augmented) cross-validation.

If return_incumbent_score is True, then the incumbent is scored on the withheld folds. Otherwise, the behavior is the same as in Scikit-learn.

Parameters
  • X (array-like of shape = [n_samples, n_features]) – The design matrix, where each row corresponds to an example and the column(s) correspond to the feature(s).

  • y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s).

  • return_regressor (bool, optional (default=False)) – Determines whether to return the induced regressor.

  • error_score ('raise' or numeric, optional (default=np.nan)) – The assigned value if an error occurs while inducing a regressor. If set to ‘raise’, then the specific error is raised. Else if set to a numeric value, then FitFailedWarning is raised.

  • return_incumbent_score (bool, optional (default=False)) – Determines whether to score the incumbent on the withheld folds, whereby the incumbent is assumed to be an example in the design matrix.

  • cv (int, cross-validation generator, an iterable, or None, optional (default=None)) – Determines the cross-validation strategy. If None, then the default is 5-fold cross-validation.

  • fit_params (dict, optional (default=None)) – (Hyper)parameters to pass to the regressor’s fit method.

Returns

scores – Array of scores for each run of the cross-validation procedure.

Return type

dict of float arrays of shape (n_splits,)

References

  • Alex Wozniakowski, Jayne Thompson, Mile Gu, and Felix C. Binder. “A new formulation of gradient boosting”, Machine Learning: Science and Technology, 2 045022 (2021).
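The augmented procedure can be pictured with a toy sketch. It assumes the incumbent's predictions are available as a fixed vector aligned with the rows of the data, and it uses a trivial candidate that predicts the training-fold mean. The function names and the result keys are hypothetical and do not reflect the library's internals.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train, test) index lists for contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for i in range(n_splits):
        stop = start + base + (1 if i < extra else 0)
        test = list(range(start, stop))
        train = [j for j in range(n_samples) if j < start or j >= stop]
        yield train, test
        start = stop


def mae(y_true, y_pred):
    """Mean absolute error of two equally long sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_validate_with_incumbent(incumbent_pred, y, n_splits):
    """Score a toy candidate and the incumbent on each withheld fold."""
    scores = {'test_score': [], 'incumbent_test_score': []}
    for train, test in kfold_indices(len(y), n_splits):
        # Toy candidate: predict the mean of the training-fold targets.
        train_mean = sum(y[j] for j in train) / len(train)
        y_test = [y[j] for j in test]
        scores['test_score'].append(mae(y_test, [train_mean] * len(test)))
        # The incumbent is not refit; it is only scored on the withheld fold.
        scores['incumbent_test_score'].append(
            mae(y_test, [incumbent_pred[j] for j in test]))
    return scores


y = [1.0, 2.0, 3.0, 4.0]
incumbent_pred = [1.5, 2.5, 2.5, 3.5]
scores = cross_validate_with_incumbent(incumbent_pred, y, n_splits=2)
```

Scoring both on the same withheld folds makes the two score arrays directly comparable, fold by fold.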

cross_validate(X, y, return_regressor=False, error_score=nan, return_incumbent_score=False, cv=None, fit_params=None)

Performs (augmented) cross-validation, and wraps the result in a DataFrame.

If return_incumbent_score is True, then the incumbent is scored on the withheld folds. Otherwise, the behavior is the same as in Scikit-learn.

Parameters
  • X (array-like of shape = [n_samples, n_features]) – The design matrix, where each row corresponds to an example and the column(s) correspond to the feature(s).

  • y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s).

  • return_regressor (bool, optional (default=False)) – Determines whether to return the induced regressor.

  • error_score ('raise' or numeric, optional (default=np.nan)) – The assigned value if an error occurs while inducing a regressor. If set to ‘raise’, then the specific error is raised. Else if set to a numeric value, then FitFailedWarning is raised.

  • return_incumbent_score (bool, optional (default=False)) – Determines whether to score the incumbent on the withheld folds, whereby the incumbent is assumed to be an example in the design matrix.

  • cv (int, cross-validation generator, an iterable, or None, optional (default=None)) – Determines the cross-validation strategy. If None, then the default is 5-fold cross-validation.

  • fit_params (dict, optional (default=None)) – (Hyper)parameters to pass to the regressor’s fit method.

Returns

scores – DataFrame of scores for each run of the cross-validation procedure.

Return type

pd.DataFrame

Notes

Scikit-learn returns negative scores for some metrics, such as mean absolute error (MAE) or mean squared error (MSE). However, we only return nonnegative scores.

References

  • Alex Wozniakowski, Jayne Thompson, Mile Gu, and Felix C. Binder. “A new formulation of gradient boosting”, Machine Learning: Science and Technology, 2 045022 (2021).
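The sign convention mentioned in the Notes for cross_validate can be illustrated with a short sketch. It assumes a results dictionary whose keys contain Scikit-learn's 'neg_' scorer prefix; the helper to_nonnegative and the sample values are hypothetical.

```python
def to_nonnegative(cv_results):
    """Negate the 'neg_' scores and drop the prefix from their keys."""
    cleaned = {}
    for key, values in cv_results.items():
        if 'neg_' in key:
            cleaned[key.replace('neg_', '', 1)] = [-value for value in values]
        else:
            cleaned[key] = list(values)
    return cleaned


raw = {
    'test_neg_mean_absolute_error': [-2.0, -1.5],
    'fit_time': [0.01, 0.02],
}
clean = to_nonnegative(raw)
```

Only the entries produced by a negated scorer are touched; timing entries pass through unchanged.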

cross_val_score(X, y, error_score=nan, return_incumbent_score=False, cv=None, fit_params=None)

Performs (augmented) cross-validation, then returns the withheld fold score.

If return_incumbent_score is True, then the incumbent is scored on the withheld folds. Otherwise, the behavior is the same as in Scikit-learn.

Parameters
  • X (array-like of shape = [n_samples, n_features]) – The design matrix, where each row corresponds to an example and the column(s) correspond to the feature(s).

  • y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s).

  • error_score ('raise' or numeric, optional (default=np.nan)) – The assigned value if an error occurs while inducing a regressor. If set to ‘raise’, then the specific error is raised. Else if set to a numeric value, then FitFailedWarning is raised.

  • return_incumbent_score (bool, optional (default=False)) – Determines whether to score the incumbent on the withheld folds, whereby the incumbent is assumed to be an example in the design matrix.

  • cv (int, cross-validation generator, an iterable, or None, optional (default=None)) – Determines the cross-validation strategy. If None, then the default is 5-fold cross-validation.

  • fit_params (dict, optional (default=None)) – (Hyper)parameters to pass to the regressor’s fit method.

Returns

scores – The withheld fold scores for each run of the cross-validation procedure.

Return type

pd.Series or pd.DataFrame

Notes

Scikit-learn returns negative scores for some metrics, such as mean absolute error (MAE) or mean squared error (MSE). However, we only return nonnegative scores.

References

  • Alex Wozniakowski, Jayne Thompson, Mile Gu, and Felix C. Binder. “A new formulation of gradient boosting”, Machine Learning: Science and Technology, 2 045022 (2021).

class physlearn.supervised.regression.Regressor(regressor_choice='ridge', cv=5, random_state=0, verbose=1, n_jobs=-1, score_multioutput='raw_values', scoring='neg_mean_absolute_error', return_train_score=True, auto_target=True, pipeline_transform='quantilenormal', pipeline_memory=None, params=None, target_index=None, chain_order=None, stacking_options=None, base_boosting_options=None, refit=True, randomizedcv_n_iter=20, bayesoptcv_init_points=2, bayesoptcv_n_iter=20)

Bases: BaseRegressor

Main class for regressor amalgamation.

The object is designed to amalgamate regressors from Scikit-learn, LightGBM, XGBoost, CatBoost, and Mlxtend into a unified framework, which follows the Scikit-learn API. Important methods include fit, predict, score, baseboostcv, search, dump, load, cross_val_score, and nested_cross_validate.

Parameters
  • regressor_choice (str, optional (default='ridge')) – Specifies the case-insensitive regressor choice.

  • cv (int, cross-validation generator, an iterable, or None, optional (default=5)) – Determines the cross-validation strategy if the regressor choice is stacking, if the task is multi-target regression and the single-targets are chained, and as the default in the k-fold cross-validation methods.

  • random_state (int, RandomState instance, or None, optional (default=0)) – Determines the random number generation in the regressor choice mlxtend.regressor.StackingCVRegressor and in the modified pipeline construction.

  • verbose (int, optional (default=1)) – Determines the verbosity of the regressor choices mlxtend.regressor.StackingRegressor and mlxtend.regressor.StackingCVRegressor, of the modified pipeline construction, and of the k-fold cross-validation methods.

  • n_jobs (int or None, optional (default=-1)) – The number of jobs to run in parallel if the regressor choice is stacking or voting, in the modified pipeline construction, and in the k-fold cross-validation methods.

  • score_multioutput (str, optional (default='raw_values')) – Defines how multiple output values are aggregated in the score method. The string must be 'raw_values', 'uniform_average', or 'variance_weighted'.

  • scoring (str, callable, list/tuple, or dict, optional (default='neg_mean_absolute_error')) – Determines scoring in the k-fold cross-validation methods.

  • refit (bool, optional (default=True)) – Determines whether to return the refit regressor in the search method.

  • randomizedcv_n_iter (int, optional (default=20)) – Determines the number of (hyper)parameter settings that are sampled in the search method when the chosen search method is 'randomizedsearchcv', i.e., RandomizedSearchCV from Scikit-learn.

  • bayesoptcv_init_points (int, optional (default=2)) – Determines the number of random exploration steps in the search method when the chosen search method is 'bayesoptcv', i.e., Bayesian optimization. Increasing the number corresponds to diversifying the exploration space.

  • bayesoptcv_n_iter (int, optional (default=20)) – Determines the number of Bayesian optimization steps in the search method when the chosen search method is 'bayesoptcv', i.e., Bayesian optimization.

  • return_train_score (bool, optional (default=True)) – Determines whether to return the training scores from the k-fold cross-validation methods.

  • pipeline_transform (str, list, tuple, or None, optional (default='quantilenormal')) – Choice of transform(s) used in the modified pipeline construction. If the specified choice is a string, then it must be a default option, where 'standardscaler', 'boxcox', 'yeojohnson', 'quantileuniform', and 'quantilenormal' denote sklearn.preprocessing.StandardScaler, sklearn.preprocessing.PowerTransformer with method='box-cox' or method='yeo-johnson', and sklearn.preprocessing.QuantileTransformer with output_distribution='uniform' or output_distribution='normal', respectively.

  • pipeline_memory (str or object with the joblib.Memory interface, optional (default=None)) – Enables fitted transform caching in the modified pipeline construction.

  • params (dict, list, or None, optional (default=None)) – The choice of (hyper)parameters for the regressor choice. If None, then the default (hyper)parameters are utilized.

  • target_index (int or None, optional (default=None)) – Specifies the single-target regression subtask in the multi-target regression task.

  • chain_order (list or None, optional (default=None)) – Determines the target order in sklearn.multioutput.RegressorChain during the modified pipeline construction.

  • stacking_options (dict or None, optional (default=None)) –

    A dictionary of stacking options, whereby layers must be specified:

    layers dict

    A dictionary of stacking layer(s).

    shuffle bool or None, (default=True)

    Determines whether to shuffle the training data in mlxtend.regressor.StackingCVRegressor.

    refit bool or None, (default=True)

    Determines whether to clone and refit the regressors in mlxtend.regressor.StackingCVRegressor.

    passthrough bool or None, (default=True)

    Determines whether to concatenate the original features with the first stacking layer predictions in sklearn.ensemble.StackingRegressor, mlxtend.regressor.StackingRegressor, or mlxtend.regressor.StackingCVRegressor.

    meta_features bool or None, (default=True)

    Determines whether to make the concatenated features accessible through the attribute train_meta_features_ in mlxtend.regressor.StackingRegressor and mlxtend.regressor.StackingCVRegressor.

    voting_weights ndarray of shape (n_regressors,) or None, (default=None)

    Sequence of weights for sklearn.ensemble.VotingRegressor.

  • base_boosting_options (dict or None, optional (default=None)) –

    A dictionary of base boosting options used in the modified pipeline construction, wherein the following options must be specified:

    n_estimators int

    The number of basis functions in the noise term of the additive expansion. Note that this option may also be specified as n_regressors.

    boosting_loss str

    The loss function utilized in the pseudo-residual computation, where ‘ls’ denotes the squared error loss function, ‘lad’ denotes the absolute error loss function, ‘huber’ denotes the Huber loss function, and ‘quantile’ denotes the quantile loss function.

    line_search_options dict

    A dictionary of line search options, with the following entries:

    init_guess int, float, or ndarray

    The initial guess for the expansion coefficient.

    opt_method str

    Choice of optimization method. If 'minimize', then scipy.optimize.minimize, else if 'basinhopping', then scipy.optimize.basinhopping.

    method str or None

    The type of solver utilized in the optimization method.

    tol float or None

    The epsilon tolerance for terminating the optimization method.

    options dict or None

    A dictionary of solver options.

    niter int or None

    The number of iterations in basin-hopping.

    T float or None

    The temperature parameter utilized in basin-hopping, which determines the accept or reject criterion.

    loss str

    The loss function utilized in the line search computation, where ‘ls’ denotes the squared error loss function, ‘lad’ denotes the absolute error loss function, ‘huber’ denotes the Huber loss function, and ‘quantile’ denotes the quantile loss function.

    regularization int or float

    The regularization strength in the line search computation.
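The default string options for pipeline_transform, as listed in the parameter description above, can be summarized as a lookup table. The sketch below records each option as a dotted class path plus keyword arguments, so that it runs without Scikit-learn installed. The table and the resolve_transform helper are hypothetical; the library's own lookup is not shown in this documentation.

```python
# Default string option -> (dotted class path, keyword arguments).
DEFAULT_TRANSFORMS = {
    'standardscaler': ('sklearn.preprocessing.StandardScaler', {}),
    'boxcox': ('sklearn.preprocessing.PowerTransformer',
               {'method': 'box-cox'}),
    'yeojohnson': ('sklearn.preprocessing.PowerTransformer',
                   {'method': 'yeo-johnson'}),
    'quantileuniform': ('sklearn.preprocessing.QuantileTransformer',
                        {'output_distribution': 'uniform'}),
    'quantilenormal': ('sklearn.preprocessing.QuantileTransformer',
                       {'output_distribution': 'normal'}),
}


def resolve_transform(choice):
    """Look up the class path and keyword arguments of a default option."""
    try:
        return DEFAULT_TRANSFORMS[choice]
    except KeyError:
        raise ValueError(f'Unknown pipeline_transform: {choice!r}') from None


path, kwargs = resolve_transform('quantilenormal')
```

With Scikit-learn available, the class named by path could be imported and instantiated with kwargs to obtain the corresponding transformer.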

Notes

The score method differs from the Scikit-learn usage, as the method is designed to abstract the regressor metrics, e.g., sklearn.metrics.mean_absolute_error. Moreover, it computes multiple metrics, and returns the scores in a pandas object.

See also

physlearn.pipeline.ModifiedPipeline

Class for creating a pipeline.

physlearn.supervised.regression.BaseRegressor

Base class for regressor amalgamation.

Examples

>>> import pandas as pd
>>> from sklearn.datasets import load_boston
>>> from sklearn.decomposition import PCA, TruncatedSVD
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.pipeline import FeatureUnion
>>> from physlearn import Regressor
>>> X, y = load_boston(return_X_y=True)
>>> X, y = pd.DataFrame(X), pd.Series(y)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y,
...                                                     random_state=42)
>>> transformer_list = [('pca', PCA(n_components=1)),
...                     ('svd', TruncatedSVD(n_components=2))]
>>> union = FeatureUnion(transformer_list=transformer_list, n_jobs=-1)
>>> stack = dict(regressors=['kneighborsregressor', 'bayesianridge'],
...              final_regressor='lasso')
>>> reg = Regressor(regressor_choice='stackingregressor',
...                 pipeline_transform=('tr', union),
...                 stacking_options=dict(layers=stack))
>>> y_pred = reg.fit(X_train, y_train).predict(X_test)
>>> reg.score(y_test, y_pred)
             mae        mse      rmse        r2       ev      msle
target
0       4.775145  42.874253  6.547843  0.387748  0.40836  0.079818

property check_regressor

Checks if regressor adheres to scikit-learn conventions.

Namely, it runs sklearn.utils.estimator_checks.check_estimator. Scikit-learn and Mlxtend stacking regressors, as well as LightGBM, XGBoost, and CatBoost regressors, do not adhere to the convention.

get_params(deep=True)

Retrieves the (hyper)parameters.

Parameters

deep (bool, optional (default=True)) – This parameter is not used; it is accepted because various Scikit-learn utilities require it.

Returns

self.params – (Hyper)parameter names mapped to their values.

Return type

dict

set_params(**params)

Sets the regressor’s (hyper)parameters.

Parameters

**params (dict) – The regressor’s (hyper)parameters.

Returns

self – The base regressor object.

Return type

BaseRegressor

dump(value, filename)

Serializes the value with joblib.

Parameters
  • value (any Python object) – The object to store to disk.

  • filename (str, joblib.pathlib.Path, or file object) – The file object or path of the file.

Returns

filenames – The list of file names in which the data is stored.

Return type

list of str

load(filename)

Deserializes the file object.

Parameters

filename (str, joblib.pathlib.Path, or file object) – The file object or path of the file.

Returns

joblib.load – The object stored in the file.

Return type

any Python object

regattr(attr)

Gets a regressor’s attribute from the ModifiedPipeline object.

The pipe attribute must exist in order to use this method.

Parameters

attr (str) – The name of the regressor’s attribute.

Returns

attr

Return type

type of attribute

fit(X, y, sample_weight=None)

Fits the ModifiedPipeline object.

Parameters
  • X (array-like of shape = [n_samples, n_features]) – The design matrix, where each row corresponds to an example and the column(s) correspond to the feature(s).

  • y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s).

  • sample_weight (float, ndarray, or None, optional (default=None)) – Individual weights for each example. If the weight is a float, then every example will have the same weight.

Returns

self.pipe – The induced pipeline object.

Return type

ModifiedPipeline

_inbuilt_model_selection_step(X, y)

Performs augmented cross-validation.

This method is designed to be utilized within physlearn.supervised.regression.Regressor.baseboostcv(), as the inbuilt model selection step.

Parameters
  • X (array-like of shape = [n_samples, n_features]) – The design matrix, where each row corresponds to an example and the column(s) correspond to the feature(s).

  • y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s).

_return_incumbent

This flag implies that the incumbent won the inbuilt model selection step.

Type

bool

Return type

None

References

  • Alex Wozniakowski, Jayne Thompson, Mile Gu, and Felix C. Binder. “A new formulation of gradient boosting”, Machine Learning: Science and Technology, 2 045022 (2021).

baseboostcv(X, y, **fit_params)[source]

Base boosting with inbuilt cross-validation.

This method starts with inbuilt cross-validation, which scores both the incumbent and the candidate base boosting algorithm. If the incumbent wins, then the explicit model of the domain is the single-target regressor. Otherwise, base boosting greedily boosts the explicit model of the domain in a stagewise fashion.

In essence, this method acts as a fit method.

Parameters
  • X (array-like of shape = [n_samples, n_features]) – The design matrix, where each row corresponds to an example and the column(s) correspond to the feature(s).

  • y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s).

  • **fit_params (dict of string -> object) – If base boosting, then these parameters are passed to the stagewise _fit_stages method.

return_incumbent_

This flag implies that the incumbent won the inbuilt model selection step, and it notifies the predict method.

Type

bool

Returns

single-target regressor

Return type

Regressor or ModifiedPipeline

References

  • Alex Wozniakowski, Jayne Thompson, Mile Gu, and Felix C. Binder. “A new formulation of gradient boosting”, Machine Learning: Science and Technology, 2 045022 (2021).
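
The inbuilt model selection step can be sketched as a fold-by-fold comparison between the incumbent and a candidate. The sketch below makes two loud assumptions: the incumbent's prediction for each example is stored as the single column of X, and a Ridge regressor stands in for the candidate base boosting algorithm. It illustrates the comparison only and is not physlearn's implementation.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold

rng = np.random.RandomState(0)
y = rng.normal(size=100)
# Assumption: the single feature holds the incumbent's prediction of y.
X = (y + rng.normal(scale=0.1, size=100)).reshape(-1, 1)

incumbent_scores, candidate_scores = [], []
cv = KFold(n_splits=5, shuffle=True, random_state=0)
for train, test in cv.split(X):
    # The incumbent needs no fitting; it is scored on the withheld fold.
    incumbent_scores.append(mean_absolute_error(y[test], X[test, 0]))
    # The candidate (a stand-in here) is fitted on the training folds.
    candidate = Ridge(alpha=1.0).fit(X[train], y[train])
    candidate_scores.append(
        mean_absolute_error(y[test], candidate.predict(X[test]))
    )

# The flag is True when the incumbent has the lower mean withheld-fold error.
return_incumbent = np.mean(incumbent_scores) <= np.mean(candidate_scores)
```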

predict(X)[source]

Generates predictions with the ModifiedPipeline object.

Parameters

X (array-like of shape = [n_samples, n_features]) – The design matrix, where each row corresponds to an example and the column(s) correspond to the feature(s).

Returns

y_pred – The predictions generated by the induced ModifiedPipeline object.

Return type

array-like of shape = [n_samples] or shape = [n_samples, n_targets]

References

  • Alex Wozniakowski, Jayne Thompson, Mile Gu, and Felix C. Binder. “A new formulation of gradient boosting”, Machine Learning: Science and Technology, 2 045022 (2021).

score(y_true, y_pred, path=None)[source]

Computes the DataFrame of supervised scores.

The scoring metrics include mean squared error, mean absolute error, root mean squared error, R^2, explained variance, and mean squared logarithmic error. If the observed or predicted single-targets contain negative values, then the mean squared logarithmic error is not included, as the score is considered a NaN.

Parameters
  • y_true (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The observed target matrix, where each row corresponds to an example and the column(s) correspond to the observed single-target(s).

  • y_pred (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The predicted target matrix, where each row corresponds to an example and the column(s) correspond to the predicted single-target(s).

  • path (str or file handle, optional (default=None)) – The file path or object, if the scoring DataFrame is to be saved to a comma-separated values (csv) file.

Returns

scores – The pandas object of computed scores.

Return type

pd.DataFrame or pd.Series
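
A minimal sketch of such a score summary, built from scikit-learn metrics, is given below. The function name score_summary and the row labels are assumptions; only the set of metrics and the rule for omitting the mean squared logarithmic error come from the description above.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import (explained_variance_score, mean_absolute_error,
                             mean_squared_error, mean_squared_log_error,
                             r2_score)


def score_summary(y_true, y_pred):
    """Collect common regression scores in a pandas Series."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = mean_squared_error(y_true, y_pred)
    scores = {
        "mae": mean_absolute_error(y_true, y_pred),
        "mse": mse,
        "rmse": np.sqrt(mse),
        "r2": r2_score(y_true, y_pred),
        "ev": explained_variance_score(y_true, y_pred),
    }
    # The logarithmic error is only included for nonnegative values.
    if (y_true >= 0).all() and (y_pred >= 0).all():
        scores["msle"] = mean_squared_log_error(y_true, y_pred)
    return pd.Series(scores, name="scores")


with_msle = score_summary([1.0, 2.0, 3.0], [1.1, 1.9, 3.2])
without_msle = score_summary([-1.0, 2.0, 3.0], [-0.9, 1.9, 3.2])
```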

cross_validate(X, y, return_regressor=False, error_score=nan, return_incumbent_score=False, cv=None, fit_params=None)[source]

Performs (augmented) cross-validation, and wraps the result in a DataFrame.

If return_incumbent_score is True, then the incumbent is scored on the withheld folds. Otherwise, the behavior is the same as in Scikit-learn.

Parameters
  • X (array-like of shape = [n_samples, n_features]) – The design matrix, where each row corresponds to an example and the column(s) correspond to the feature(s).

  • y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s).

  • return_regressor (bool, optional (default=False)) – Determines whether to return the induced regressor.

  • error_score ('raise' or numeric, optional (default=np.nan)) – The assigned value if an error occurs while inducing a regressor. If set to ‘raise’, then the specific error is raised. Else if set to a numeric value, then FitFailedWarning is raised.

  • return_incumbent_score (bool, optional (default=False)) – Determines whether to score the incumbent on the withheld folds, whereby the incumbent is assumed to be an example in the design matrix.

  • cv (int, cross-validation generator, an iterable, or None, optional (default=None)) – Determines the cross-validation strategy. If None, then the default is 5-fold cross-validation.

  • fit_params (dict, optional (default=None)) – (Hyper)parameters to pass to the regressor’s fit method.

Returns

scores – DataFrame of scores for each run of the cross-validation procedure.

Return type

pd.DataFrame

Notes

Scikit-learn returns negative scores for some metrics, such as mean absolute error (MAE) or mean squared error (MSE). However, we only return nonnegative scores.

References

  • Alex Wozniakowski, Jayne Thompson, Mile Gu, and Felix C. Binder. “A new formulation of gradient boosting”, Machine Learning: Science and Technology, 2 045022 (2021).
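
The sign convention in the Notes can be illustrated with scikit-learn alone. The sketch below runs sklearn.model_selection.cross_validate with a negated error metric, wraps the result in a DataFrame, and flips the sign of the score columns. The data are synthetic and the column handling is an assumption about the general idea, not a copy of physlearn's code.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_validate

rng = np.random.RandomState(0)
X = rng.normal(size=(60, 4))
y = X @ np.array([1.0, 0.0, -1.0, 2.0]) + rng.normal(scale=0.1, size=60)

raw = cross_validate(Ridge(alpha=1.0), X, y, cv=5,
                     scoring="neg_mean_absolute_error",
                     return_train_score=True)

# One row per fold; columns are fit_time, score_time, test_score, train_score.
scores = pd.DataFrame(raw)
# Scikit-learn negates error metrics, so flip the sign to report plain MAE.
for column in ("train_score", "test_score"):
    scores[column] = -scores[column]
```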

cross_val_score(X, y, error_score=nan, return_incumbent_score=False, cv=None, fit_params=None)[source]

Performs (augmented) cross-validation, then returns the withheld fold score.

If return_incumbent_score is True, then the incumbent is scored on the withheld folds. Otherwise, the behavior is the same as in Scikit-learn.

Parameters
  • X (array-like of shape = [n_samples, n_features]) – The design matrix, where each row corresponds to an example and the column(s) correspond to the feature(s).

  • y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s).

  • error_score ('raise' or numeric, optional (default=np.nan)) – The assigned value if an error occurs while inducing a regressor. If set to ‘raise’, then the specific error is raised. Else if set to a numeric value, then FitFailedWarning is raised.

  • return_incumbent_score (bool, optional (default=False)) – Determines whether to score the incumbent on the withheld folds, whereby the incumbent is assumed to be an example in the design matrix.

  • cv (int, cross-validation generator, an iterable, or None, optional (default=None)) – Determines the cross-validation strategy. If None, then the default is 5-fold cross-validation.

  • fit_params (dict, optional (default=None)) – (Hyper)parameters to pass to the regressor’s fit method.

Returns

scores – The withheld fold scores for each run of the cross-validation procedure.

Return type

pd.Series or pd.DataFrame

Notes

Scikit-learn returns negative scores for some metrics, such as mean absolute error (MAE) or mean squared error (MSE). However, we only return nonnegative scores.

References

  • Alex Wozniakowski, Jayne Thompson, Mile Gu, and Felix C. Binder. “A new formulation of gradient boosting”, Machine Learning: Science and Technology, 2 045022 (2021).

_preprocess_search_params(y, search_params)[source]

Helper method for preprocessing (hyper)parameters.

This method automatically preprocesses (hyper)parameter names for the exhaustive search method by determining whether the task is single-target or multi-target regression. In the latter case, it further determines the user’s assumption on the single-targets’ independence. Namely, it asks if the user wishes to chain the single-targets.

Parameters
  • y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s).

  • search_params (dict) – Dictionary with (hyper)parameter names as keys, and either lists of (hyper)parameter settings to try as values or tuples of (hyper)parameter lower and upper bounds to try as values.

Returns

search_params – The preprocessed (hyper)parameters.

Return type

dict
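
One way to picture this preprocessing is as prefixing each (hyper)parameter name so that a search object can address the regressor inside a pipeline. The sketch below is hypothetical: the step name 'reg' and the three prefixes are assumptions that follow scikit-learn's nested parameter convention (with MultiOutputRegressor exposing estimator, and RegressorChain exposing base_estimator in the scikit-learn versions that use that name).

```python
def preprocess_search_params(search_params, multi_target, chained):
    """Prefix (hyper)parameter names for a nested pipeline search.

    The prefixes are illustrative assumptions, not physlearn's exact names.
    """
    if multi_target and chained:
        # Single-targets are chained, e.g. with RegressorChain.
        prefix = "reg__base_estimator__"
    elif multi_target:
        # Single-targets are treated independently, e.g. MultiOutputRegressor.
        prefix = "reg__estimator__"
    else:
        # Single-target regression addresses the regressor step directly.
        prefix = "reg__"
    return {prefix + name: values for name, values in search_params.items()}


single = preprocess_search_params({"alpha": [0.1, 1.0]}, False, False)
independent = preprocess_search_params({"alpha": [0.1, 1.0]}, True, False)
chained = preprocess_search_params({"alpha": [0.1, 1.0]}, True, True)
```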

Helper (hyper)parameter search method.

Parameters
  • X (array-like of shape = [n_samples, n_features]) – The design matrix, where each row corresponds to an example and the column(s) correspond to the feature(s).

  • y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s).

  • search_params (dict) – Dictionary with (hyper)parameter names as keys, and either lists of (hyper)parameter settings to try as values or tuples of (hyper)parameter lower and upper bounds to try as values.

  • search_method (str, optional (default='gridsearchcv')) – Specifies the search method. If 'gridsearchcv', 'randomizedsearchcv', or 'bayesoptcv', then the search method is GridSearchCV, RandomizedSearchCV, or Bayesian Optimization, respectively.

  • cv (int, cross-validation generator, an iterable, or None, optional (default=None)) – Determines the cross-validation strategy. If None, then the default is 5-fold cross-validation.

_method

An instance of the (hyper)parameter search object.

Type

GridSearchCV, RandomizedSearchCV, BayesianOptimization

search(X, y, search_params, search_method='gridsearchcv', cv=None, path=None)[source]

(Hyper)parameter search method.

Parameters
  • X (array-like of shape = [n_samples, n_features]) – The design matrix, where each row corresponds to an example and the column(s) correspond to the feature(s).

  • y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s).

  • search_params (dict) – Dictionary with (hyper)parameter names as keys, and either lists of (hyper)parameter settings to try as values or tuples of (hyper)parameter lower and upper bounds to try as values.

  • search_method (str, optional (default='gridsearchcv')) – Specifies the search method. If 'gridsearchcv', 'randomizedsearchcv', or 'bayesoptcv', then the search method is GridSearchCV, RandomizedSearchCV, or Bayesian Optimization, respectively.

  • cv (int, cross-validation generator, an iterable, or None, optional (default=None)) – Determines the cross-validation strategy. If None, then the default is 5-fold cross-validation.

  • path (str or file handle, optional (default=None)) – The file path or object, if the scoring DataFrame is to be saved to a comma-separated values (csv) file.

best_params_

The optimal (hyper)parameters.

Type

pd.Series

best_score_

The scores for the optimal (hyper)parameters.

Type

pd.Series

search_summary_

Bundles the best_params_, best_score_, and refit_time into one attribute.

Type

pd.DataFrame

Notes

Scikit-learn returns negative scores for some metrics, such as mean absolute error (MAE) or mean squared error (MSE). However, we only return nonnegative scores.
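
For orientation, a grid search of this kind can be sketched with scikit-learn directly. The pipeline, the step names, and the parameter grid below are illustrative assumptions; the sketch only shows how the optimal (hyper)parameters and a sign-flipped score might be collected into pandas objects.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X = rng.normal(size=(60, 4))
y = X @ np.array([1.0, 0.0, -1.0, 2.0]) + rng.normal(scale=0.1, size=60)

pipe = Pipeline([("tr", StandardScaler()), ("reg", Ridge())])
search = GridSearchCV(pipe, {"reg__alpha": [0.01, 0.1, 1.0]}, cv=5,
                      scoring="neg_mean_absolute_error")
search.fit(X, y)

# Collect the optimal (hyper)parameters and the nonnegative score.
best_params = pd.Series(search.best_params_, name="best_params")
best_score = pd.Series({"mae": -search.best_score_}, name="best_score")
refit_time = search.refit_time_  # seconds spent refitting on all the data
```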

_search_and_score(pipeline, X, y, scorer, train, test, verbose, search_params, search_method='gridsearchcv', cv=None)[source]

Helper method for nested cross-validation.

Exhaustively searches over the specified (hyper)parameters in the inner loop, then scores the best performing regressor in the outer loop.

Parameters
  • pipeline (ModifiedPipeline) – A ModifiedPipeline object.

  • X (array-like of shape = [n_samples, n_features]) – The design matrix, where each row corresponds to an example and the column(s) correspond to the feature(s).

  • y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s).

  • scorer (dict) – A dict mapping each scorer name to its validated scorer.

  • train (list) – A list of indices for the training folds.

  • test (list) – A list of indices for the withheld folds.

  • verbose (int) – Determines verbosity.

  • search_params (dict) – Dictionary with (hyper)parameter names as keys, and either lists of (hyper)parameter settings to try as values or tuples of (hyper)parameter lower and upper bounds to try as values.

  • search_method (str, optional (default='gridsearchcv')) – Specifies the search method. If 'gridsearchcv', 'randomizedsearchcv', or 'bayesoptcv', then the search method is GridSearchCV, RandomizedSearchCV, or Bayesian Optimization, respectively.

  • cv (int, cross-validation generator, an iterable, or None, optional (default=None)) – Determines the cross-validation strategy. If None, then the default is 5-fold cross-validation.

Returns

score

Return type

tuple

Notes

Scikit-learn returns negative scores for some metrics, such as mean absolute error (MAE) or mean squared error (MSE). However, we only return nonnegative scores.

nested_cross_validate(X, y, search_params, search_method='gridsearchcv', outer_cv=None, inner_cv=None, return_inner_loop_score=False)[source]

Performs a nested cross-validation procedure.

Parameters
  • X (array-like of shape = [n_samples, n_features]) – The design matrix, where each row corresponds to an example and the column(s) correspond to the feature(s).

  • y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s).

  • search_params (dict) – Dictionary with (hyper)parameter names as keys, and either lists of (hyper)parameter settings to try as values or tuples of (hyper)parameter lower and upper bounds to try as values.

  • search_method (str, optional (default='gridsearchcv')) – Specifies the search method. If 'gridsearchcv', 'randomizedsearchcv', or 'bayesoptcv', then the search method is GridSearchCV, RandomizedSearchCV, or Bayesian Optimization, respectively.

  • outer_cv (int, cross-validation generator, an iterable, or None, optional (default=None)) – Determines the outer loop cross-validation strategy. If None, then the default is 5-fold cross-validation.

  • inner_cv (int, cross-validation generator, an iterable, or None, optional (default=None)) – Determines the inner loop cross-validation strategy. If None, then the default is 5-fold cross-validation.

  • return_inner_loop_score (bool, optional (default=False)) – If True, then we return the inner loop score in addition to the outer loop score.

Returns

score

Return type

pd.Series or tuple

Notes

The procedure does not compute the single best set of (hyper)parameters, as each inner loop may return a different set of optimal (hyper)parameters.

Scikit-learn returns negative scores for some metrics, such as mean absolute error (MAE) or mean squared error (MSE). However, we only return nonnegative scores.

References

Jacques Wainer and Gavin Cawley. “Nested cross-validation when selecting classifiers is overzealous for most practical applications,” arXiv preprint arXiv:1809.09446 (2018).
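
The nested procedure can be sketched with scikit-learn by placing a search object inside an outer cross-validation loop. The fold counts, the grid, and the Ridge regressor below are assumptions chosen for brevity; each outer split may select a different alpha, which is why no single best set of (hyper)parameters results.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

rng = np.random.RandomState(0)
X = rng.normal(size=(60, 4))
y = X @ np.array([1.0, 0.0, -1.0, 2.0]) + rng.normal(scale=0.1, size=60)

inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Inner loop: the (hyper)parameter search on each outer training split.
search = GridSearchCV(Ridge(), {"alpha": [0.01, 0.1, 1.0]}, cv=inner_cv,
                      scoring="neg_mean_absolute_error")

# Outer loop: score the refitted best regressor on each withheld fold.
outer_scores = pd.Series(
    -cross_val_score(search, X, y, cv=outer_cv,
                     scoring="neg_mean_absolute_error"),
    name="outer_mae",
)
```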

subsample(X, y, subsample_proportion=None)[source]

Subsamples from the design and target matrices.

Parameters
  • X (array-like of shape = [n_samples, n_features]) – The design matrix, where each row corresponds to an example and the column(s) correspond to the feature(s).

  • y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s).

  • subsample_proportion (float or None, optional (default=None)) – Determines the proportion of observations to use in the subsampling procedure.

Returns

out – A tuple with the X and y data.

Return type

tuple
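
A minimal sketch of row subsampling is shown below. Whether physlearn samples with or without replacement, and how it seeds the generator, is not stated in this reference; the sketch assumes sampling without replacement and a hypothetical random_state argument.

```python
import numpy as np


def subsample(X, y, subsample_proportion=None, random_state=0):
    """Draw the same random subset of rows from X and y."""
    X, y = np.asarray(X), np.asarray(y)
    if subsample_proportion is None:
        return X, y
    if not 0 < subsample_proportion <= 1:
        raise ValueError("subsample_proportion must lie in (0, 1].")
    rng = np.random.RandomState(random_state)
    n_samples = X.shape[0]
    n_keep = max(1, int(round(subsample_proportion * n_samples)))
    # Sampling without replacement keeps each selected example unique.
    index = rng.choice(n_samples, size=n_keep, replace=False)
    return X[index], y[index]


X = np.arange(20).reshape(10, 2)
y = np.arange(10)
X_sub, y_sub = subsample(X, y, subsample_proportion=0.5)
```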

_check_target_index(y)

Automates subtask slicing in multi-target regression.

Parameters

y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s). The targets are used to determine the type of the target, and the number of samples if the pipeline_transform involves quantile transformers.

Returns

y

Return type

array-like of shape = [n_samples] or shape = [n_samples, n_targets]
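
Subtask slicing can be thought of as selecting one column of the target matrix. The helper select_target below is a hypothetical sketch under the assumption that target_index is an integer column position; it is not physlearn's implementation.

```python
import numpy as np


def select_target(y, target_index=None):
    """Return one single-target column, or y unchanged if no index is set."""
    y = np.asarray(y)
    if target_index is None or y.ndim == 1:
        return y
    return y[:, target_index]


y = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
second_target = select_target(y, target_index=1)
```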

_estimate_fold_size(y, cv)

Helper method to estimate cross-validation fold size.

Parameters
  • y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s).

  • cv (int, cross-validation generator, or an iterable) – Used in order to determine the fold size.

Returns

estimate

Return type

int
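
As a rough sketch of such an estimate, the function below assumes the quantity of interest is the number of training examples in each split, namely n_samples * (n_splits - 1) / n_splits rounded down. That interpretation and the function name are assumptions; sklearn.model_selection.check_cv is used only to read the number of splits.

```python
import numpy as np
from sklearn.model_selection import KFold, check_cv


def estimate_training_fold_size(y, cv):
    """Estimate how many examples each training split contains."""
    n_samples = np.asarray(y).shape[0]
    # check_cv turns an int or an iterable into a cross-validation splitter.
    n_splits = check_cv(cv).get_n_splits()
    return int(n_samples * (n_splits - 1) / n_splits)


y = np.zeros(100)
from_int = estimate_training_fold_size(y, 5)
from_generator = estimate_training_fold_size(y, KFold(n_splits=4))
```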

static _fit(regressor, X, y, sample_weight=None, **fit_params)

Helper fit method.

Parameters
  • X (array-like of shape = [n_samples, n_features]) – The design matrix, where each row corresponds to an example and the column(s) correspond to the feature(s).

  • y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s).

  • sample_weight (float, ndarray, or None, optional (default=None)) – Individual weights for each example. If the weight is a float, then every example will have the same weight.

  • **fit_params (dict of string -> object) – If base boosting, then these parameters are passed to the stagewise _fit_stages method.

_get_regressor()

Helper method which instantiates the regressor choice.

_modified_cross_validate(X, y, return_regressor=False, error_score=nan, return_incumbent_score=False, cv=None, fit_params=None)

Performs (augmented) cross-validation.

If return_incumbent_score is True, then the incumbent is scored on the withheld folds. Otherwise, the behavior is the same as in Scikit-learn.

Parameters
  • X (array-like of shape = [n_samples, n_features]) – The design matrix, where each row corresponds to an example and the column(s) correspond to the feature(s).

  • y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s).

  • return_regressor (bool, optional (default=False)) – Determines whether to return the induced regressor.

  • error_score ('raise' or numeric, optional (default=np.nan)) – The assigned value if an error occurs while inducing a regressor. If set to ‘raise’, then the specific error is raised. Else if set to a numeric value, then FitFailedWarning is raised.

  • return_incumbent_score (bool, optional (default=False)) – Determines whether to score the incumbent on the withheld folds, whereby the incumbent is assumed to be an example in the design matrix.

  • cv (int, cross-validation generator, an iterable, or None, optional (default=None)) – Determines the cross-validation strategy. If None, then the default is 5-fold cross-validation.

  • fit_params (dict, optional (default=None)) – (Hyper)parameters to pass to the regressor’s fit method.

Returns

scores – Array of scores for each run of the cross-validation procedure.

Return type

dict of float arrays of shape (n_splits,)

References

  • Alex Wozniakowski, Jayne Thompson, Mile Gu, and Felix C. Binder. “A new formulation of gradient boosting”, Machine Learning: Science and Technology, 2 045022 (2021).

_validate_data(X=None, y=None)

Checks the validity of the data representation(s).

Parameters
  • X (array-like of shape = [n_samples, n_features]) – The design matrix, where each row corresponds to an example and the column(s) correspond to the feature(s).

  • y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s).

Returns

out

Return type

validated data

get_pipeline(y, n_quantiles=None)

Creates pipe attribute for downstream tasks.

This method constructs a ModifiedPipeline from the given base regressor.

Parameters
  • y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s). The targets are used to determine the type of the target, and the number of samples if the pipeline_transform involves quantile transformers.

  • n_quantiles (int or None, optional (default=None)) – Number of quantiles in sklearn.preprocessing.QuantileTransformer, if pipeline_transform is either `quantileuniform` or `quantilenormal`.

pipe

A ModifiedPipeline object.

Type

physlearn.pipeline.ModifiedPipeline
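
The role of n_quantiles can be illustrated with a plain scikit-learn Pipeline. In the sketch below, the helper make_pipeline_sketch, the step names, and the cap of 1000 quantiles (scikit-learn's default) are assumptions; the point is only that the number of quantiles should not exceed the number of samples.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import QuantileTransformer


def make_pipeline_sketch(n_samples, n_quantiles=None):
    """Build a transform-then-regress pipeline with a safe n_quantiles."""
    if n_quantiles is None:
        n_quantiles = min(n_samples, 1000)
    else:
        n_quantiles = min(n_quantiles, n_samples)
    transform = QuantileTransformer(n_quantiles=n_quantiles,
                                    output_distribution="normal",
                                    random_state=0)
    return Pipeline([("tr", transform), ("reg", Ridge())])


rng = np.random.RandomState(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -1.0, 0.5])

pipe = make_pipeline_sketch(n_samples=y.shape[0])
pipe.fit(X, y)
y_pred = pipe.predict(X)
```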

The physlearn.supervised.interface provides an interface between physlearn.BaseRegressor and the regressor dictionary. It includes the physlearn.RegressorDictionaryInterface class.

class physlearn.supervised.interface.RegressorDictionaryInterface(regressor_choice, params=None, stacking_options=None)[source]

Bases: AbstractEstimatorDictionaryInterface

BaseRegressor and regressor dictionary interface.

The regressor dictionary collects key-value pairs, whereby each key is a lowercase regressor class name that uniquely identifies the regressor class, e.g., {'ridge': Ridge}. As such, the interface manages regressor class retrieval for physlearn.BaseRegressor as part of the constructor method.

Parameters
  • regressor_choice (str) – The dictionary key for lookup in the dictionary of regressors. The key must be in lowercase, e.g., the Scikit-learn regressor Ridge has key 'ridge'.

  • params (dict, list, or None, optional (default=None)) – The choice of (hyper)parameters.

  • stacking_options (dict or None, optional (default=None)) –

    A dictionary of stacking options, whereby layers must be specified:

    layers dict

    A dictionary of stacking layer(s).

    shuffle bool or None, (default=True)

    Determines whether to shuffle the training data in mlxtend.regressor.StackingCVRegressor.

    refit bool or None, (default=True)

    Determines whether to clone and refit the regressors in mlxtend.regressor.StackingCVRegressor.

    passthrough bool or None, (default=True)

    Determines whether to concatenate the original features with the first stacking layer predictions in sklearn.ensemble.StackingRegressor, mlxtend.regressor.StackingRegressor, or mlxtend.regressor.StackingCVRegressor.

    meta_features bool or None, (default=True)

    Determines whether to make the concatenated features accessible through the attribute train_meta_features_ in mlxtend.regressor.StackingRegressor and mlxtend.regressor.StackingCVRegressor.

    voting_weights ndarray of shape (n_regressors,) or None, (default=None)

    Sequence of weights for sklearn.ensemble.VotingRegressor.

Examples

>>> from physlearn import RegressorDictionaryInterface
>>> interface = RegressorDictionaryInterface(regressor_choice='mlpregressor',
...                                          params=dict(alpha=1))
>>> interface.set_params()
MLPRegressor(alpha=1)
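
The dictionary lookup itself can be sketched without physlearn. The two-entry regressor_dict and the helper build_regressor below are hypothetical stand-ins for the full regressor dictionary and the interface; the key is lowercased before the lookup, mirroring the case-insensitive regressor choice of BaseRegressor.

```python
from sklearn.linear_model import Ridge
from sklearn.neural_network import MLPRegressor

# Hypothetical subset of a regressor dictionary: lowercase name -> class.
regressor_dict = {"ridge": Ridge, "mlpregressor": MLPRegressor}


def build_regressor(regressor_choice, params=None):
    """Look up a regressor class by name and instantiate it."""
    key = regressor_choice.lower()
    if key not in regressor_dict:
        raise KeyError(f"Unknown regressor choice: {regressor_choice!r}")
    # With params=None, the regressor keeps its default (hyper)parameters.
    return regressor_dict[key](**(params or {}))


regressor = build_regressor("MLPRegressor", params=dict(alpha=1))
default_ridge = build_regressor("ridge")
```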

get_params(regressor)[source]

Retrieves the (hyper)parameters.

Parameters

regressor (estimator) – A regressor that follows the Scikit-learn API.

Notes

The method physlearn.RegressorDictionaryInterface.set_params() must be called beforehand.

set_params(**kwargs)[source]

Sets the (hyper)parameters.

If params is None, then the default (hyper)parameters are set.

Parameters
  • cv (int, cross-validation generator, an iterable, or None) – Determines the cross-validation strategy in sklearn.ensemble.StackingRegressor, mlxtend.regressor.StackingRegressor, or mlxtend.regressor.StackingCVRegressor.

  • verbose (int or None) – Determines verbosity in mlxtend.regressor.StackingRegressor and mlxtend.regressor.StackingCVRegressor.

  • random_state (int, RandomState instance, or None) – Determines the random number generation in mlxtend.regressor.StackingCVRegressor.

  • n_jobs (int or None) – The number of jobs to run in parallel.

  • stacking_options (dict or None, optional (default=None)) –

    A dictionary of stacking options, whereby layers must be specified:

    layers dict

    A dictionary of stacking layer(s).

    shuffle bool or None, (default=True)

    Determines whether to shuffle the training data in mlxtend.regressor.StackingCVRegressor.

    refit bool or None, (default=True)

    Determines whether to clone and refit the regressors in mlxtend.regressor.StackingCVRegressor.

    passthrough bool or None, (default=True)

    Determines whether to concatenate the original features with the first stacking layer predictions in sklearn.ensemble.StackingRegressor, mlxtend.regressor.StackingRegressor, or mlxtend.regressor.StackingCVRegressor.

    meta_features bool or None, (default=True)

    Determines whether to make the concatenated features accessible through the attribute train_meta_features_ in mlxtend.regressor.StackingRegressor and mlxtend.regressor.StackingCVRegressor.

    voting_weights ndarray of shape (n_regressors,) or None, (default=None)

    Sequence of weights for sklearn.ensemble.VotingRegressor.
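
To show what the passthrough and voting_weights options control in the underlying estimators, the sketch below configures sklearn.ensemble.StackingRegressor and sklearn.ensemble.VotingRegressor directly. The choice of base regressors and weights is an assumption, and the mapping from the layers dictionary to these arguments is not shown because it is not specified in this reference.

```python
import numpy as np
from sklearn.ensemble import StackingRegressor, VotingRegressor
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.normal(size=(60, 4))
y = X @ np.array([1.0, 0.0, -1.0, 2.0]) + rng.normal(scale=0.1, size=60)

estimators = [
    ("ridge", Ridge()),
    ("tree", DecisionTreeRegressor(max_depth=3, random_state=0)),
]

# passthrough=True appends the original features to the first-layer
# predictions before they reach the final regressor.
stack = StackingRegressor(estimators=estimators, final_estimator=Ridge(),
                          cv=5, passthrough=True)
stack.fit(X, y)

# The weights determine each regressor's share in the averaged prediction.
vote = VotingRegressor(estimators=estimators, weights=[0.7, 0.3])
vote.fit(X, y)
```

Here the final regressor of the stack sees six columns: two first-layer predictions plus the four original features.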