Learning Curve API

The physlearn.supervised.model_selection.learning_curve module provides utilities for plotting learning curves. It includes the physlearn.LearningCurve class and the physlearn.plot_learning_curve() function.

class physlearn.supervised.model_selection.learning_curve.LearningCurve(regressor_choice='ridge', cv=5, random_state=0, verbose=0, n_jobs=-1, score_multioutput='raw_values', scoring='neg_mean_absolute_error', return_train_score=True, auto_target=True, pipeline_transform='quantilenormal', pipeline_memory=None, params=None, target_index=None, chain_order=None, stacking_options=None, base_boosting_options=None)

Bases: BaseRegressor

Learning curve object that supports base boosting.

The object retains the functionality of the scikit-learn learning curve utility, which performs a cross-validation procedure with varying training sizes, and extends it to support augmented cross-validation procedures, which score an incumbent model and a candidate model on the same withheld folds.
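The augmented procedure can be sketched with NumPy alone. This is a minimal illustration, not physlearn's implementation: the helper `augmented_cv_scores` is hypothetical, the candidate is assumed to be an ordinary least-squares model refit on each training fold, the incumbent is represented by a fixed array of its predictions, and both are scored with the mean absolute error on the identical withheld fold.

```python
import numpy as np


def augmented_cv_scores(X, y, incumbent_pred, n_splits=5, seed=0):
    """Score an incumbent and a candidate on the same withheld folds.

    Hypothetical helper: the candidate is an ordinary least-squares model
    refit on each training fold; the incumbent is represented only by its
    fixed predictions, so it is scored without refitting.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_splits)
    candidate_mae, incumbent_mae = [], []
    for k, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        # Fit the candidate (with an intercept column) on the training folds only.
        A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        candidate_pred = A_test @ coef
        # Both models are scored on the identical withheld fold.
        candidate_mae.append(np.mean(np.abs(y[test_idx] - candidate_pred)))
        incumbent_mae.append(np.mean(np.abs(y[test_idx] - incumbent_pred[test_idx])))
    return np.array(candidate_mae), np.array(incumbent_mae)
```

Because the two score arrays are aligned fold by fold, they can be compared pairwise rather than only through their means.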

Parameters
  • regressor_choice (str, optional (default='ridge')) – Specifies the case-insensitive regressor choice.

  • cv (int, cross-validation generator, an iterable, or None, optional (default=5)) – Determines the cross-validation strategy when the regressor choice is stacking, when the task is multi-target regression with chained single-targets, and by default in the k-fold cross-validation methods.

  • random_state (int, RandomState instance, or None, optional (default=0)) – Determines the random number generation in the regressor choice mlxtend.regressor.StackingCVRegressor and in the modified pipeline construction.

  • verbose (int, optional (default=0)) – Determines the verbosity of the regressor choices mlxtend.regressor.StackingRegressor and mlxtend.regressor.StackingCVRegressor, of the modified pipeline construction, and of the k-fold cross-validation methods.

  • n_jobs (int or None, optional (default=-1)) – The number of jobs to run in parallel if the regressor choice is stacking or voting, in the modified pipeline construction, and in the k-fold cross-validation methods.

  • score_multioutput (str, optional (default='raw_values')) – Defines how multiple output values are aggregated in the score method; the string must be one of 'raw_values', 'uniform_average', or 'variance_weighted'.

  • scoring (str, callable, list/tuple, or dict, optional (default='neg_mean_absolute_error')) – Determines scoring in the k-fold cross-validation methods.

  • return_train_score (bool, optional (default=True)) – Determines whether to return the training scores from the k-fold cross-validation methods.

  • pipeline_transform (str, list, tuple, or None, optional (default='quantilenormal')) – Choice of transform(s) used in the modified pipeline construction. If a string is specified, it must be one of the default options: 'standardscaler' denotes sklearn.preprocessing.StandardScaler; 'boxcox' and 'yeojohnson' denote sklearn.preprocessing.PowerTransformer with method='box-cox' and method='yeo-johnson', respectively; and 'quantileuniform' and 'quantilenormal' denote sklearn.preprocessing.QuantileTransformer with output_distribution='uniform' and output_distribution='normal', respectively.

  • pipeline_memory (str or object with the joblib.Memory interface, optional (default=None)) – Enables fitted transform caching in the modified pipeline construction.

  • params (dict, list, or None, optional (default=None)) – The choice of (hyper)parameters for the regressor choice. If None, then the default (hyper)parameters are utilized.

  • target_index (int or None, optional (default=None)) – Specifies the single-target regression subtask in the multi-target regression task.

  • chain_order (list or None, optional (default=None)) – Determines the target order in sklearn.multioutput.RegressorChain during the modified pipeline construction.

  • stacking_options (dict or None, optional (default=None)) –

    A dictionary of stacking options, in which layers must be specified:

    layers dict

    A dictionary of stacking layer(s).

    shuffle bool or None, (default=True)

    Determines whether to shuffle the training data in mlxtend.regressor.StackingCVRegressor.

    refit bool or None, (default=True)

    Determines whether to clone and refit the regressors in mlxtend.regressor.StackingCVRegressor.

    passthrough bool or None, (default=True)

    Determines whether to concatenate the original features with the first stacking layer predictions in sklearn.ensemble.StackingRegressor, mlxtend.regressor.StackingRegressor, or mlxtend.regressor.StackingCVRegressor.

    meta_features bool or None, (default=True)

    Determines whether to make the concatenated features accessible through the attribute train_meta_features_ in mlxtend.regressor.StackingRegressor and mlxtend.regressor.StackingCVRegressor.

    voting_weights ndarray of shape (n_regressors,) or None, (default=None)

    Sequence of weights for sklearn.ensemble.VotingRegressor.

  • base_boosting_options (dict or None, optional (default=None)) –

    A dictionary of base boosting options used in the modified pipeline construction, wherein the following options must be specified:

    n_estimators int

    The number of basis functions in the noise term of the additive expansion. Note that this option may also be specified as n_regressors.

    boosting_loss str

    The loss function utilized in the pseudo-residual computation, where 'ls' denotes the squared error loss function, 'lad' denotes the absolute error loss function, 'huber' denotes the Huber loss function, and 'quantile' denotes the quantile loss function.

    line_search_options dict

    A dictionary of line search options:

    init_guess int, float, or ndarray

    The initial guess for the expansion coefficient.

    opt_method str

    Choice of optimization method: if 'minimize', then scipy.optimize.minimize is used; if 'basinhopping', then scipy.optimize.basinhopping is used.

    method str or None

    The type of solver utilized in the optimization method.

    tol float or None

    The epsilon tolerance for terminating the optimization method.

    options dict or None

    A dictionary of solver options.

    niter int or None

    The number of iterations in basin-hopping.

    T float or None

    The temperature parameter utilized in basin-hopping, which determines the accept or reject criterion.

    loss str

    The loss function utilized in the line search computation, where 'ls' denotes the squared error loss function, 'lad' denotes the absolute error loss function, 'huber' denotes the Huber loss function, and 'quantile' denotes the quantile loss function.

    regularization int or float

    The regularization strength in the line search computation.
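The base boosting options can be illustrated with a small sketch of the two computations they configure: the pseudo-residuals selected by boosting_loss and the line search for the expansion coefficient. The helpers `pseudo_residuals` and `line_search` are hypothetical; only the 'ls' and 'lad' losses and the 'minimize' optimization method are covered, and the line search objective (mean squared error plus an L2 penalty weighted by regularization) is an assumption that may differ from physlearn's own objective.

```python
import numpy as np
from scipy.optimize import minimize


def pseudo_residuals(y, f, boosting_loss="ls"):
    """Negative gradient of the loss with respect to the current fit f."""
    if boosting_loss == "ls":
        return y - f  # squared error (with the conventional 1/2 factor)
    if boosting_loss == "lad":
        return np.sign(y - f)  # absolute error
    raise ValueError("only 'ls' and 'lad' are sketched here")


def line_search(y, f, h, init_guess=1.0, regularization=0.0,
                method=None, tol=None, options=None):
    """Find the expansion coefficient for the basis-function output h.

    Assumes a squared-error ('ls') line search loss with an L2 penalty on
    the coefficient; this objective is an illustration, not physlearn's.
    """
    def objective(coef):
        update = f + coef[0] * h
        return np.mean((y - update) ** 2) + regularization * coef[0] ** 2

    result = minimize(objective, x0=np.atleast_1d(init_guess),
                      method=method, tol=tol, options=options)
    return result.x[0]
```

Because this assumed objective is quadratic in the coefficient, its minimizer also has the closed form (h·r/n)/(h·h/n + regularization), with r = y - f, which gives a convenient check on the numerical result.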

See also

physlearn.supervised.regression.BaseRegressor

Base class for regressor amalgamation.

_modified_learning_curve(X, y, train_sizes=array([0.1, 0.325, 0.55, 0.775, 1.0]), return_train_score=True, return_times=False, return_estimator=False, error_score=nan, return_incumbent_score=False, cv=None, fit_params=None)

Performs an (augmented) cross-validation procedure with varying training sizes.
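A cross-validation procedure with varying training sizes can be sketched as follows, in the spirit of sklearn.model_selection.learning_curve: each fold's training indices are truncated to the requested size before fitting, and the resulting model is scored on both the truncated training set and the withheld fold. The helpers `fit_ols` and `learning_curve_sketch` are hypothetical, and the least-squares candidate and mean absolute error stand in for the configured regressor and scoring.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit with an intercept; returns a predict function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coef


def learning_curve_sketch(X, y, train_sizes_abs, n_splits=5, seed=0, fit=fit_ols):
    """Return train and test MAE arrays of shape (n_ticks, n_splits)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_splits)
    train_scores = np.empty((len(train_sizes_abs), n_splits))
    test_scores = np.empty_like(train_scores)
    for k, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        for i, n_train in enumerate(train_sizes_abs):
            # Keep only the first n_train (shuffled) training examples.
            subset = train_idx[:n_train]
            predict = fit(X[subset], y[subset])
            train_scores[i, k] = np.mean(np.abs(y[subset] - predict(X[subset])))
            test_scores[i, k] = np.mean(np.abs(y[test_idx] - predict(X[test_idx])))
    return train_scores, test_scores
```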

Parameters
  • X (array-like of shape = [n_samples, n_features]) – The design matrix, where each row corresponds to an example and the column(s) correspond to the feature(s).

  • y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s).

  • train_sizes (array-like of shape (n_ticks,), optional (default=np.linspace(0.1, 1.0, 5))) – The array elements determine the number of training examples used in each cross-validation procedure.

  • return_train_score (bool, optional (default=True)) – Determines whether to return the candidate’s training fold scores.

  • return_times (bool, optional (default=False)) – Determines whether to return the candidate’s fit and score times.

  • return_estimator (bool, optional (default=False)) – Determines whether to return the induced estimator.

  • error_score ('raise' or numeric, optional (default=np.nan)) – The value assigned to the score if an error occurs while inducing an estimator. If set to 'raise', then the error is raised; if set to a numeric value, then a FitFailedWarning is raised.

  • return_incumbent_score (bool, optional (default=False)) – Determines whether to score the incumbent on the withheld folds, whereby the incumbent is assumed to be an example in the design matrix.

  • cv (int, cross-validation generator, an iterable, or None, optional (default=None)) – Determines the cross-validation strategy. If None, then the default is 5-fold cross-validation.

  • fit_params (dict, optional (default=None)) – (Hyper)parameters to pass to the regressor’s fit method.
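The train_sizes parameter accepts fractions as well as counts. The sketch below is a hypothetical helper modeled on how scikit-learn's learning curve utility interprets the same parameter: it converts fractions of the largest training fold into example counts. It is an assumption that physlearn resolves train_sizes the same way. With 5-fold cross-validation on n examples, the largest training fold holds roughly 4n/5 examples.

```python
import numpy as np


def absolute_train_sizes(train_sizes, n_max_training_samples):
    """Convert relative or absolute training sizes into example counts."""
    train_sizes = np.asarray(train_sizes)
    if np.issubdtype(train_sizes.dtype, np.floating):
        # Fractions of the largest training fold, truncated toward zero.
        if train_sizes.min() <= 0.0 or train_sizes.max() > 1.0:
            raise ValueError("fractional train_sizes must lie in (0, 1]")
        counts = (train_sizes * n_max_training_samples).astype(int)
        counts = np.clip(counts, 1, n_max_training_samples)
    else:
        counts = train_sizes.astype(int)
        if counts.min() < 1 or counts.max() > n_max_training_samples:
            raise ValueError("absolute train_sizes must lie in [1, n_max_training_samples]")
    # Duplicate counts would repeat the same cross-validation procedure.
    return np.unique(counts)
```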

physlearn.plot_learning_curve(regressor_choice, title, X, y, verbose=0, cv=5, train_sizes=array([0.2, 0.4, 0.6, 0.8, 1.0]), alpha=0.1, train_color='b', cv_color='orange', y_ticks_step=0.15, fill_std=False, legend_loc='best', save_plot=False, scoring='neg_mean_absolute_error', pipeline_transform='quantilenormal', path=None, pipeline_memory=None, params=None, target_index=None, chain_order=None, ylabel=None, stacking_options=None, base_boosting_options=None, return_incumbent_score=False)
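The exact curves drawn by plot_learning_curve are not documented here. A common convention, used for example in the scikit-learn gallery's learning-curve example, is to plot the mean score across folds at each training size with a shaded band of one standard deviation (matplotlib's fill_between). The hypothetical helper below computes those quantities; the sign flip for 'neg_*' scorers such as the default 'neg_mean_absolute_error' is an assumption about how the errors are displayed.

```python
import numpy as np


def curve_summary(scores, negate=True):
    """Mean and one-standard-deviation band across folds.

    scores has shape (n_ticks, n_folds). 'neg_*' scorers report negated
    errors, so negate=True (an assumption) flips them back to positive errors.
    """
    scores = -np.asarray(scores) if negate else np.asarray(scores)
    mean = scores.mean(axis=1)
    std = scores.std(axis=1)
    return mean, mean - std, mean + std
```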