SHAP API

The physlearn.supervised.interpretation.interpret_regressor module provides SHAP utilities for regressor interpretability. It includes the physlearn.ShapInterpret class.

class physlearn.supervised.interpretation.interpret_regressor.ShapInterpret(regressor_choice='ridge', cv=5, random_state=0, verbose=0, n_jobs=-1, score_multioutput='raw_values', scoring='neg_mean_absolute_error', return_train_score=True, auto_target=True, pipeline_transform=None, pipeline_memory=None, params=None, target_index=None, chain_order=None, stacking_options=None, base_boosting_options=None, show=True)[source]

Bases: BaseRegressor

Interpret a regressor’s output with SHAP plots.

fit(X, y, index=None, sample_weight=None)[source]

Fit regressor.

explainer(X)[source]

Compute the importance of each feature for the underlying regressor.
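The source does not say which SHAP explainer backs this method. As a rough, standalone illustration of what "feature importance" means here, the sketch below computes exact Shapley values for a tiny linear model in pure Python. The names shapley_values, predict, x, and baseline are hypothetical and are not part of physlearn or the shap library.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of `predict` at `x`, relative to `baseline`."""
    n = len(x)
    features = range(n)

    def value(subset):
        # Features in `subset` take their value from x; the rest from baseline.
        z = [x[i] if i in subset else baseline[i] for i in features]
        return predict(z)

    phi = []
    for i in features:
        others = [j for j in features if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def predict(z):
    # Hypothetical linear regressor with fixed coefficients.
    return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[2]


x = [1.0, 3.0, 4.0]
baseline = [0.0, 1.0, 2.0]
phi = shapley_values(predict, x, baseline)

# Additivity: the attributions sum to the gap between the two predictions.
gap = predict(x) - predict(baseline)
print(phi, sum(phi), gap)
```

For a linear model each attribution reduces to the coefficient times the feature's offset from the baseline, which makes the result easy to check by hand. The exhaustive loop over coalitions is exponential in the number of features, so it is only practical for toy inputs.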

summary_plot(X, y, plot_type='dot')[source]

Visualization of the feature importance and feature effects.

force_plot(X, y)[source]

Interactive Javascript visualization of Shapley values.

dependence_plot(X, y, interaction_index='auto', alpha=None, dot_size=None)[source]

Visualization of a feature’s effect on a regressor’s prediction.

decision_plot(X, y)[source]

Visualization of the additive feature attribution.

_check_target_index(y)

Automates subtask slicing in multi-target regression.

Parameters

y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s). The targets are used to determine the type of the target, and the number of samples if the pipeline_transform involves quantile transformers.

Returns

y

Return type

array-like of shape = [n_samples] or shape = [n_samples, n_targets]

_estimate_fold_size(y, cv)

Helper method to estimate cross-validation fold size.

Parameters
  • y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s).

  • cv (int, cross-validation generator, or an iterable) – Used in order to determine the fold size.

Returns

estimate

Return type

int
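The source does not state how the estimate is computed. The sketch below only illustrates the arithmetic of fold sizes for an integer cv, assuming the unshuffled K-fold convention in which the first n_samples % n_splits folds receive one extra example. The helper kfold_sizes is hypothetical and is not the physlearn implementation.

```python
def kfold_sizes(n_samples, n_splits=5):
    """Sizes of the withheld folds under a K-fold split (assumed convention)."""
    if n_splits < 2 or n_splits > n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    base, remainder = divmod(n_samples, n_splits)
    # The first `remainder` folds receive one extra example.
    return [base + 1 if i < remainder else base for i in range(n_splits)]


n_samples = 103
withheld = kfold_sizes(n_samples, n_splits=5)
# Each training split holds every example outside the withheld fold.
training = [n_samples - size for size in withheld]
print(withheld, training)
```

An estimate of this kind matters when a transformer in the pipeline, such as a quantile transformer, needs to know how many examples it will see during each training split.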

static _fit(regressor, X, y, sample_weight=None, **fit_params)

Helper fit method.

Parameters
  • X (array-like of shape = [n_samples, n_features]) – The design matrix, where each row corresponds to an example and the column(s) correspond to the feature(s).

  • y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s).

  • sample_weight (float, ndarray, or None, optional (default=None)) – Individual weights for each example. If the weight is a float, then every example will have the same weight.

  • **fit_params (dict of string -> object) – If base boosting, then these parameters are passed to the stagewise _fit_stages method.
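The float case of sample_weight can be pictured as a broadcast to one weight per example. The following is a minimal sketch of that behavior, assuming a hypothetical helper named expand_sample_weight; the source does not show how physlearn performs this step internally.

```python
from numbers import Real


def expand_sample_weight(sample_weight, n_samples):
    """Return one weight per example, or None if no weights were given."""
    if sample_weight is None:
        return None
    if isinstance(sample_weight, Real):
        # A single number means every example gets the same weight.
        return [float(sample_weight)] * n_samples
    weights = [float(w) for w in sample_weight]
    if len(weights) != n_samples:
        raise ValueError("sample_weight must have one entry per example")
    return weights


uniform = expand_sample_weight(2, n_samples=3)
explicit = expand_sample_weight([0.5, 1.0, 1.5], n_samples=3)
print(uniform, explicit)
```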

classmethod _get_param_names()

Get parameter names for the estimator.

_get_regressor()

Helper method which instantiates the regressor choice.

_modified_cross_validate(X, y, return_regressor=False, error_score=nan, return_incumbent_score=False, cv=None, fit_params=None)

Performs (augmented) cross-validation.

If return_incumbent_score is True, then the incumbent is scored on the withheld folds. Otherwise, the behavior is the same as in Scikit-learn.

Parameters
  • X (array-like of shape = [n_samples, n_features]) – The design matrix, where each row corresponds to an example and the column(s) correspond to the feature(s).

  • y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s).

  • return_regressor (bool, optional (default=False)) – Determines whether to return the induced regressor.

  • error_score ('raise' or numeric, optional (default=np.nan)) – The assigned value if an error occurs while inducing a regressor. If set to ‘raise’, then the specific error is raised. Else if set to a numeric value, then FitFailedWarning is raised.

  • return_incumbent_score (bool, optional (default=False)) – Determines whether to score the incumbent on the withheld folds, whereby the incumbent is assumed to be an example in the design matrix.

  • cv (int, cross-validation generator, an iterable, or None, optional (default=None)) – Determines the cross-validation strategy. If None, then the default is 5-fold cross-validation.

  • fit_params (dict, optional (default=None)) – (Hyper)parameters to pass to the regressor’s fit method.

Returns

scores – Array of scores for each run of the cross-validation procedure.

Return type

dict of float arrays of shape (n_splits,)

References

  • Alex Wozniakowski, Jayne Thompson, Mile Gu, and Felix C. Binder. “A new formulation of gradient boosting”, Machine Learning: Science and Technology, 2 045022 (2021).

property _repr_html_

HTML representation of estimator.

This is redundant with the logic of _repr_mimebundle_. The latter should be favored in the long term; _repr_html_ is only implemented for consumers who do not interpret _repr_mimebundle_.

_repr_html_inner()

This function is returned by the @property _repr_html_ to make hasattr(estimator, "_repr_html_") return True or False depending on get_config()["display"].

_repr_mimebundle_(**kwargs)

Mime bundle used by Jupyter kernels to display the estimator.

_validate_data(X=None, y=None)

Checks the validity of the data representation(s).

Parameters
  • X (array-like of shape = [n_samples, n_features]) – The design matrix, where each row corresponds to an example and the column(s) correspond to the feature(s).

  • y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s).

Returns

out

Return type

validated data

property check_regressor

Checks if regressor adheres to scikit-learn conventions.

Namely, it runs sklearn.utils.estimator_checks.check_estimator.

cross_val_score(X, y, error_score=nan, return_incumbent_score=False, cv=None, fit_params=None)

Performs (augmented) cross-validation, then returns the withheld fold score.

If return_incumbent_score is True, then the incumbent is scored on the withheld folds. Otherwise, the behavior is the same as in Scikit-learn.

Parameters
  • X (array-like of shape = [n_samples, n_features]) – The design matrix, where each row corresponds to an example and the column(s) correspond to the feature(s).

  • y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s).

  • error_score ('raise' or numeric, optional (default=np.nan)) – The assigned value if an error occurs while inducing a regressor. If set to ‘raise’, then the specific error is raised. Else if set to a numeric value, then FitFailedWarning is raised.

  • return_incumbent_score (bool, optional (default=False)) – Determines whether to score the incumbent on the withheld folds, whereby the incumbent is assumed to be an example in the design matrix.

  • cv (int, cross-validation generator, an iterable, or None, optional (default=None)) – Determines the cross-validation strategy. If None, then the default is 5-fold cross-validation.

  • fit_params (dict, optional (default=None)) – (Hyper)parameters to pass to the regressor’s fit method.

Returns

scores – The withheld fold scores for each run of the cross-validation procedure.

Return type

pd.Series or pd.DataFrame

Notes

Scikit-learn returns negative scores for some metrics, such as mean absolute error (MAE) or mean squared error (MSE). However, we only return nonnegative scores.
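To make the sign convention concrete, the sketch below mimics a "greater is better" scorer in pure Python and then converts its output to a nonnegative score. Taking the absolute value is an assumption made for illustration; the source only states that nonnegative scores are returned. Both function names are hypothetical stand-ins rather than physlearn or Scikit-learn calls.

```python
def mean_absolute_error(y_true, y_pred):
    """Plain MAE: lower is better, never negative."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def neg_mean_absolute_error(y_true, y_pred):
    # Mimics the "greater is better" scorer convention by negating the error.
    return -mean_absolute_error(y_true, y_pred)


y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

raw_score = neg_mean_absolute_error(y_true, y_pred)
# Assumed conversion: report the magnitude so that the score is nonnegative.
reported_score = abs(raw_score)
print(raw_score, reported_score)
```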

References

  • Alex Wozniakowski, Jayne Thompson, Mile Gu, and Felix C. Binder. “A new formulation of gradient boosting”, Machine Learning: Science and Technology, 2 045022 (2021).

cross_validate(X, y, return_regressor=False, error_score=nan, return_incumbent_score=False, cv=None, fit_params=None)

Performs (augmented) cross-validation, and wraps the result in a DataFrame.

If return_incumbent_score is True, then the incumbent is scored on the withheld folds. Otherwise, the behavior is the same as in Scikit-learn.

Parameters
  • X (array-like of shape = [n_samples, n_features]) – The design matrix, where each row corresponds to an example and the column(s) correspond to the feature(s).

  • y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s).

  • return_regressor (bool, optional (default=False)) – Determines whether to return the induced regressor.

  • error_score ('raise' or numeric, optional (default=np.nan)) – The assigned value if an error occurs while inducing a regressor. If set to ‘raise’, then the specific error is raised. Else if set to a numeric value, then FitFailedWarning is raised.

  • return_incumbent_score (bool, optional (default=False)) – Determines whether to score the incumbent on the withheld folds, whereby the incumbent is assumed to be an example in the design matrix.

  • cv (int, cross-validation generator, an iterable, or None, optional (default=None)) – Determines the cross-validation strategy. If None, then the default is 5-fold cross-validation.

  • fit_params (dict, optional (default=None)) – (Hyper)parameters to pass to the regressor’s fit method.

Returns

scores – DataFrame of scores for each run of the cross-validation procedure.

Return type

pd.DataFrame

Notes

Scikit-learn returns negative scores for some metrics, such as mean absolute error (MAE) or mean squared error (MSE). However, we only return nonnegative scores.

References

  • Alex Wozniakowski, Jayne Thompson, Mile Gu, and Felix C. Binder. “A new formulation of gradient boosting”, Machine Learning: Science and Technology, 2 045022 (2021).

dump(value, filename)

Serializes the value with joblib.

Parameters
  • value (any Python object) – The object to store to disk.

  • filename (str, joblib.pathlib.Path, or file object) – The file object or path of the file.

Returns

filenames – The list of file names in which the data is stored.

Return type

list of str

get_params(deep=True)

Retrieves the (hyper)parameters.

Parameters

deep (bool, optional (default=True)) – Unused by this method, but kept in the signature because various Scikit-learn utilities pass it.

Returns

self.params – (Hyper)parameter names mapped to their values.

Return type

dict

get_pipeline(y, n_quantiles=None)

Creates pipe attribute for downstream tasks.

This method constructs a ModifiedPipeline from the given base regressor.

Parameters
  • y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s). The targets are used to determine the type of the target, and the number of samples if the pipeline_transform involves quantile transformers.

  • n_quantiles (int or None, optional (default=None)) – Number of quantiles in sklearn.preprocessing.QuantileTransformer, if pipeline_transform is either `quantileuniform` or `quantilenormal`.

pipe

A ModifiedPipeline object.

Type

physlearn.pipeline.ModifiedPipeline

load(filename)

Deserializes the file object.

Parameters

filename (str, joblib.pathlib.Path, or file object) – The file object or path of the file.

Returns

joblib.load – The object stored in the file.

Return type

any Python object

predict(X)

Generates predictions with the ModifiedPipeline object.

Parameters

X (array-like of shape = [n_samples, n_features]) – The design matrix, where each row corresponds to an example and the column(s) correspond to the feature(s).

Returns

y_pred – The predictions generated by the induced ModifiedPipeline object.

Return type

array-like of shape = [n_samples] or shape = [n_samples, n_targets]

regattr(attr)

Gets a regressor’s attribute from the ModifiedPipeline object.

The pipe attribute must exist in order to use this method.

Parameters

attr (str) – The name of the regressor’s attribute.

Returns

attr

Return type

type of attribute

score(y_true, y_pred, scoring='mse', multioutput='raw_values')

Computes the supervised score.

Parameters
  • y_true (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The observed target matrix, where each row corresponds to an example and the column(s) correspond to the observed single-target(s).

  • y_pred (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The predicted target matrix, where each row corresponds to an example and the column(s) correspond to the predicted single-target(s).

  • scoring (str, optional (default='mse')) – The scoring name, which may be mae, mse, rmse, r2, ev, or msle.

  • multioutput (str, optional (default='raw_values')) – Defines how multiple output values are aggregated. The string must be either 'raw_values', 'uniform_average', or 'variance_weighted'.

Returns

score – The computed score.

Return type

float or ndarray of floats
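The difference between 'raw_values' and 'uniform_average' can be seen in a small pure-Python sketch of the mean squared error on two targets. The helpers mse_per_target and aggregate are hypothetical, and the sketch leaves out 'variance_weighted'; it illustrates the aggregation idea rather than the physlearn code path.

```python
def mse_per_target(y_true, y_pred):
    """Mean squared error computed separately for each target column."""
    n_samples = len(y_true)
    n_targets = len(y_true[0])
    return [
        sum((y_true[i][j] - y_pred[i][j]) ** 2 for i in range(n_samples)) / n_samples
        for j in range(n_targets)
    ]


def aggregate(raw_values, multioutput="raw_values"):
    if multioutput == "raw_values":
        # One score per target, returned unchanged.
        return raw_values
    if multioutput == "uniform_average":
        # A single score: the unweighted mean over targets.
        return sum(raw_values) / len(raw_values)
    raise ValueError(f"Unsupported multioutput option: {multioutput!r}")


y_true = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
y_pred = [[1.0, 14.0], [2.0, 20.0], [5.0, 30.0], [6.0, 40.0]]

raw = aggregate(mse_per_target(y_true, y_pred), "raw_values")
average = aggregate(mse_per_target(y_true, y_pred), "uniform_average")
print(raw, average)
```

With 'raw_values' the return type is a sequence with one entry per target, which matches the "ndarray of floats" case above; with 'uniform_average' it collapses to a single float.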

set_params(**params)

Sets the regressor’s (hyper)parameters.

Parameters

**params (dict) – The regressor’s (hyper)parameters.

Returns

self – The base regressor object.

Return type

BaseRegressor
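The get_params and set_params methods follow the usual Scikit-learn estimator convention: parameters are exposed as a dict, and setting them returns the estimator itself so that calls can be chained. The sketch below shows that convention on a hypothetical TinyEstimator class, which is not part of physlearn and has no fitting logic.

```python
class TinyEstimator:
    """Hypothetical estimator showing the get_params/set_params convention."""

    def __init__(self, alpha=1.0, fit_intercept=True):
        self.alpha = alpha
        self.fit_intercept = fit_intercept

    def get_params(self, deep=True):
        # `deep` is accepted for compatibility but unused in this sketch.
        return {"alpha": self.alpha, "fit_intercept": self.fit_intercept}

    def set_params(self, **params):
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"Invalid parameter {name!r}")
            setattr(self, name, value)
        # Returning self allows chained calls.
        return self


estimator = TinyEstimator().set_params(alpha=0.1)
# The parameter dict is enough to build an equivalent, independent copy.
copy = TinyEstimator(**estimator.get_params())
print(copy.get_params())
```

Round-tripping the parameters through the constructor, as in the last lines, is the property that cloning and hyperparameter search utilities typically rely on.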