SHAP API¶

The physlearn.supervised.interpretation.interpret_regressor module provides SHAP utilities for regressor interpretability. It includes the physlearn.ShapInterpret class.

class physlearn.supervised.interpretation.interpret_regressor.ShapInterpret(regressor_choice='ridge', cv=5, random_state=0, verbose=0, n_jobs=-1, score_multioutput='raw_values', scoring='neg_mean_absolute_error', return_train_score=True, auto_target=True, pipeline_transform=None, pipeline_memory=None, params=None, target_index=None, chain_order=None, stacking_options=None, base_boosting_options=None, show=True)[source]¶

Bases: BaseRegressor

Interpret a regressor’s output with SHAP plots.

fit(X, y, index=None, sample_weight=None)[source]¶: Fit regressor.

explainer(X)[source]¶: Compute the importance of each feature for the underlying regressor.

summary_plot(X, y, plot_type='dot')[source]¶: Visualization of the feature importance and feature effects.

force_plot(X, y)[source]¶: Interactive JavaScript visualization of Shapley values.

dependence_plot(X, y, interaction_index='auto', alpha=None, dot_size=None)[source]¶: Visualization of a feature’s effect on a regressor’s prediction.

decision_plot(X, y)[source]¶: Visualization of the additive feature attribution.
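The additive feature attribution that these plots visualize can be illustrated without the library. The sketch below is not the implementation used by physlearn or SHAP. It assumes a linear model with independent features, in which case the Shapley value of feature i is w_i * (x_i - mean(x_i)); every name in it is hypothetical.

```python
from statistics import mean


def linear_shap_values(weights, background, x):
    """Shapley values of a linear model, assuming independent features."""
    # Expected value of each feature over the background data.
    feature_means = [mean(col) for col in zip(*background)]
    return [w * (xi - mu) for w, xi, mu in zip(weights, x, feature_means)]


def predict(weights, bias, x):
    """Prediction of the linear model for a single example."""
    return bias + sum(w * xi for w, xi in zip(weights, x))


weights, bias = [2.0, -1.0, 0.5], 1.0
background = [[1.0, 0.0, 2.0], [3.0, 2.0, 4.0], [2.0, 4.0, 0.0]]
x = [4.0, 1.0, 2.0]

phi = linear_shap_values(weights, background, x)
# Base value: the average prediction over the background data.
base_value = mean(predict(weights, bias, row) for row in background)
```

The attributions are additive: base_value plus the sum of phi recovers the model's prediction for x, which is the property a decision plot traces feature by feature.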

_check_target_index(y)¶

Automates subtask slicing in multi-target regression.

Parameters: y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s). The targets are used to determine the type of the target, and the number of samples if the pipeline_transform involves quantile transformers.
Returns: y
Return type: array-like of shape = [n_samples] or shape = [n_samples, n_targets]
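A sketch of what subtask slicing could look like, assuming that a target_index other than None selects one column of a two-dimensional target matrix. The function name and its behavior are illustrative assumptions, not the library's code.

```python
def slice_target(y, target_index=None):
    """Return column `target_index` of a 2-D target, or `y` unchanged."""
    # A target is treated as multi-target when its rows are sequences.
    is_multi_target = bool(y) and isinstance(y[0], (list, tuple))
    if target_index is None or not is_multi_target:
        return y
    return [row[target_index] for row in y]


y_multi = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
y_sub = slice_target(y_multi, target_index=1)
```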

_estimate_fold_size(y, cv)¶

Helper method to estimate cross-validation fold size.

Parameters

y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s).
cv (int, cross-validation generator, or an iterable) – Used in order to determine the fold size.

Returns

estimate

Return type

int
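For an integer cv, fold sizes can be estimated with the convention used by scikit-learn's KFold, where the first n_samples % n_splits folds receive one extra example. The sketch below is an assumption about what such an estimate might involve; whether the library reports the training or the withheld portion, and how it rounds, is not stated here.

```python
def kfold_sizes(n_samples, n_splits):
    """Sizes of the withheld folds under the KFold convention."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    base, remainder = divmod(n_samples, n_splits)
    # The first `remainder` folds receive one extra example.
    return [base + 1 if i < remainder else base for i in range(n_splits)]


test_sizes = kfold_sizes(n_samples=103, n_splits=5)
# Each training portion is everything outside the withheld fold.
train_sizes = [103 - size for size in test_sizes]
```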

static _fit(regressor, X, y, sample_weight=None, **fit_params)¶

Helper fit method.

Parameters

X (array-like of shape = [n_samples, n_features]) – The design matrix, where each row corresponds to an example and the column(s) correspond to the feature(s).
y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s).
sample_weight (float, ndarray, or None, optional (default=None)) – Individual weights for each example. If the weight is a float, then every example will have the same weight.
**fit_params (dict of string -> object) – If base boosting, then these parameters are passed to the stagewise _fit_stages method.

classmethod _get_param_names()¶: Get parameter names for the estimator.

_get_regressor()¶: Helper method which instantiates the regressor choice.

_modified_cross_validate(X, y, return_regressor=False, error_score=nan, return_incumbent_score=False, cv=None, fit_params=None)¶

Performs (augmented) cross-validation.

If return_incumbent_score is True, then the incumbent is scored on the withheld folds. Otherwise, the behavior is the same as in Scikit-learn.

Parameters

X (array-like of shape = [n_samples, n_features]) – The design matrix, where each row corresponds to an example and the column(s) correspond to the feature(s).
y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s).
return_regressor (bool, optional (default=False)) – Determines whether to return the induced regressor.
error_score ('raise' or numeric, optional (default=np.nan)) – The value assigned to the score if an error occurs while inducing a regressor. If set to 'raise', then the error is raised. If set to a numeric value, then a FitFailedWarning is issued.
return_incumbent_score (bool, optional (default=False)) – Determines whether to score the incumbent on the withheld folds, whereby the incumbent is assumed to be an example in the design matrix.
cv (int, cross-validation generator, an iterable, or None, optional (default=None)) – Determines the cross-validation strategy. If None, then the default is 5-fold cross-validation.
fit_params (dict, optional (default=None)) – (Hyper)parameters to pass to the regressor’s fit method.

Returns

scores – Array of scores for each run of the cross-validation procedure.

Return type

dict of float arrays of shape (n_splits,)
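The two error_score modes can be sketched as follows. This is a minimal illustration rather than the library's code: FitFailedWarningSketch is a hypothetical stand-in for sklearn.exceptions.FitFailedWarning, and fit_and_score is an assumed helper name.

```python
import math
import warnings


class FitFailedWarningSketch(UserWarning):
    """Hypothetical stand-in for sklearn.exceptions.FitFailedWarning."""


def fit_and_score(fit_score_fn, error_score=math.nan):
    """Run one fold, handling failures according to `error_score`."""
    try:
        return fit_score_fn()
    except Exception as exc:
        if error_score == "raise":
            raise  # propagate the original error
        warnings.warn(
            f"Fit failed, score set to {error_score}: {exc!r}",
            FitFailedWarningSketch,
        )
        return error_score


def failing_fold():
    raise ValueError("singular design matrix")


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    score = fit_and_score(failing_fold, error_score=math.nan)
```

With a numeric error_score the run continues and the failed fold is marked by the assigned value; with 'raise' the first failure stops the procedure.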

References

Alex Wozniakowski, Jayne Thompson, Mile Gu, and Felix C. Binder. “A new formulation of gradient boosting”, Machine Learning: Science and Technology, 2 045022 (2021).

property _repr_html_¶

HTML representation of estimator.

This is redundant with the logic of _repr_mimebundle_. The latter should be favored in the long term; _repr_html_ is only implemented for consumers who do not interpret _repr_mimebundle_.

_repr_html_inner()¶: This function is returned by the @property _repr_html_ to make hasattr(estimator, "_repr_html_") return True or False depending on get_config()["display"].

_repr_mimebundle_(**kwargs)¶: Mime bundle used by Jupyter kernels to display the estimator.

_validate_data(X=None, y=None)¶

Checks the validity of the data representation(s).

Parameters

X (array-like of shape = [n_samples, n_features]) – The design matrix, where each row corresponds to an example and the column(s) correspond to the feature(s).
y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s).

Returns

out

Return type

validated data

property check_regressor¶

Checks if regressor adheres to scikit-learn conventions.

Namely, it runs sklearn.utils.estimator_checks.check_estimator.

cross_val_score(X, y, error_score=nan, return_incumbent_score=False, cv=None, fit_params=None)¶

Performs (augmented) cross-validation, then returns the withheld fold score.

If return_incumbent_score is True, then the incumbent is scored on the withheld folds. Otherwise, the behavior is the same as in Scikit-learn.

Parameters

X (array-like of shape = [n_samples, n_features]) – The design matrix, where each row corresponds to an example and the column(s) correspond to the feature(s).
y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s).
error_score ('raise' or numeric, optional (default=np.nan)) – The value assigned to the score if an error occurs while inducing a regressor. If set to 'raise', then the error is raised. If set to a numeric value, then a FitFailedWarning is issued.
return_incumbent_score (bool, optional (default=False)) – Determines whether to score the incumbent on the withheld folds, whereby the incumbent is assumed to be an example in the design matrix.
cv (int, cross-validation generator, an iterable, or None, optional (default=None)) – Determines the cross-validation strategy. If None, then the default is 5-fold cross-validation.
fit_params (dict, optional (default=None)) – (Hyper)parameters to pass to the regressor’s fit method.

Returns

scores – The withheld fold scores for each run of the cross-validation procedure.

Return type

pd.Series or pd.DataFrame

Notes

Scikit-learn returns negative scores for some metrics, such as mean absolute error (MAE) or mean squared error (MSE). However, we only return nonnegative scores.
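The sign convention can be sketched as follows, assuming that scores from scorers whose names start with 'neg_' are simply negated. The helper name is hypothetical and the library may handle this differently.

```python
def to_nonnegative(scoring, scores):
    """Flip the sign of scikit-learn 'neg_*' scores; leave others unchanged."""
    if scoring.startswith("neg_"):
        return [-s for s in scores]
    return list(scores)


# Per-fold values as a 'neg_mean_absolute_error' scorer would report them.
raw = [-0.42, -0.37, -0.51]
mae = to_nonnegative("neg_mean_absolute_error", raw)
```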

References

Alex Wozniakowski, Jayne Thompson, Mile Gu, and Felix C. Binder. “A new formulation of gradient boosting”, Machine Learning: Science and Technology, 2 045022 (2021).

cross_validate(X, y, return_regressor=False, error_score=nan, return_incumbent_score=False, cv=None, fit_params=None)¶

Performs (augmented) cross-validation, and wraps the result in a DataFrame.

If return_incumbent_score is True, then the incumbent is scored on the withheld folds. Otherwise, the behavior is the same as in Scikit-learn.

Parameters

X (array-like of shape = [n_samples, n_features]) – The design matrix, where each row corresponds to an example and the column(s) correspond to the feature(s).
y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s).
return_regressor (bool, optional (default=False)) – Determines whether to return the induced regressor.
error_score ('raise' or numeric, optional (default=np.nan)) – The value assigned to the score if an error occurs while inducing a regressor. If set to 'raise', then the error is raised. If set to a numeric value, then a FitFailedWarning is issued.
return_incumbent_score (bool, optional (default=False)) – Determines whether to score the incumbent on the withheld folds, whereby the incumbent is assumed to be an example in the design matrix.
cv (int, cross-validation generator, an iterable, or None, optional (default=None)) – Determines the cross-validation strategy. If None, then the default is 5-fold cross-validation.
fit_params (dict, optional (default=None)) – (Hyper)parameters to pass to the regressor’s fit method.

Returns

scores – DataFrame of scores for each run of the cross-validation procedure.

Return type

pd.DataFrame

Notes

Scikit-learn returns negative scores for some metrics, such as mean absolute error (MAE) or mean squared error (MSE). However, we only return nonnegative scores.

References

Alex Wozniakowski, Jayne Thompson, Mile Gu, and Felix C. Binder. “A new formulation of gradient boosting”, Machine Learning: Science and Technology, 2 045022 (2021).

dump(value, filename)¶

Serializes the value with joblib.

Parameters

value (any Python object) – The object to store to disk.
filename (str, joblib.pathlib.Path, or file object) – The file object or path of the file.

Returns

filenames – The list of file names in which the data is stored.

Return type

list of str

get_params(deep=True)¶

Retrieves the (hyper)parameters.

Parameters: deep (bool, optional (default=True)) – Unused by this method, but accepted because various Scikit-learn utilities pass it.
Returns: self.params – (Hyper)parameter names mapped to their values.
Return type: dict

get_pipeline(y, n_quantiles=None)¶

Creates pipe attribute for downstream tasks.

This method constructs a ModifiedPipeline from the given base regressor.

Parameters

y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s). The targets are used to determine the type of the target, and the number of samples if the pipeline_transform involves quantile transformers.
n_quantiles (int or None, optional (default=None)) – Number of quantiles in sklearn.preprocessing.QuantileTransformer, if pipeline_transform is either `quantileuniform` or `quantilenormal`.

pipe¶

A ModifiedPipeline object.

Type: physlearn.pipeline.ModifiedPipeline

load(filename)¶

Deserializes the file object.

Parameters: filename (str, joblib.pathlib.Path, or file object) – The file object or path of the file.
Returns: joblib.load – The object stored in the file.
Return type: any Python object

predict(X)¶

Generates predictions with the ModifiedPipeline object.

Parameters: X (array-like of shape = [n_samples, n_features]) – The design matrix, where each row corresponds to an example and the column(s) correspond to the feature(s).
Returns: y_pred – The predictions generated by the induced ModifiedPipeline object.
Return type: array-like of shape = [n_samples] or shape = [n_samples, n_targets]

regattr(attr)¶

Gets a regressor’s attribute from the ModifiedPipeline object.

The pipe attribute must exist in order to use this method.

Parameters: attr (str) – The name of the regressor’s attribute.
Returns: attr
Return type: type of attribute
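The lookup can be pictured as reading an attribute from the final step of a fitted pipeline. The classes below are hypothetical stand-ins, not ModifiedPipeline or any real regressor, and the function only mirrors the documented name.

```python
class FittedRidge:
    """Hypothetical fitted regressor exposing a learned attribute."""

    coef_ = [0.5, -1.2]


class TinyPipeline:
    """Hypothetical pipeline: a list of (name, step) pairs, regressor last."""

    def __init__(self, steps):
        self.steps = steps


def regattr(pipe, attr):
    """Look up `attr` on the final step of `pipe`."""
    _, regressor = pipe.steps[-1]
    if not hasattr(regressor, attr):
        raise AttributeError(f"regressor has no attribute {attr!r}")
    return getattr(regressor, attr)


pipe = TinyPipeline([("tr", None), ("reg", FittedRidge())])
coef = regattr(pipe, "coef_")
```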

score(y_true, y_pred, scoring='mse', multioutput='raw_values')¶

Computes the supervised score.

Parameters

y_true (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The observed target matrix, where each row corresponds to an example and the column(s) correspond to the observed single-target(s).
y_pred (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The predicted target matrix, where each row corresponds to an example and the column(s) correspond to the predicted single-target(s).
scoring (str, optional (default='mse')) – The scoring name, which may be mae, mse, rmse, r2, ev, or msle.
multioutput (str, optional (default='raw_values')) – Defines how multiple output values are aggregated. The string must be either 'raw_values', 'uniform_average', or 'variance_weighted'.

Returns

score – The computed score.

Return type

float or ndarray of floats
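The interplay of scoring and multioutput can be sketched in plain Python. This illustration covers only mae, mse, and rmse, and only the 'raw_values' and 'uniform_average' modes. It is an assumption about the semantics, written with lists of rows instead of arrays, not the library's implementation.

```python
import math


def column_errors(y_true, y_pred, scoring="mse"):
    """Per-target error for 2-D inputs given as lists of rows."""
    n = len(y_true)
    out = []
    # Transpose rows into per-target columns and score each column.
    for t_col, p_col in zip(zip(*y_true), zip(*y_pred)):
        if scoring == "mae":
            value = sum(abs(t - p) for t, p in zip(t_col, p_col)) / n
        elif scoring in ("mse", "rmse"):
            value = sum((t - p) ** 2 for t, p in zip(t_col, p_col)) / n
            if scoring == "rmse":
                value = math.sqrt(value)
        else:
            raise ValueError(f"unsupported scoring in this sketch: {scoring}")
        out.append(value)
    return out


def score(y_true, y_pred, scoring="mse", multioutput="raw_values"):
    """One score per target, or their unweighted mean."""
    per_target = column_errors(y_true, y_pred, scoring)
    if multioutput == "raw_values":
        return per_target
    if multioutput == "uniform_average":
        return sum(per_target) / len(per_target)
    raise ValueError(f"unsupported multioutput in this sketch: {multioutput}")


y_true = [[1.0, 2.0], [3.0, 4.0]]
y_pred = [[1.0, 3.0], [1.0, 3.0]]
raw = score(y_true, y_pred, scoring="mse")
avg = score(y_true, y_pred, scoring="mse", multioutput="uniform_average")
```

Here 'raw_values' keeps one score per target, which is why the documented return type is either a float or an array of floats.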

set_params(**params)¶

Sets the regressor’s (hyper)parameters.

Parameters: **params (dict) – The regressor’s (hyper)parameters.
Returns: self – The base regressor object.
Return type: BaseRegressor
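get_params and set_params follow the Scikit-learn estimator convention: the former returns a dict of names mapped to values, and the latter returns self so that calls can be chained. The class below is a hypothetical minimal illustration of that convention, not BaseRegressor itself.

```python
class ParamsHolder:
    """Hypothetical object following the get_params/set_params convention."""

    def __init__(self, regressor_choice="ridge", cv=5):
        self.regressor_choice = regressor_choice
        self.cv = cv

    def get_params(self, deep=True):
        # `deep` is accepted for compatibility but unused in this sketch.
        return {"regressor_choice": self.regressor_choice, "cv": self.cv}

    def set_params(self, **params):
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"Invalid parameter: {name}")
            setattr(self, name, value)
        return self  # returning self allows chained calls


holder = ParamsHolder().set_params(cv=10)
params = holder.get_params()
```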
