The physlearn.supervised.interpretation.interpret_regressor
module provides SHAP utilities for regressor interpretability.
It includes the physlearn.ShapInterpret class.
Bases: BaseRegressor
Interpret a regressor’s output with SHAP plots.
Visualization of the feature importance and feature effects.
Visualization of a feature’s effect on a regressor’s prediction.
Automates subtask slicing in multi-target regression.
y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the
column(s) correspond to the single-target(s). The targets are used to
determine the type of the target, and the number of samples if the
pipeline_transform involves quantile transformers.
y
array-like of shape = [n_samples] or shape = [n_samples, n_targets]
Helper method to estimate cross-validation fold size.
y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s).
cv (int, cross-validation generator, or an iterable) – Used in order to determine the fold size.
estimate
int
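As a rough illustration of how a fold-size estimate can be obtained, the sketch below uses scikit-learn's check_cv to normalize the cv argument and then counts the examples in each training fold. The function name estimate_fold_size is hypothetical, and the choice to report the smallest training fold is an assumption; the source does not state which fold is counted.

```python
import numpy as np
from sklearn.model_selection import check_cv


def estimate_fold_size(y, cv=5):
    """Hypothetical helper: smallest training-fold size produced by cv."""
    y = np.asarray(y)
    # check_cv turns an int into a KFold splitter when classifier=False,
    # and passes cross-validation generators or iterables through.
    splitter = check_cv(cv, y, classifier=False)
    # The splitter only needs the number of rows, so a placeholder X suffices.
    placeholder = np.zeros((y.shape[0], 1))
    return min(len(train) for train, _ in splitter.split(placeholder, y))


y = np.arange(10.0)
fold_size = estimate_fold_size(y, cv=5)
```

An estimate of this kind is useful when a transformer's settings must not exceed the number of examples seen during fitting.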
Helper fit method.
X (array-like of shape = [n_samples, n_features]) – The design matrix, where each row corresponds to an example and the column(s) correspond to the feature(s).
y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s).
sample_weight (float, ndarray, or None, optional (default=None)) – Individual weights for each example. If the weight is a float, then every example will have the same weight.
**fit_params (dict of string -> object) – If base boosting, then these parameters are passed to the stagewise
_fit_stages method.
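The float case for sample_weight can be pictured as broadcasting one value to every example. The sketch below is a minimal illustration with a scikit-learn Ridge regressor standing in for the fitted regressor; expand_sample_weight is a hypothetical helper, not part of physlearn.

```python
import numpy as np
from sklearn.linear_model import Ridge


def expand_sample_weight(sample_weight, n_samples):
    """Hypothetical helper: turn a float weight into a per-example array."""
    if sample_weight is None:
        return None
    if np.isscalar(sample_weight):
        # Every example receives the same weight.
        return np.full(n_samples, float(sample_weight))
    return np.asarray(sample_weight, dtype=float)


X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 2.0, 3.0])

weights = expand_sample_weight(0.5, n_samples=X.shape[0])
model = Ridge(alpha=1.0).fit(X, y, sample_weight=weights)
```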
Gets the parameter names for the estimator.
Helper method which instantiates the regressor choice.
Performs (augmented) cross-validation.
If return_incumbent_score is True, then the incumbent is scored
on the withheld folds. Otherwise, the behavior is the same as in
Scikit-learn.
X (array-like of shape = [n_samples, n_features]) – The design matrix, where each row corresponds to an example and the column(s) correspond to the feature(s).
y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s).
return_regressor (bool, optional (default=False)) – Determines whether to return the induced regressor.
error_score ('raise' or numeric, optional (default=np.nan)) – The assigned value if an error occurs while inducing a regressor. If set to ‘raise’, then the specific error is raised. Else if set to a numeric value, then FitFailedWarning is raised.
return_incumbent_score (bool, optional (default=True)) – Determines whether to score the incumbent on the withheld folds, whereby the incumbent is assumed to be an example in the design matrix.
cv (int, cross-validation generator, an iterable, or None, optional (default=None)) – Determines the cross-validation strategy. If None, then the default is 5-fold cross-validation.
fit_params (dict, optional (default=None)) – (Hyper)parameters to pass to the regressor’s fit method.
scores – Array of scores for each run of the cross-validation procedure.
dict of float arrays of shape (n_splits,)
References
Alex Wozniakowski, Jayne Thompson, Mile Gu, and Felix C. Binder. “A new formulation of gradient boosting”, Machine Learning: Science and Technology, 2 045022 (2021).
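For orientation, the plain scikit-learn behavior referred to above looks roughly like the following sketch, which calls sklearn.model_selection.cross_validate directly rather than the physlearn method. The incumbent part is an assumption made for illustration only: it treats column 0 of the design matrix as the incumbent's prediction and scores it on each withheld fold.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_validate

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=40)

cv = KFold(n_splits=5)

# Standard scikit-learn cross-validation: a dict of arrays, one entry per split.
scores = cross_validate(
    Ridge(),
    X,
    y,
    cv=cv,
    scoring="neg_mean_squared_error",
    return_estimator=True,  # analogous to returning the induced regressor
    error_score=np.nan,
)

# Assumption for illustration: column 0 of X holds the incumbent's prediction.
incumbent_mse = np.array(
    [np.mean((y[test] - X[test, 0]) ** 2) for _, test in cv.split(X)]
)
```

Each array in scores has shape (n_splits,), which matches the return type documented above.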
HTML representation of estimator.
This is redundant with the logic of _repr_mimebundle_. The latter should be favored in the long term; _repr_html_ is only implemented for consumers who do not interpret _repr_mimebundle_.
This function is returned by the @property _repr_html_ to make hasattr(estimator, "_repr_html_") return True or False depending on get_config()["display"].
MIME bundle used by Jupyter kernels to display the estimator.
Checks the validity of the data representation(s).
X (array-like of shape = [n_samples, n_features]) – The design matrix, where each row corresponds to an example and the column(s) correspond to the feature(s).
y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s).
out
validated data
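A comparable validation step in plain scikit-learn is sklearn.utils.check_X_y, sketched below with multi_output=True so that a two-dimensional target matrix is accepted. This is a stand-in for illustration; the source does not say which checks the method performs internally.

```python
import numpy as np
from sklearn.utils import check_X_y

X = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]]
y = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]

# Converts to ndarrays and checks shapes, finiteness, and consistent lengths.
X_checked, y_checked = check_X_y(X, y, multi_output=True, y_numeric=True)

# Mismatched numbers of rows are rejected with a ValueError.
try:
    check_X_y(X, y[:2], multi_output=True)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```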
Checks if regressor adheres to scikit-learn conventions.
Namely, it runs sklearn.utils.estimator_checks.check_estimator.
Performs (augmented) cross-validation, then returns the withheld fold score.
If return_incumbent_score is True, then the incumbent is scored
on the withheld folds. Otherwise, the behavior is the same as in
Scikit-learn.
X (array-like of shape = [n_samples, n_features]) – The design matrix, where each row corresponds to an example and the column(s) correspond to the feature(s).
y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s).
error_score ('raise' or numeric, optional (default=np.nan)) – The assigned value if an error occurs while inducing a regressor. If set to ‘raise’, then the specific error is raised. Else if set to a numeric value, then FitFailedWarning is raised.
return_incumbent_score (bool, optional (default=True)) – Determines whether to score the incumbent on the withheld folds, whereby the incumbent is assumed to be an example in the design matrix.
cv (int, cross-validation generator, an iterable, or None, optional (default=None)) – Determines the cross-validation strategy. If None, then the default is 5-fold cross-validation.
fit_params (dict, optional (default=None)) – (Hyper)parameters to pass to the regressor’s fit method.
scores – The withheld fold scores for each run of the cross-validation procedure.
pd.Series or pd.DataFrame
Notes
Scikit-learn returns negative scores for some metrics, such as mean absolute error (MAE) or mean squared error (MSE). However, we only return nonnegative scores.
References
Alex Wozniakowski, Jayne Thompson, Mile Gu, and Felix C. Binder. “A new formulation of gradient boosting”, Machine Learning: Science and Technology, 2 045022 (2021).
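The sign convention mentioned in the Notes can be seen with scikit-learn alone. In the sketch below, the "neg_mean_absolute_error" scorer yields values that are at most zero, and negating them recovers the nonnegative MAE. Flipping the sign by hand is shown only to illustrate the convention; it is not taken from the physlearn implementation.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=30)

# Scikit-learn scorers follow "greater is better", so errors are negated.
neg_mae = cross_val_score(Ridge(), X, y, cv=5, scoring="neg_mean_absolute_error")

# Negating once more gives the usual nonnegative error per withheld fold.
mae = -neg_mae
```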
Performs (augmented) cross-validation, and wraps the result in a DataFrame.
If return_incumbent_score is True, then the incumbent is scored
on the withheld folds. Otherwise, the behavior is the same as in
Scikit-learn.
X (array-like of shape = [n_samples, n_features]) – The design matrix, where each row corresponds to an example and the column(s) correspond to the feature(s).
y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the column(s) correspond to the single-target(s).
return_regressor (bool, optional (default=False)) – Determines whether to return the induced regressor.
error_score ('raise' or numeric, optional (default=np.nan)) – The assigned value if an error occurs while inducing a regressor. If set to ‘raise’, then the specific error is raised. Else if set to a numeric value, then FitFailedWarning is raised.
return_incumbent_score (bool, optional (default=True)) – Determines whether to score the incumbent on the withheld folds, whereby the incumbent is assumed to be an example in the design matrix.
cv (int, cross-validation generator, an iterable, or None, optional (default=None)) – Determines the cross-validation strategy. If None, then the default is 5-fold cross-validation.
fit_params (dict, optional (default=None)) – (Hyper)parameters to pass to the regressor’s fit method.
scores – DataFrame of scores for each run of the cross-validation procedure.
pd.DataFrame
Notes
Scikit-learn returns negative scores for some metrics, such as mean absolute error (MAE) or mean squared error (MSE). However, we only return nonnegative scores.
References
Alex Wozniakowski, Jayne Thompson, Mile Gu, and Felix C. Binder. “A new formulation of gradient boosting”, Machine Learning: Science and Technology, 2 045022 (2021).
Serializes the value with joblib.
value (any Python object) – The object to store to disk.
filename (str, joblib.pathlib.Path, or file object) – The file object or path of the file.
filenames – The list of file names in which the data is stored.
list of str
Retrieves the (hyper)parameters.
deep (bool, optional (default=True)) – This parameter is not used here, but it is kept because various Scikit-learn utilities expect it.
self.params – (Hyper)parameter names mapped to their values.
dict
Creates pipe attribute for downstream tasks.
This method constructs a ModifiedPipeline from the given base regressor.
y (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The target matrix, where each row corresponds to an example and the
column(s) correspond to the single-target(s). The targets are used to
determine the type of the target, and the number of samples if the
pipeline_transform involves quantile transformers.
n_quantiles (int or None, optional (default=None)) – Number of quantiles in sklearn.preprocessing.QuantileTransformer, if
pipeline_transform is either `quantileuniform` or `quantilenormal`.
A ModifiedPipeline object.
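To show why the number of samples matters for quantile transformers, here is a minimal sketch built on scikit-learn's own Pipeline and QuantileTransformer rather than ModifiedPipeline. The helper make_quantile_pipeline, the step names, and the Ridge regressor are hypothetical. Mapping `quantileuniform` and `quantilenormal` to output_distribution="uniform" and "normal" is likewise an assumption.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import QuantileTransformer


def make_quantile_pipeline(y, n_quantiles=None, output_distribution="uniform"):
    """Hypothetical helper: cap n_quantiles at the number of examples."""
    n_samples = np.asarray(y).shape[0]
    if n_quantiles is None:
        # 1000 is the scikit-learn default for QuantileTransformer.
        n_quantiles = min(n_samples, 1000)
    else:
        n_quantiles = min(n_quantiles, n_samples)
    transformer = QuantileTransformer(
        n_quantiles=n_quantiles, output_distribution=output_distribution
    )
    return Pipeline([("tr", transformer), ("reg", Ridge())])


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = X[:, 0] + X[:, 1]

pipe = make_quantile_pipeline(y).fit(X, y)
```

Capping the value up front avoids the warning that scikit-learn emits when n_quantiles exceeds the number of examples.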
Deserializes the file object.
filename (str, joblib.pathlib.Path, or file object) – The file object or path of the file.
joblib.load – The object stored in the file.
any Python object
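The two serialization methods wrap joblib, so a round trip with joblib.dump and joblib.load gives a feel for the behavior. The sketch below uses a scikit-learn Ridge regressor and a temporary directory; the file name regressor.joblib is arbitrary.

```python
import os
import tempfile

import joblib
from sklearn.linear_model import Ridge

model = Ridge(alpha=0.5)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "regressor.joblib")
    # joblib.dump returns the list of file names in which the data is stored.
    filenames = joblib.dump(model, path)
    # joblib.load reconstructs the stored Python object.
    restored = joblib.load(path)
```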
Generates predictions with the ModifiedPipeline object.
X (array-like of shape = [n_samples, n_features]) – The design matrix, where each row corresponds to an example and the column(s) correspond to the feature(s).
y_pred – The predictions generated by the induced ModifiedPipeline object.
array-like of shape = [n_samples] or shape = [n_samples, n_targets]
Gets a regressor’s attribute from the ModifiedPipeline object.
The pipe attribute must exist in order to use this method.
attr (str) – The name of the regressor’s attribute.
attr
type of attribute
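With a standard scikit-learn Pipeline, reading an attribute from the final regressor can be sketched as below. The function get_regressor_attr is hypothetical and assumes the regressor is the last step. As noted above, the pipeline must already exist, and fitted attributes such as coef_ only appear after fitting.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def get_regressor_attr(pipe, attr):
    """Hypothetical helper: read an attribute from the pipeline's last step."""
    final_estimator = pipe.steps[-1][1]
    return getattr(final_estimator, attr)


rng = np.random.default_rng(0)
X = rng.normal(size=(15, 2))
y = 2.0 * X[:, 0] - X[:, 1]

pipe = Pipeline([("scale", StandardScaler()), ("reg", Ridge())]).fit(X, y)
coef = get_regressor_attr(pipe, "coef_")
```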
Computes the supervised score.
y_true (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The observed target matrix, where each row corresponds to an example and the column(s) correspond to the observed single-target(s).
y_pred (array-like of shape = [n_samples] or shape = [n_samples, n_targets]) – The predicted target matrix, where each row corresponds to an example and the column(s) correspond to the predicted single-target(s).
scoring (str, optional (default='mse')) – The scoring name, which may be mae, mse, rmse, r2, ev, or msle.
multioutput (str, optional (default='raw_values')) – Defines how multiple output values are aggregated. The string
must be either 'raw_values', 'uniform_average', or
'variance_weighted'.
score – The computed score.
float or ndarray of floats
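One plausible way to realize the scoring names listed above is a lookup into sklearn.metrics, sketched below. The mapping and the function score are assumptions for illustration, not the physlearn implementation. The sketch computes rmse as the square root of the MSE. Note that scikit-learn accepts 'variance_weighted' only for r2_score and explained_variance_score.

```python
import numpy as np
from sklearn import metrics

# Assumed mapping from the documented scoring names to scikit-learn metrics.
_SCORERS = {
    "mae": metrics.mean_absolute_error,
    "mse": metrics.mean_squared_error,
    "r2": metrics.r2_score,
    "ev": metrics.explained_variance_score,
    "msle": metrics.mean_squared_log_error,
}


def score(y_true, y_pred, scoring="mse", multioutput="raw_values"):
    """Hypothetical scorer: dispatch on the scoring name."""
    if scoring == "rmse":
        mse = metrics.mean_squared_error(y_true, y_pred, multioutput=multioutput)
        return np.sqrt(mse)
    return _SCORERS[scoring](y_true, y_pred, multioutput=multioutput)


y_true = np.array([[1.0, 2.0], [3.0, 4.0]])
y_pred = np.array([[1.0, 3.0], [3.0, 6.0]])

mse_per_target = score(y_true, y_pred, scoring="mse")
mse_average = score(y_true, y_pred, scoring="mse", multioutput="uniform_average")
```

With 'raw_values' the result holds one score per target column, whereas 'uniform_average' collapses the columns into a single float.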
Sets the regressor’s (hyper)parameters.
**params (dict) – The regressor’s (hyper)parameters.
self – The base regressor object.
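The parameter accessors follow the scikit-learn estimator convention, which the sketch below demonstrates with a Ridge regressor as a stand-in: get_params returns a dict of names mapped to values, and set_params returns the estimator itself so that calls can be chained.

```python
from sklearn.linear_model import Ridge

reg = Ridge(alpha=1.0)

# Parameter names mapped to their current values.
params = reg.get_params(deep=True)

# set_params updates the estimator in place and returns the same object.
returned = reg.set_params(alpha=0.1)
```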