The physlearn.datasets.google.utils._dataset_helper_functions module
provides basic utilities for wrangling, serializing, and deserializing
superconducting quantum computing calibration data.
Serializes the training and test data dictionary as a JSON-formatted stream.
train_test_data (dict) – A dictionary with keys: ‘X_train’, ‘X_test’, ‘y_train’, and ‘y_test’.
folder (str) – Directory in which the training and test data is dumped.
n_qubits (int or None, optional (default=None)) – Number of qubits. If specified, this value is used in the file name.
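As a rough illustration of the serialization step described above, the following sketch writes the four splits to a single JSON file. The function name dump_train_test_data, the file-name pattern, and the use of DataFrame.to_json(orient="split") for each value are assumptions made for this example, not the module's actual implementation.

```python
import json
import os
import tempfile

import pandas as pd


def dump_train_test_data(train_test_data, folder, n_qubits=None):
    """Hypothetical helper: write the train/test dictionary to one JSON file."""
    # Assumed naming scheme; the module's real file name may differ.
    suffix = f"_{n_qubits}q" if n_qubits is not None else ""
    path = os.path.join(folder, f"train_test_data{suffix}.json")

    # pandas objects are not directly JSON serializable, so each split is
    # first encoded to a JSON string with pandas' own encoder.
    payload = {
        key: value.to_json(orient="split")
        for key, value in train_test_data.items()
    }

    os.makedirs(folder, exist_ok=True)
    with open(path, "w") as f:
        json.dump(payload, f)
    return path


# Usage with toy data written to a temporary directory.
toy = pd.DataFrame({"feature": [0.1, 0.2]})
splits = {"X_train": toy, "X_test": toy, "y_train": toy, "y_test": toy}
with tempfile.TemporaryDirectory() as tmp:
    written = dump_train_test_data(splits, tmp, n_qubits=3)
    print(os.path.basename(written))
```

Encoding each split separately keeps the outer file a plain JSON object with the four expected keys.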
Deserializes the training and test data dictionary.
The dictionary is expected to have been serialized as a JSON-formatted stream.
filename (str) – Name of the file in which the training and test data dictionary has been dumped.
Returns: train_test_data (dict)
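A minimal sketch of the deserialization step might look as follows. The name load_train_test_data is hypothetical, and the sketch assumes the file holds a JSON object whose values are JSON strings produced by DataFrame.to_json(orient="split"); the module's actual on-disk layout may differ.

```python
import io
import json

import pandas as pd


def load_train_test_data(filename):
    """Hypothetical helper: read the train/test dictionary from a JSON file."""
    with open(filename) as f:
        payload = json.load(f)

    # Assumed layout: each value is itself a JSON string written by
    # DataFrame.to_json(orient="split"), so it is decoded a second time.
    return {
        key: pd.read_json(io.StringIO(text), orient="split")
        for key, text in payload.items()
    }
```

Wrapping each string in io.StringIO passes pandas a file-like object, which avoids the deprecation warning that newer pandas versions emit for literal JSON strings.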
Splits the X and y data into training and test data.
The split is determined by test_size, the fraction of the examples assigned to the test set.
X (DataFrame or Series) – The design matrix, where each row corresponds to an example and the column(s) correspond to the feature(s).
y (DataFrame or Series) – The target matrix, where each row corresponds to an example and the column(s) correspond to the target(s).
test_size (float) – The fraction of the data, given as a decimal between 0 and 1, that is held out as test data.
random_state (int, RandomState instance or None) – Determines random number generation in sklearn.model_selection.train_test_split.
Returns: train_test_data (dict)
Notes
As shuffling is handled by sklearn.utils.shuffle, there is no shuffling parameter.
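The split described above can be sketched as a thin wrapper around sklearn.model_selection.train_test_split that packs the four outputs into a dictionary. The function name split_train_test is hypothetical. The sketch exposes no shuffle argument and leaves train_test_split at its defaults, which is an assumption about how the module calls it.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_train_test(X, y, test_size, random_state=None):
    """Hypothetical helper: split X and y and return the pieces as a dict."""
    # No shuffle argument is exposed; train_test_split keeps its defaults.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    return {
        "X_train": X_train,
        "X_test": X_test,
        "y_train": y_train,
        "y_test": y_test,
    }


# Usage: hold out 20% of ten toy examples.
X = pd.DataFrame({"feature": range(10)})
y = pd.Series(range(10), name="target")
splits = split_train_test(X, y, test_size=0.2, random_state=0)
print(len(splits["X_train"]), len(splits["X_test"]))
```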
Shuffles the pandas data object.
data (DataFrame or Series) – The pandas data that is to be shuffled.
drop (bool) – Resets the index of the pandas data object.
Returns: pandas DataFrame or Series
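A shuffling helper along these lines could be sketched with sklearn.utils.shuffle, which accepts pandas objects and returns a row-permuted copy. The name shuffle_data and the random_state argument are hypothetical additions for reproducibility, and interpreting drop as "reset the index after shuffling" is an assumption based on the parameter description.

```python
import pandas as pd
from sklearn.utils import shuffle


def shuffle_data(data, drop=True, random_state=None):
    """Hypothetical helper: shuffle the rows of a DataFrame or Series."""
    shuffled = shuffle(data, random_state=random_state)
    if drop:
        # Assumed meaning of `drop`: discard the permuted index labels and
        # renumber the rows from 0.
        shuffled = shuffled.reset_index(drop=True)
    return shuffled


# Usage: the values are permuted, the index is renumbered.
frame = pd.DataFrame({"a": [10, 20, 30, 40, 50]})
print(shuffle_data(frame, drop=True, random_state=0))
```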
Computes the interquartile range and masks the outliers.
data (DataFrame or Series) – The pandas data that is to be masked.
Returns: pandas DataFrame or Series
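For illustration, interquartile-range masking could be sketched as below. The name mask_outliers_iqr, the conventional 1.5 multiplier on the interquartile range, and the choice to replace outliers with NaN are all assumptions; the source does not state which thresholds or mask values the module uses.

```python
import pandas as pd


def mask_outliers_iqr(data, k=1.5):
    """Hypothetical helper: replace IQR outliers with NaN."""
    # For a DataFrame the quantiles are computed per column.
    q1 = data.quantile(0.25)
    q3 = data.quantile(0.75)
    iqr = q3 - q1

    # Keep values inside [q1 - k * iqr, q3 + k * iqr]; mask everything else.
    inside = (data >= q1 - k * iqr) & (data <= q3 + k * iqr)
    return data.where(inside)


# Usage: the value 100.0 lies far outside the fences and is masked.
readings = pd.Series([1.0, 2.0, 3.0, 4.0, 100.0])
print(mask_outliers_iqr(readings))
```

Using where rather than dropping rows keeps the shape and index of the input, so the masked object stays aligned with any companion data.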