The physlearn.datasets.google.base module provides the abstract
base class and the base class for representing data with pandas.
Bases: ABC
Abstract base class for the supervised DataFrame.
Retrieves the DataFrame.
Bases: AbstractDataFrame
Base class for the supervised DataFrame.
path (str) – Path to the csv file with calibration data.
Examples
>>> from physlearn.datasets.google.base import BaseDataFrame
>>> from physlearn.datasets.google.utils._helper_functions import _path_to_google_data
>>> df = BaseDataFrame(path=_path_to_google_data())
>>> df.get_df.iloc[:, 2]
0 -0.008238
1 -0.008238
2 -0.008238
3 -0.008238
4 -0.008238
...
675 0.007770
676 0.007770
677 0.007770
678 0.007770
679 0.007770
Name: qubit_voltages, Length: 680, dtype: float64
Reads a file into a pandas DataFrame.
Supports comma-separated values (csv), Excel, or JSON formatted files.
df
DataFrame
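The format support described above can be illustrated with plain pandas. This is a minimal sketch, not the physlearn implementation: the helper name `read_table` and the choice to dispatch on the file extension are assumptions made for illustration.

```python
from pathlib import Path

import pandas as pd


def read_table(path):
    """Read a csv, Excel, or JSON file into a DataFrame (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix in {".xls", ".xlsx"}:
        # pandas needs an Excel engine (for example openpyxl) for this branch.
        return pd.read_excel(path)
    if suffix == ".json":
        return pd.read_json(path)
    raise ValueError(f"Unsupported file extension: {suffix!r}")
```

Dispatching on the extension keeps the caller's interface to a single path argument, which matches the `path` parameter documented above.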
The physlearn.datasets.google._google module provides utilities
for wrangling, serializing, and deserializing superconducting quantum
computing calibration data.
Notes
The calibration data was collected by Benjamin Chiaro during his time as a graduate student at UC Santa Barbara. The Google quantum computer contains 9 qubits, of which the 5 rightmost qubits and 4 interleaving couplers were used during experimentation. The 4 leftmost qubits and couplers were left idle during experimentation.
Bases: BaseDataFrame
Represents the Google quantum computer calibration data with a DataFrame.
path (str) – Path to the csv file with calibration data.
n_qubits (int) – Number of qubits in the experiment.
See also
physlearn.datasets.GoogleData: Class for wrangling the calibration data.
Examples
>>> from physlearn.datasets import GoogleDataFrame
>>> from physlearn.datasets.google.utils._helper_functions import _path_to_google_data
>>> df = GoogleDataFrame(path=_path_to_google_data(), n_qubits=5)
>>> df.get_df_with_correct_columns.head().iloc[0, :3]
qvolt5 -0.008238
qvolt6 -0.006896
qvolt7 -0.026120
Name: 1, dtype: float64
Drops the undesired columns from the raw calibration data.
df
DataFrame
Reads a file into a pandas DataFrame.
Supports comma-separated values (csv), Excel, or JSON formatted files.
df
DataFrame
Bases: GoogleDataFrame
Wrangles the calibration data for multi-target regression.
path (str, optional (default=None)) – Path to the csv file with calibration data.
n_qubits (int, optional (default=5)) – Number of qubits in the experiment. Currently, only 5 qubits are supported.
test_split (float, optional (default=0.3)) – The proportion of labeled examples withheld from training.
random_state (int, RandomState instance, or None, optional (default=0)) – Determines the random number generation in the training and test examples split.
remove_outliers (bool, optional (default=False)) – If True, removes labeled examples that are not within the interquartile range of the DataFrame.
shuffle (bool, optional (default=True)) – If True, shuffles the DataFrame rows before splitting the DataFrame into training and test examples.
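The `remove_outliers` option above can be pictured with a small pandas filter. This is a sketch under stated assumptions, not the library's code: the helper name `drop_iqr_outliers`, the toy column values, and the fence multiplier `k=1.5` (Tukey's convention) are all hypothetical. Setting `k=0` gives the literal reading of "within the interquartile range".

```python
import pandas as pd


def drop_iqr_outliers(df, k=1.5):
    """Keep rows whose values all lie within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = df.quantile(0.25)
    q3 = df.quantile(0.75)
    iqr = q3 - q1
    # Column-wise comparison; a row survives only if every column is in range.
    within = (df >= q1 - k * iqr) & (df <= q3 + k * iqr)
    return df[within.all(axis=1)]


# Toy data (hypothetical values); the last qvolt5 entry is an obvious outlier.
toy = pd.DataFrame({"qvolt5": [0.0, 0.1, 0.2, 0.3, 10.0],
                    "qvolt6": [1.0, 1.1, 1.2, 1.3, 1.4]})
cleaned = drop_iqr_outliers(toy)
```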
See also
physlearn.datasets.GoogleDataFrame: Class for representing the calibration data.
Examples
>>> from physlearn.datasets import GoogleData
>>> data = GoogleData()
>>> data.load_benchmark['X_train'].iloc[0, :3]
qvolt5 0.003398
qvolt6 -0.018080
qvolt7 -0.009895
Name: 0, dtype: float64
References
Alex Wozniakowski, Jayne Thompson, Mile Gu, and Felix C. Binder. “A new formulation of gradient boosting”, Machine Learning: Science and Technology, 2 045022 (2021).
Gets the DataFrame, then splits it into training and test data.
X_train, X_test, y_train, and y_test
DataFrame(s)
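The roles of `test_split`, `random_state`, and `shuffle` in such a split can be sketched with pandas alone. The function `shuffle_split`, the toy data, and the `target` column name are hypothetical; the library may well delegate to another routine, so treat this only as an illustration of the parameter semantics.

```python
import pandas as pd


def shuffle_split(X, y, test_split=0.3, random_state=0, shuffle=True):
    """Split features and targets into train and test parts (illustrative)."""
    if shuffle:
        # Sampling every row without replacement is a reproducible shuffle.
        X = X.sample(frac=1.0, random_state=random_state)
        # Reorder the targets to match; assumes a unique, shared index.
        y = y.loc[X.index]
    n_test = int(round(len(X) * test_split))
    n_train = len(X) - n_test
    return X.iloc[:n_train], X.iloc[n_train:], y.iloc[:n_train], y.iloc[n_train:]


# Toy data with 10 labeled examples.
X = pd.DataFrame({"qvolt5": [i * 0.1 for i in range(10)]})
y = pd.DataFrame({"target": [i * 1.0 for i in range(10)]})
X_train, X_test, y_train, y_test = shuffle_split(X, y)
```

With a fixed `random_state`, repeated calls return the same partition, which is what makes a benchmark split reproducible.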
Serializes the training and test data as a JSON formatted stream.
It automatically dumps the data into the Google JSON folder.
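A round trip through JSON might look like the following sketch. It serializes to a string rather than to the Google JSON folder, and the helpers `dumps_split` and `loads_split`, the `"split"` orientation, and the toy values are assumptions for illustration rather than the library's actual serialization format.

```python
import json

import pandas as pd


def dumps_split(split):
    """Serialize a dict of DataFrames to a single JSON string."""
    payload = {name: df.to_dict(orient="split") for name, df in split.items()}
    return json.dumps(payload)


def loads_split(text):
    """Rebuild the dict of DataFrames from the JSON string."""
    raw = json.loads(text)
    # Each entry holds 'index', 'columns', and 'data', which map directly
    # onto the DataFrame constructor's keyword arguments.
    return {name: pd.DataFrame(**parts) for name, parts in raw.items()}


# Toy split with hypothetical values.
split = {"X_train": pd.DataFrame({"qvolt5": [0.1, 0.2]}),
         "y_train": pd.DataFrame({"target": [1.0, 2.0]})}
restored = loads_split(dumps_split(split))
```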
Deserializes the benchmark dataset.
data
dict
Reads a file into a pandas DataFrame.
Supports comma-separated values (csv), Excel, or JSON formatted files.
df
DataFrame
Drops the undesired columns from the raw calibration data.
df
DataFrame
Deserializes the benchmark dataset for the multi-target regression task.
If return_split is True, then the benchmark dataset is returned in the familiar X_train, X_test, y_train, and y_test format; otherwise, it is returned as a dict.
return_split (bool) – If True, then the benchmark dataset is returned in the form of X_train, X_test, y_train, and y_test.
X_train, X_test, y_train, and y_test or data
DataFrame(s) or dict
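The two return shapes can be sketched in a few lines of plain Python. The helper `unpack_benchmark` is hypothetical, and the dictionary keys are assumed to follow the X_train, X_test, y_train, and y_test naming used above; placeholder strings stand in for the DataFrames.

```python
def unpack_benchmark(data, return_split=False):
    """Return the dict as-is, or unpack it into the four-part split."""
    keys = ("X_train", "X_test", "y_train", "y_test")
    if return_split:
        return tuple(data[key] for key in keys)
    return data


# Placeholder values; in practice each entry would be a DataFrame.
benchmark = {"X_train": "Xtr", "X_test": "Xte", "y_train": "ytr", "y_test": "yte"}
X_train, X_test, y_train, y_test = unpack_benchmark(benchmark, return_split=True)
```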
References
Alex Wozniakowski, Jayne Thompson, Mile Gu, and Felix C. Binder. “A new formulation of gradient boosting”, Machine Learning: Science and Technology, 2 045022 (2021).