Google Data API

The physlearn.datasets.google.base module provides an abstract base class and a concrete base class for representing data as a pandas DataFrame.

class physlearn.datasets.google.base.AbstractDataFrame[source]

Bases: ABC

Abstract base class for the supervised DataFrame.

abstract property get_df

Retrieves the DataFrame.

class physlearn.datasets.google.base.BaseDataFrame(path)[source]

Bases: AbstractDataFrame

Base class for the supervised DataFrame.

Parameters

path (str) – Path to the csv file with calibration data.

Examples

>>> from physlearn.datasets.google.base import BaseDataFrame
>>> from physlearn.datasets.google.utils._helper_functions import _path_to_google_data
>>> df = BaseDataFrame(path=_path_to_google_data())
>>> df.get_df.iloc[:, 2]
0     -0.008238
1     -0.008238
2     -0.008238
3     -0.008238
4     -0.008238
         ...
675    0.007770
676    0.007770
677    0.007770
678    0.007770
679    0.007770
Name: qubit_voltages, Length: 680, dtype: float64

property get_df

Reads a file into a pandas DataFrame.

Supports comma-separated values (csv), Excel, or JSON formatted files.

Returns

df

Return type

DataFrame
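As a rough illustration of reading several file formats into a DataFrame, the sketch below picks a pandas reader from the file suffix. The helper name `read_table` and the suffix-based dispatch are assumptions made for illustration; this page does not state how `get_df` selects a reader.

```python
from pathlib import Path

import pandas as pd


def read_table(path):
    """Hypothetical helper (not part of physlearn): read csv, Excel, or JSON."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix in {".xls", ".xlsx"}:
        # Reading Excel files requires an optional engine such as openpyxl.
        return pd.read_excel(path)
    if suffix == ".json":
        return pd.read_json(path)
    raise ValueError(f"Unsupported file format: {suffix!r}")
```

Any other suffix raises a ValueError, so an unsupported file fails loudly instead of being parsed with the wrong reader.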

The physlearn.datasets.google._google module provides utilities for wrangling, serializing, and deserializing superconducting quantum computing calibration data.

Notes

The calibration data was collected by Benjamin Chiaro during his time as a graduate student at UC Santa Barbara. The Google quantum computer contains 9 qubits. The experiments used the 5 rightmost qubits and the 4 couplers interleaved between them; the 4 leftmost qubits and their couplers were left idle.

class physlearn.datasets.google._google.GoogleDataFrame(path, n_qubits)[source]

Bases: BaseDataFrame

Represents the Google quantum computer calibration data with a DataFrame.

Parameters
  • path (str) – Path to the csv file with calibration data.

  • n_qubits (int) – Number of qubits in the experiment.

See also

physlearn.datasets.GoogleData

Class for wrangling the calibration data.

Examples

>>> from physlearn.datasets import GoogleDataFrame
>>> from physlearn.datasets.google.utils._helper_functions import _path_to_google_data
>>> df = GoogleDataFrame(path=_path_to_google_data(), n_qubits=5)
>>> df.get_df_with_correct_columns.head().iloc[0, :3]
qvolt5   -0.008238
qvolt6   -0.006896
qvolt7   -0.026120
Name: 1, dtype: float64

property get_df_with_correct_columns

Drops the undesired columns from the raw calibration data.

Returns

df

Return type

DataFrame
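Dropping columns from a raw frame can be sketched with `DataFrame.drop`. The toy frame and the names listed in `undesired` are invented for illustration; only labels such as `qvolt5` appear in the example output, and which columns physlearn actually drops is not stated on this page.

```python
import pandas as pd

# Toy stand-in for the raw calibration data; the values and the
# "idle_*" column names are made up for this sketch.
raw = pd.DataFrame({
    "idle_a": [0.0, 0.0],
    "qvolt5": [-0.1, 0.2],
    "qvolt6": [0.3, -0.4],
    "idle_b": [0.0, 0.0],
})

undesired = ["idle_a", "idle_b"]  # hypothetical column names
trimmed = raw.drop(columns=undesired)  # returns a new frame; raw is unchanged
print(trimmed.columns.tolist())
```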

property get_df

Reads a file into a pandas DataFrame.

Supports comma-separated values (csv), Excel, or JSON formatted files.

Returns

df

Return type

DataFrame

class physlearn.datasets.google._google.GoogleData(path=None, n_qubits=5, test_split=0.3, random_state=0, remove_outliers=False, shuffle=True)[source]

Bases: GoogleDataFrame

Wrangles the calibration data for multi-target regression.

Parameters
  • path (str, optional (default=None)) – Path to the csv file with calibration data.

  • n_qubits (int, optional (default=5)) – Number of qubits in the experiment. Currently, only 5 qubits are supported.

  • test_split (float, optional (default=0.3)) – The proportion of labeled examples withheld from training.

  • random_state (int, RandomState instance, or None, optional (default=0)) – Determines the random number generation in the training and test examples split.

  • remove_outliers (bool, optional (default=False)) – If True, removes labeled examples that fall outside the interquartile range of the DataFrame.

  • shuffle (bool, optional (default=True)) – If True, shuffles the DataFrame rows before splitting them into training and test examples.
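To make the remove_outliers option more concrete, here is a minimal sketch of interquartile-range filtering with pandas. The parameter description does not say whether the bounds are the quartiles themselves or the common Tukey fences, so the hypothetical multiplier `k` covers both: `k=0` keeps only rows inside [Q1, Q3], and `k=1.5` gives the usual fences. The function name and the choice of default are assumptions, not physlearn's implementation.

```python
import pandas as pd


def drop_iqr_outliers(df, k=1.5):
    """Keep rows whose values all lie within [Q1 - k*IQR, Q3 + k*IQR] per column."""
    q1 = df.quantile(0.25)
    q3 = df.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # The per-column bounds are aligned with the DataFrame columns.
    inside = (df.ge(lower, axis="columns") & df.le(upper, axis="columns")).all(axis=1)
    return df[inside]


# Toy data: the last row has an extreme value in column "a".
toy = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0, 100.0], "b": [1.0, 2.0, 3.0, 4.0, 5.0]})
filtered = drop_iqr_outliers(toy)
```

A row is kept only if every column passes, so a single extreme value is enough to drop the whole labeled example.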

See also

physlearn.datasets.GoogleDataFrame

Class for representing the calibration data.

Examples

>>> from physlearn.datasets import GoogleData
>>> data = GoogleData()
>>> data.load_benchmark['X_train'].iloc[0, :3]
qvolt5    0.003398
qvolt6   -0.018080
qvolt7   -0.009895
Name: 0, dtype: float64

References

  • Alex Wozniakowski, Jayne Thompson, Mile Gu, and Felix C. Binder. “A new formulation of gradient boosting”, Machine Learning: Science and Technology, 2, 045022 (2021).

_train_test_split()[source]

Gets the DataFrame, then splits it into training and test data.

Returns

X_train, X_test, y_train, and y_test

Return type

DataFrame(s)
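The interplay of test_split, random_state, and shuffle can be sketched with NumPy and pandas alone. The function `split_rows` is a hypothetical stand-in: the library may instead delegate to an existing splitter, and how the columns are divided into features and targets is not described on this page, so X and y are passed in separately.

```python
import numpy as np
import pandas as pd


def split_rows(X, y, test_split=0.3, random_state=0, shuffle=True):
    """Hypothetical stand-in for a train/test split over row-aligned X and y."""
    n_rows = len(X)
    order = np.arange(n_rows)
    if shuffle:
        # A seeded generator makes the permutation reproducible.
        order = np.random.RandomState(random_state).permutation(n_rows)
    n_test = int(np.ceil(test_split * n_rows))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X.iloc[train_idx], X.iloc[test_idx], y.iloc[train_idx], y.iloc[test_idx]


# Toy features and targets with 20 rows each.
X = pd.DataFrame({"f1": range(20), "f2": range(20, 40)})
y = pd.DataFrame({"t1": range(100, 120)})
X_train, X_test, y_train, y_test = split_rows(X, y, test_split=0.25)
```

Using the same index arrays for X and y keeps each feature row paired with its targets, and a fixed random_state yields the same partition on every call.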

save_train_test_split_to_json()[source]

Serializes the training and test data as a JSON formatted stream.

It automatically dumps the data into the Google JSON folder.
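A JSON round trip for a set of DataFrames might look like the sketch below. The function names, the single-file layout, and the temporary path are assumptions for illustration; the actual file layout inside the Google JSON folder is not documented here.

```python
import io
import json
import os
import tempfile

import pandas as pd


def dump_split(split, path):
    """Write a dict of DataFrames to one JSON file (hypothetical layout)."""
    payload = {name: frame.to_json(orient="split") for name, frame in split.items()}
    with open(path, "w") as handle:
        json.dump(payload, handle)


def load_split(path):
    """Read the file written by dump_split back into a dict of DataFrames."""
    with open(path) as handle:
        payload = json.load(handle)
    return {
        name: pd.read_json(io.StringIO(text), orient="split")
        for name, text in payload.items()
    }


# Toy frames standing in for the training data.
split = {
    "X_train": pd.DataFrame({"f1": [0.5, 1.25]}),
    "y_train": pd.DataFrame({"t1": [2.0, 4.5]}),
}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "split.json")
    dump_split(split, path)
    restored = load_split(path)
```

The "split" orientation stores columns, index, and data separately, which preserves column order when the frames are read back.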

property load_benchmark

Deserializes the benchmark dataset.

Returns

data

Return type

dict

property get_df

Reads a file into a pandas DataFrame.

Supports comma-separated values (csv), Excel, or JSON formatted files.

Returns

df

Return type

DataFrame

property get_df_with_correct_columns

Drops the undesired columns from the raw calibration data.

Returns

df

Return type

DataFrame

physlearn.datasets.google._google.load_benchmark(return_split=False)[source]

Deserializes the benchmark dataset for the multi-target regression task.

If return_split is True, the benchmark dataset is returned in the familiar X_train, X_test, y_train, and y_test format; otherwise, it is returned as a dict.

Parameters

return_split (bool) – If True, then the benchmark dataset is returned in the form of X_train, X_test, y_train, and y_test.

Returns

X_train, X_test, y_train, and y_test or data

Return type

DataFrame(s) or dict
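The two return shapes can be illustrated with a small stand-in that operates on toy data. The function `benchmark_view` is hypothetical, and the dict keys other than 'X_train' (which appears in the GoogleData example) are assumed to follow the same naming.

```python
import pandas as pd


def benchmark_view(data, return_split=False):
    """Hypothetical stand-in: return the dict as is, or unpack it into four frames."""
    if return_split:
        return data["X_train"], data["X_test"], data["y_train"], data["y_test"]
    return data


# Toy data; the real benchmark holds the calibration DataFrames.
toy = {
    "X_train": pd.DataFrame({"f1": [0.1, 0.2]}),
    "X_test": pd.DataFrame({"f1": [0.3]}),
    "y_train": pd.DataFrame({"t1": [1.0, 2.0]}),
    "y_test": pd.DataFrame({"t1": [3.0]}),
}
X_train, X_test, y_train, y_test = benchmark_view(toy, return_split=True)
as_dict = benchmark_view(toy)
```

The tuple form suits code that expects the usual four-way unpacking, while the dict form keeps the pieces addressable by name.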

References

  • Alex Wozniakowski, Jayne Thompson, Mile Gu, and Felix C. Binder. “A new formulation of gradient boosting”, Machine Learning: Science and Technology, 2, 045022 (2021).