OpenDatasetBase Class

Base class for open datasets, intended to be inherited by concrete dataset classes.

Construct open datasets.

Inheritance
OpenDatasetBase

Constructor

OpenDatasetBase(cols: List[str] | None = None, enable_telemetry: bool = True, **kwargs)

Parameters

cols
list[str]
default value: None

A list of column names to load from the dataset, defaults to None

enable_telemetry
bool
default value: True

Whether to enable telemetry on this dataset, defaults to True

kwargs
dict
Required

Additional keyword arguments used for filtering.
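
The constructor signature above can be mirrored in a small stand-in class to show how `cols`, `enable_telemetry`, and the extra keyword arguments might be passed and stored. This is a minimal sketch, not the library's implementation: the class `SketchOpenDataset`, the `filters` attribute, and the `country` keyword are hypothetical names introduced only for illustration.

```python
from typing import Any, Dict, List, Optional


class SketchOpenDataset:
    """Hypothetical stand-in that mirrors the documented constructor signature."""

    def __init__(
        self,
        cols: Optional[List[str]] = None,
        enable_telemetry: bool = True,
        **kwargs: Any,
    ) -> None:
        # None means "no column restriction" in this sketch.
        self.cols = cols
        self.enable_telemetry = enable_telemetry
        # Assumption: extra keyword arguments are kept as filter settings.
        self.filters: Dict[str, Any] = dict(kwargs)


# Load two columns, turn telemetry off, and pass one hypothetical filter.
ds = SketchOpenDataset(cols=["date", "value"], enable_telemetry=False, country="US")
print(ds.cols, ds.enable_telemetry, ds.filters)
```

Calling the constructor with no arguments leaves `cols` as None and telemetry enabled, matching the documented defaults.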

Methods

get_file_dataset

Get the file dataset for the open dataset.

get_tabular_dataset

Initialize an AbstractTabularOpenDataset from its blob URL.

to_pandas_dataframe

Convert to a pandas DataFrame.

to_spark_dataframe

Convert to a Spark DataFrame.

get_file_dataset

Get the file dataset for the open dataset.

get_file_dataset(start_date: datetime = None, end_date: datetime = None, enable_telemetry: bool = True, **kwargs) -> FileDataset

Parameters

cls
type
Required

The current class.

start_date
datetime
default value: None

The start date, defaults to None

end_date
datetime
default value: None

The end date, defaults to None

enable_telemetry
bool
default value: True

Whether to enable telemetry, defaults to True

Returns

The file dataset.

Return type

FileDataset

get_tabular_dataset

Initialize an AbstractTabularOpenDataset from its blob URL.

get_tabular_dataset(start_date: datetime = None, end_date: datetime = None, cols: List[str] = None, enable_telemetry: bool = True, **kwargs) -> TabularDataset

Parameters

cls
type
Required

The class of the open dataset.

start_date
datetime
default value: None

The start date of the query range (inclusive).

end_date
datetime
default value: None

The end date of the query range (inclusive).

cols
list[str]
default value: None

A list of column names to retrieve. None retrieves all columns.

enable_telemetry
bool
default value: True

Whether to enable telemetry. Intended to be disabled only for unit tests.
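
The parameter semantics above (both dates inclusive, `cols=None` meaning all columns) can be illustrated on plain Python data. This is a sketch of the documented behavior only, not the library's code: the helper `select_rows`, its `time_column` argument, and the sample rows are hypothetical.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def select_rows(
    rows: List[Row],
    start_date: Optional[datetime] = None,
    end_date: Optional[datetime] = None,
    cols: Optional[List[str]] = None,
    time_column: str = "time",
) -> List[Row]:
    """Keep rows with start_date <= time <= end_date, then project columns."""
    kept = []
    for row in rows:
        ts = row[time_column]
        if start_date is not None and ts < start_date:
            continue  # strictly before the inclusive start
        if end_date is not None and ts > end_date:
            continue  # strictly after the inclusive end
        # cols=None keeps every column of the row.
        kept.append(row if cols is None else {c: row[c] for c in cols})
    return kept


rows = [
    {"time": datetime(2020, 1, 1), "value": 1},
    {"time": datetime(2020, 1, 2), "value": 2},
    {"time": datetime(2020, 1, 3), "value": 3},
]

# Both boundary dates are included in the result.
selected = select_rows(rows, datetime(2020, 1, 2), datetime(2020, 1, 3), cols=["value"])
print(selected)
```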

Returns

The tabular dataset.

Return type

TabularDataset

to_pandas_dataframe

Convert to a pandas DataFrame.

to_pandas_dataframe() -> DataFrame

to_spark_dataframe

Convert to a Spark DataFrame.

to_spark_dataframe()

Attributes

cols

Get the column name list to retrieve.

data

Get the data of the OpenDataset object.

id

Get the location ID of the open data.

log_properties

Get log properties.

registry_id

Get the registry ID of this public dataset registered at the backend.

This registry ID is used to get the latest metadata, such as the storage location. All public dataset subclasses are expected to assign _registry_id.

Returns

Registry ID string.

Return type

str
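
The expectation that subclasses assign `_registry_id` can be sketched with a stand-in base class. Everything here is an assumption made for illustration: `SketchBase`, `SketchHolidays`, and the ID value are hypothetical, and raising `NotImplementedError` when the ID is missing is a design choice of this sketch, not documented library behavior.

```python
from typing import Optional


class SketchBase:
    """Hypothetical base class exposing a read-only registry_id property."""

    # Subclasses are expected to override this with their own ID.
    _registry_id: Optional[str] = None

    @property
    def registry_id(self) -> str:
        if self._registry_id is None:
            # Assumption of this sketch: fail loudly when no ID was assigned.
            raise NotImplementedError("subclass must assign _registry_id")
        return self._registry_id


class SketchHolidays(SketchBase):
    # Hypothetical registry ID, not a real backend value.
    _registry_id = "sketch-holidays"


print(SketchHolidays().registry_id)
```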

time_column_name

Get the time column name.