OpenDatasetBase Class
Base class for open datasets, intended to be inherited by concrete open dataset classes.
Constructs an open dataset.
Inheritance
- OpenDatasetBase
OpenDatasetBase(cols: List[str] | None = None, enable_telemetry: bool = True, **kwargs)
Parameters
Name | Description |
---|---|
cols | A list of column names to load from the dataset. Default value: None |
enable_telemetry | Whether to enable telemetry on this dataset. Default value: True |
kwargs | Additional keyword arguments used for filtering. |
Methods
Name | Description |
---|---|
get_file_dataset | Get the file dataset for the open dataset. |
get_tabular_dataset | Initialize AbstractTabularOpenDataset with blob URL. |
to_pandas_dataframe | Convert the dataset to a pandas DataFrame. |
to_spark_dataframe | Convert the dataset to a Spark DataFrame. |
Get the file dataset for the open dataset.
get_file_dataset(start_date: datetime = None, end_date: datetime = None, enable_telemetry: bool = True, **kwargs) -> FileDataset
Parameters
Name | Description |
---|---|
cls | The current class. |
start_date | The start date. Default value: None |
end_date | The end date. Default value: None |
enable_telemetry | Whether to enable telemetry. Default value: True |
Returns
Type | Description |
---|---|
FileDataset | The file dataset. |
Initialize AbstractTabularOpenDataset with blob URL.
get_tabular_dataset(start_date: datetime = None, end_date: datetime = None, cols: List[str] = None, enable_telemetry: bool = True, **kwargs) -> TabularDataset
Parameters
Name | Description |
---|---|
cls | The type name of the open dataset. |
start_date | The start date to query, inclusive. Default value: None |
end_date | The end date to query, inclusive. Default value: None |
cols | A list of column names to retrieve. None retrieves all columns. Default value: None |
enable_telemetry | Whether to enable telemetry; disable it for unit tests only. Default value: True |
Returns
Type | Description |
---|---|
TabularDataset | The tabular dataset. |
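The parameter semantics above (inclusive start and end dates, and `cols=None` meaning all columns) can be illustrated with a small standard-library sketch. This is not the library's implementation: `select_rows`, the `time_column` argument, and the sample rows are hypothetical stand-ins, and treating a `None` date as "no bound on that side" is an assumption, since the reference only states the default value.

```python
from datetime import datetime
from typing import Dict, List, Optional

Row = Dict[str, object]


def select_rows(
    rows: List[Row],
    time_column: str,
    start_date: Optional[datetime] = None,
    end_date: Optional[datetime] = None,
    cols: Optional[List[str]] = None,
) -> List[Row]:
    """Hypothetical stand-in for the documented filtering semantics."""
    selected = []
    for row in rows:
        stamp = row[time_column]
        # Both bounds are inclusive. Assumption: None means no bound on that side.
        if start_date is not None and stamp < start_date:
            continue
        if end_date is not None and stamp > end_date:
            continue
        # cols=None keeps every column; otherwise keep only the requested ones.
        selected.append(dict(row) if cols is None else {c: row[c] for c in cols})
    return selected


# Made-up sample data, used only to exercise the sketch.
rows = [
    {"when": datetime(2020, 1, 1), "city": "A", "temp": 3.0},
    {"when": datetime(2020, 1, 2), "city": "B", "temp": 5.5},
    {"when": datetime(2020, 1, 3), "city": "C", "temp": 4.1},
]

subset = select_rows(
    rows, "when", datetime(2020, 1, 2), datetime(2020, 1, 3), cols=["city"]
)
print(subset)  # [{'city': 'B'}, {'city': 'C'}]
```

Rows dated exactly on `start_date` or `end_date` are kept, which is what "inclusive" means in the parameter descriptions.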
Convert the dataset to a pandas DataFrame.
to_pandas_dataframe() -> DataFrame
Convert the dataset to a Spark DataFrame.
to_spark_dataframe()
Get the column name list to retrieve.
Get the data of the OpenDataset Object.
Get the location ID of the open data.
Get log properties.
Get the registry ID of this public dataset registered at the backend.
This registry ID is used to get the latest metadata, such as the storage location. All public dataset subclasses are expected to assign _registry_id.
Returns
Type | Description |
---|---|
str | Registry ID string. |
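The subclass contract described above, where each public dataset subclass assigns `_registry_id`, might look roughly like the following sketch. `SketchOpenDatasetBase`, `HypotheticalWeatherData`, and the ID value are invented for illustration; the real base class may expose the value differently, and raising `NotImplementedError` when the ID is missing is an assumption rather than documented behavior.

```python
from typing import Optional


class SketchOpenDatasetBase:
    """Hypothetical stand-in; not the real OpenDatasetBase."""

    # Subclasses are expected to assign this class attribute.
    _registry_id: Optional[str] = None

    @property
    def registry_id(self) -> str:
        # Assumption: fail loudly if a subclass forgot to assign the ID.
        if self._registry_id is None:
            raise NotImplementedError("subclass must assign _registry_id")
        return self._registry_id


class HypotheticalWeatherData(SketchOpenDatasetBase):
    # Made-up value; real registry IDs come from the backend registration.
    _registry_id = "hypothetical-weather-registry-id"


dataset = HypotheticalWeatherData()
print(dataset.registry_id)  # hypothetical-weather-registry-id
```

Keeping the ID as a class attribute means every instance of a given subclass resolves to the same registered dataset.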
Time column name.