AzureML Datetime Issue

Muhammad Zainal 31 Reputation points
2021-06-10T08:57:11.723+00:00

Hi,

I am running into an issue where datetime values are not retained in datasets that I have uploaded to AzureML.
The issue can be reproduced as follows:

  1. Create a pandas dataframe with a column of datetime strings and parse them:

    import pandas as pd

    d = {"Date": ["2020-03-06", "2021-01-05", "2016-01-30", "2019-12-14"]}
    df = pd.DataFrame(data=d)
    df["Date"] = pd.to_datetime(df["Date"], format="%Y-%m-%d")

104079-image.png

  2. Save this dataframe as a .parquet file
  3. Upload it to Azure Blob
  4. Create a Tabular Dataset object from the uploaded file

from azureml.core import Dataset

# `workspace` is an existing azureml.core.Workspace object
datastore = workspace.get_default_datastore()
datastore_path = [(datastore, "filename.parquet")]
azureml_df = Dataset.Tabular.from_parquet_files(path=datastore_path)

Printing the dataframe results in the following:
104157-image.png
The datetime values are now different.
To investigate further, we can cast the datetime to int:
104181-image.png
which gives us a 16-digit number.

We also cast the original df to int:
104119-image.png
which instead gives us a 19-digit number.

This 19-digit number is the number of nanoseconds since the UNIX epoch. Three trailing zeroes are stripped from it when the Tabular Dataset object is created through azureml-sdk, so the value is read back as an incorrect datetime. Note that if you download the parquet file from Azure Blob, the values are still intact, which suggests the issue lies with AzureML and possibly the Dataset method from_parquet_files.

A simple workaround is to multiply this column by 1000 and then convert it back to datetime, but I would like to know whether I'm missing a step when reading the parquet from AzureML or whether the problem is on Azure's side.
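
To make the arithmetic concrete, here is a minimal standard-library sketch of the effect described above. It is an illustration only and does not call AzureML; the helper names `to_epoch_ns` and `from_epoch_ns` are made up for this example. It assumes the symptom is exactly "three trailing zeroes stripped, then read as nanoseconds".

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ns(moment: datetime) -> int:
    """Nanoseconds since the UNIX epoch (datetime only carries microsecond precision)."""
    return ((moment - EPOCH) // timedelta(microseconds=1)) * 1000


def from_epoch_ns(value: int) -> datetime:
    """Interpret an integer as nanoseconds since the UNIX epoch."""
    return EPOCH + timedelta(microseconds=value // 1000)


original = datetime(2020, 3, 6, tzinfo=timezone.utc)
ns_value = to_epoch_ns(original)

# Stripping three trailing zeroes leaves a microsecond count.
truncated = ns_value // 1000

# Reading that smaller number as nanoseconds lands a few weeks after the epoch.
misread = from_epoch_ns(truncated)

# The workaround from the question: multiply by 1000, then convert again.
repaired = from_epoch_ns(truncated * 1000)

print(original.isoformat(), "->", misread.isoformat(), "->", repaired.isoformat())
```

Under that assumption, the misread value falls in January 1970, and multiplying by 1000 restores the original date.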

Regards,
Muhammad

Azure Machine Learning
An Azure machine learning service for building and deploying models.

4 answers

  1. romungi-MSFT 48,906 Reputation points Microsoft Employee Moderator
    2021-06-10T10:50:22.953+00:00

    @Muhammad Zainal Thanks for the detailed explanation of the issue. I tried to replicate it with the exact steps, but the date does not change to a different value as it does in your case. Here are the steps:

    104253-image.png

    With the exact same steps, the date stays consistent.

    104195-image.png

    Maybe there is an issue with one of the SDK versions. Which version of the SDK are you using?


  2. Hung Nguyen Thanh 16 Reputation points
    2022-01-19T08:22:28.737+00:00

    I'm facing the same problem. I'm using PythonScriptStep to create a pipeline and PipelineData to capture the pipeline's output. The output has the right datetime, but once I register that output as a dataset in AzureML, the datetime is incorrect when I read it back.

    166286-image.png

    166278-image.png

    My environment is as follows:

    pyarrow 3.0.0
    pandas 0.25.3
    azureml-core 1.34.0
    python 3.6.9
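
A version list like the one above can be collected from the running environment with a short script. This is only a sketch: the package names are taken from this thread, the helper `installed_versions` is a made-up name, and `importlib.metadata` needs Python 3.8 or newer (on 3.6, as listed above, the `importlib_metadata` backport or `pip freeze` would be needed instead).

```python
import platform
from importlib import metadata

# Distribution names mentioned in this thread.
PACKAGES = ["azureml-core", "pandas", "pyarrow"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if it is absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(PACKAGES)
report["python"] = platform.python_version()

for name, version in report.items():
    print(name, version if version is not None else "(not installed)")
```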


  3. Tka32 56 Reputation points
    2022-02-25T15:42:47.697+00:00

    I have the same issue when running the following code in the ML notebook.
    177840-screenshot-2022-02-25-at-163144.png

    It was fine until yesterday and suddenly started happening today.
    We can get the expected timestamp by converting the column to datetime64 and then multiplying by 1000, but we would like to know why this happens, and we don't want unexpected behaviour like this in the future or internally in the pipeline.
    Please investigate whether something has changed in how the AML dataset casts the datatype.
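
For reference, the "multiply by 1000" workaround mentioned in this thread could look roughly like the pandas sketch below. It simulates the symptom locally rather than reading from AzureML, and `repair_truncated` is a hypothetical helper name, not part of pandas or the azureml SDK. It assumes the column was read as nanoseconds while actually holding microsecond counts.

```python
import pandas as pd

# Dates from the question, forced to nanosecond resolution.
dates = pd.Series(
    pd.to_datetime(["2020-03-06", "2021-01-05", "2016-01-30", "2019-12-14"])
).astype("datetime64[ns]")

# Simulate the symptom: microsecond counts that are read as nanoseconds.
misread = pd.to_datetime(dates.astype("int64") // 1000, unit="ns")


def repair_truncated(column: pd.Series) -> pd.Series:
    """Rescale a misread datetime column by 1000 and parse it again."""
    as_int = column.astype("datetime64[ns]").astype("int64")
    return pd.to_datetime(as_int * 1000, unit="ns")


repaired = repair_truncated(misread)
print(pd.DataFrame({"misread": misread, "repaired": repaired}))
```

Applying such a fix blindly is risky: if a later SDK version returns correct values, multiplying again would push the dates far into the future, so it may be worth checking first that every value falls in January 1970.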


  4. Rian Finnegan 1 Reputation point
    2021-07-26T22:40:14.59+00:00

    I'm seeing the same issue running PythonScriptStep in AzureML Pipelines.

    I suspect it has something to do with the PyArrow representation of datetimes.

