Date_Issue

Rohit Kulkarni 731 Reputation points
2023-04-25T17:22:39.82+00:00

Hello Team, I have two columns (StartDate, NextStartDate), and the data type of both columns is "date". The values are displayed like this:

User's image

I am trying to convert them to datetime:

df["StartDate"] = pd.to_datetime(df["StartDate"], unit='ms')  # with time
df["NextStartDate"] = pd.to_datetime(df["NextStartDate"], unit='ms')  # with time

And I am getting this error:

ValueError: non convertible value 0000-12-30 with the unit 'ms'

Please advise how it can be converted to datetime.

Regards,
Rohit

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

1 answer

  1. AnnuKumari-MSFT 34,566 Reputation points Microsoft Employee Moderator
    2023-04-27T07:53:28.45+00:00

    Hi Rohit Kulkarni ,

    Thank you for using the Microsoft Q&A platform and for posting your question here.

    As I understand your query, you are trying to convert date datatype columns into timestamp datatype in your dataframe. Please let me know if that is not the ask here.

    You can use the to_timestamp() function to convert a string or date column to a timestamp (TimestampType) in PySpark. By default, the resulting timestamp is displayed in the format yyyy-MM-dd HH:mm:ss.

    Below is the code:

    from pyspark.sql.functions import to_timestamp

    # Add a timestamp version of each date column
    df1 = (
        df.withColumn("StartDate_timestamp", to_timestamp("StartDate", "yyyy-MM-dd").cast("timestamp"))
          .withColumn("NextStartDate_timestamp", to_timestamp("NextStartDate", "yyyy-MM-dd").cast("timestamp"))
    )

    df1.show()
    

    Below is a screenshot of the implementation. The data type of the newly generated columns is timestamp, as shown in the output of the printSchema() function below:

    User's image
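If the data is in a pandas DataFrame rather than a Spark DataFrame, the error in the question can be approached differently. In pd.to_datetime, the unit argument describes numeric epoch values, so unit='ms' does not fit values that are already dates. Separately, a value like 0000-12-30 lies far outside the range that nanosecond-resolution pandas timestamps can hold (roughly years 1677 to 2262). The following is only a minimal sketch: the sample data is made up, and treating 0000-12-30 as a "no date" placeholder is an assumption about the data, not something stated in the question.

```python
import pandas as pd

# Hypothetical sample data; "0000-12-30" mirrors the value in the error message.
df = pd.DataFrame(
    {
        "StartDate": ["2023-04-25", "0000-12-30"],
        "NextStartDate": ["2023-05-25", "2023-06-25"],
    }
)

# Assumption: this value is a placeholder meaning "no date".
PLACEHOLDER = "0000-12-30"

for col in ["StartDate", "NextStartDate"]:
    # Work on the text form so date objects and strings are handled alike.
    as_text = df[col].astype(str)
    # Blank out the placeholder so that it ends up as NaT (missing).
    as_text = as_text.where(as_text != PLACEHOLDER)
    # No unit= argument: the values are calendar dates, not epoch numbers.
    # errors="coerce" turns any other unparseable value into NaT as well.
    df[col] = pd.to_datetime(as_text, format="%Y-%m-%d", errors="coerce")

print(df.dtypes)
print(df)
```

Whether the placeholder rows should become missing values or be replaced with a real date depends on what they mean in the source system.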

    Hope it helps. Please let me know if you have any additional queries, and please accept the answer if it's helpful. Thank you.

