Transform Azure dataset to pandas dataframe

Shen, Shi 1 Reputation point

I have some code like:
dataset = Dataset.Tabular.from_delimited_files(
[(blob_datastore, pipeline_input_path)], separator=config.CSV_DELIMITER, set_column_types={"ID": DataType.to_string()}
)

For example, column "ID" contains "01" and "02", and I am trying to read them as strings here.

Then my next step is:
df = dataset.to_pandas_dataframe()

However, it seems that "01" is always read as 1 in this step, when converting the dataset to a pandas DataFrame.
Is there any way to keep "ID" as a string so that it is not converted to an integer here?

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

1 answer

  1. ShaikMaheer-MSFT 38,326 Reputation points Microsoft Employee

    Hi @Shen, Shi ,

    Thank you for posting your query on the Microsoft Q&A platform.

    It seems you are working with a CSV file. In CSV files, double quotes are treated as the quote character by default, so you don't see them when you preview the data.
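To illustrate the quoting point, here is a minimal sketch using only Python's standard `csv` module. The sample content and the `name` column are made up for illustration; only the "ID" column comes from the question. It shows that the quotes are stripped on read, that the value is still the text "01", and that the leading zero is lost only when the value is converted to an integer.

```python
import csv
import io

# Hypothetical CSV content; only the "ID" column name comes from the question.
raw = 'ID,name\n"01",alpha\n"02",beta\n'

# csv treats the double quote as the quote character by default,
# so the quotes are removed and each field comes back as a string.
rows = list(csv.DictReader(io.StringIO(raw)))
ids = [row["ID"] for row in rows]

# The leading zero disappears only when the text is converted to an integer.
as_ints = [int(value) for value in ids]

print(ids)
print(as_ints)
```

So the quotes themselves do not decide the type. What matters is whether a later step infers or casts the column to a number.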

    Could you please share a few more details on the points below so we can understand the issue better?

    • What do you mean by Azure dataset?
    • Where are you running that sample code? Is it in Azure Databricks? If yes, kindly share screenshots if possible.

    If your intention is to read a CSV file from Blob Storage, you can do that directly in Azure Databricks. The videos below cover this:
    Mount Azure Blob Storage to DBFS in Azure Databricks
    Access ADLS Gen2 or Blob Storage using a SAS token in Azure Databricks
    Access Data Lake Storage Gen2 or Blob Storage with an Azure service principal in Azure Databricks
    Read CSV file into a DataFrame using PySpark

    Once you have the data in a PySpark DataFrame, you can easily convert it to a pandas DataFrame. Check the link below for the same.
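If the data ends up being read with pandas directly, one possible way to keep "ID" as text is to declare the column type at read time. This is only a sketch under assumptions: the in-memory sample and the `value` column are hypothetical, and it uses `pandas.read_csv` rather than the Azure ML dataset API from the question, so it does not show how `to_pandas_dataframe()` itself behaves.

```python
import io

import pandas as pd

# Hypothetical CSV content; only the "ID" column name comes from the question.
raw = "ID,value\n01,10\n02,20\n"

# With default type inference, "01" is parsed as the integer 1.
inferred = pd.read_csv(io.StringIO(raw))

# Declaring the column as str keeps the leading zero.
as_text = pd.read_csv(io.StringIO(raw), dtype={"ID": str})

print(inferred["ID"].tolist())
print(as_text["ID"].tolist())
```

In PySpark the conversion step is `DataFrame.toPandas()`. If the column is already a string in the Spark DataFrame, it should still be a string after the conversion, but please verify that in your own environment.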

    Hope this helps. Please let me know how it goes.


    Please consider hitting the Accept Answer button. Accepted answers help the community as well.