Reading avro file in Databricks

NIKHIL KUMAR 126 Reputation points
2023-07-06T08:39:12.52+00:00

How do I read .avro files stored in a data lake using Databricks?

Azure Databricks

1 answer

  1. PRADEEPCHEEKATLA 90,641 Reputation points Moderator
    2023-07-06T08:58:24.4666667+00:00

    @NIKHIL KUMAR - Thanks for the question and for using the Microsoft Q&A platform.

    To read an .avro file stored in a data lake using Databricks, you can use the Databricks runtime's built-in support for reading and writing Avro files. Here are the steps to read an .avro file:

    • First, you need to mount the data lake storage account to Databricks. You can do this by following the instructions in the Databricks documentation: Connect to Azure Data Lake Storage Gen2 and Blob Storage
    • Once the data lake storage account is mounted, you can read the .avro file using the spark.read.format() method. Here is an example code snippet:
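    # Import only the types the schema needs (avoids wildcard imports)
    from pyspark.sql.types import (
        StructType, StructField, StringType, IntegerType, DoubleType
    )

    # Define the schema of the .avro file.
    # Optional: Avro files embed their own schema, so .schema(schema) can be
    # dropped below to let Spark use the embedded one.
    schema = StructType([
      StructField("field1", StringType(), True),
      StructField("field2", IntegerType(), True),
      StructField("field3", DoubleType(), True)
    ])

    # Read the .avro file into a DataFrame
    # (`spark` is the SparkSession that Databricks notebooks predefine)
    df = spark.read.format("avro").schema(schema).load("/mnt/<mount-name>/<path-to-file>.avro")

    # Show the contents of the DataFrame
    df.show()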
    
    

    In this example, replace <mount-name> with the name of the mount point you created in step 1, and <path-to-file> with the path to the .avro file in the data lake storage account.
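
Because an Avro container file carries its writer schema in the file header, the explicit schema in the snippet above is optional. Here is a minimal sketch of a read that works with or without one. The helper names `mount_path` and `read_avro` are hypothetical (they are not part of Spark or Databricks), and the `/mnt/<mount-name>` layout is assumed to match the mount created earlier. The `spark` session is passed in as an argument, as it would be from a Databricks notebook.

```python
def mount_path(mount_name, relative_path):
    """Build the DBFS path of a file under a mount point (hypothetical helper)."""
    return "/mnt/" + mount_name.strip("/") + "/" + relative_path.lstrip("/")


def read_avro(spark, mount_name, relative_path, schema=None):
    """Read an Avro file from a mounted data lake into a DataFrame.

    When schema is None, Spark uses the schema embedded in the Avro file;
    otherwise the supplied StructType is applied to the read.
    """
    reader = spark.read.format("avro")
    if schema is not None:
        reader = reader.schema(schema)
    return reader.load(mount_path(mount_name, relative_path))
```

In a notebook this could be called as `df = read_avro(spark, "<mount-name>", "<path-to-file>.avro")`, followed by `df.printSchema()` to inspect the schema Spark picked up from the file.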

    • Once you have read the .avro file into a DataFrame, you can perform any necessary transformations or analysis on the data using the DataFrame API. For more details, refer to Azure Databricks - Avro file.
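
For the mounting step in the first bullet, the sketch below shows one possible shape, assuming service-principal (OAuth) access to Azure Data Lake Storage Gen2. That authentication method is an assumption; the answer only points to the documentation, and your workspace may use a different one. The function names are hypothetical. `dbutils` is the utility object available in Databricks notebooks and is passed in explicitly here.

```python
def adls_oauth_configs(client_id, client_secret, tenant_id):
    """Hadoop settings for service-principal (OAuth) access to ADLS Gen2."""
    return {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
        "fs.azure.account.oauth2.client.endpoint":
            "https://login.microsoftonline.com/" + tenant_id + "/oauth2/token",
    }


def adls_source(container, storage_account):
    """Build the abfss:// URI for a container in an ADLS Gen2 account."""
    return "abfss://" + container + "@" + storage_account + ".dfs.core.windows.net/"


def mount_data_lake(dbutils, container, storage_account, mount_name, configs):
    """Mount the container at /mnt/<mount_name> unless it is already mounted."""
    mount_point = "/mnt/" + mount_name
    # dbutils.fs.mounts() lists existing mounts; skip if ours is present
    if any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
        return mount_point
    dbutils.fs.mount(
        source=adls_source(container, storage_account),
        mount_point=mount_point,
        extra_configs=configs,
    )
    return mount_point
```

In practice the client secret would normally come from a secret scope, for example `dbutils.secrets.get(scope="<scope>", key="<key>")`, rather than being written into the notebook.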

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, please click "Accept Answer" and select "Yes" for "Was this answer helpful". If you have any further queries, do let us know.

