Is there a way to use query acceleration on .avro files

Filip Mattsson 0 Reputation points
2024-10-24T10:20:14.7966667+00:00

I have a Data Lake Storage resource in which I am storing .avro files, and I now need to transfer the data in those files to a database. I've managed to make that work, but I would like to optimize it by downloading only the data I need. I read this article, and it seems to imply that querying in the data lake can only be done on JSON or CSV files. Currently I parse the files by downloading them with the .NET Azure.Storage.Files.DataLake package and then reading the contents with Apache.Avro. I tried the following to query the data:

await fileSys.GetFileClient(file.Name).QueryAsync("SELECT * FROM BlobStorage WHERE _6 LIKE '%10001'");

But I get ContentLength = 0. Is there a way to make this work?
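For reference, the predicate in the query above (`_6 LIKE '%10001'`) amounts to a suffix match on one field. A minimal sketch of that filter applied locally after parsing, shown in Python for brevity with hypothetical records and field names (the same logic applies to records parsed with Apache.Avro in .NET):

```python
# Hypothetical records, as they might look after parsing an Avro file.
# The field name "_6" mirrors the column reference in the query above.
records = [
    {"_6": "ABC10001", "amount": 1},
    {"_6": "XYZ20002", "amount": 2},
    {"_6": "DEF10001", "amount": 3},
]

# Equivalent of: SELECT * FROM BlobStorage WHERE _6 LIKE '%10001'
matches = [r for r in records if r["_6"].endswith("10001")]
print(len(matches))  # 2
```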

Azure Data Lake Storage

An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.


1 answer

  1. Keshavulu Dasari 4,925 Reputation points Microsoft External Staff Moderator
    2024-10-24T18:46:25.45+00:00

    Hi Filip Mattsson,
    Welcome to Microsoft Q&A Forum, thank you for posting your query here!
    You are correct: Azure Data Lake Storage query acceleration currently supports only JSON and CSV files, so it cannot query your .avro files directly. That would also explain the empty result (ContentLength = 0) you are seeing.

    Here are a few other ways you can consider to optimize your data migration:

    1. Pre-process Avro Files: Convert your Avro files to a supported format (JSON or CSV) before uploading them to Azure Data Lake Storage. This way, you can leverage query acceleration for efficient data retrieval.
    2. Custom Filtering Logic: Implement custom filtering logic in your application to process Avro files. You can download the necessary files and then apply your filtering criteria locally using Apache.Avro.
    3. Azure Data Factory: Use Azure Data Factory to orchestrate the transformation of Avro files into a queryable format. Data Factory can help automate the process of converting and loading data into your database.

    4. Databricks or Synapse Analytics: Utilize Azure Databricks or Synapse Analytics to read and process Avro files. These platforms offer robust support for Avro and can help you perform complex queries and transformations before loading the data into your database.
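    As a sketch of option 1, once the Avro records are parsed they can be serialized to CSV, a format that query acceleration does support. Illustrated here in Python with hypothetical field names (the conversion itself would typically run in your .NET pipeline or in Data Factory):

    ```python
    import csv
    import io

    # Hypothetical records, as they might look after parsing an Avro file.
    records = [
        {"_6": "ABC10001", "amount": 1},
        {"_6": "XYZ20002", "amount": 2},
    ]

    # Write the records out as CSV with a header row; this buffer could
    # then be uploaded to the data lake and queried with query acceleration.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["_6", "amount"])
    writer.writeheader()
    writer.writerows(records)
    csv_text = buf.getvalue()
    print(csv_text.splitlines()[0])  # _6,amount
    ```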
    For more information:
    https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-query-acceleration-how-to?tabs=azure-powershell

    If you have any other questions or are still running into issues, let me know in the comments and I would be happy to help.


    Please do not forget to "Accept the answer" and "Up-vote" wherever the information provided helps you, as this can be beneficial to other community members.

