What is the best/least expensive way to automatically transfer files from Event Hubs into a text file in either Blob Storage or Data Lake Storage for later use in Excel or Power BI?

Nick Nason 0 Reputation points
2023-09-28T16:10:45.51+00:00

We're pulling data from the field every 15 minutes using Azure Event Hubs and need to store it in a format that we can read, and later use to create reports or import into a database. It looks like there are multiple ways of doing this, and I would like to know the best and least expensive way to accomplish it. Event Hubs Capture is straightforward, but it writes the data files in .avro format, which is not easily readable.

Azure Data Lake Storage
Azure Blob Storage
Azure Event Hubs

1 answer

  1. PRADEEPCHEEKATLA-MSFT 89,646 Reputation points Microsoft Employee
    2023-09-29T05:24:37.0766667+00:00

    @Nick Nason - Thanks for the question and using MS Q&A platform.

    There are multiple ways to move data from Azure Event Hubs to a text file in either Blob Storage or Data Lake Storage. Here are a few options you can consider:

    1. Use Azure Stream Analytics: Stream Analytics provides a simple and cost-effective way to process and analyze streaming data in real time. You can create a Stream Analytics job that reads data from the Event Hub, transforms it as needed, and writes it to a text file in Blob or Data Lake. Stream Analytics supports multiple output formats, including CSV, JSON, and Avro.
    2. Use Azure Functions: Azure Functions is a serverless compute service that lets you run code on demand without managing any infrastructure. You can create a function with an Event Hubs trigger that reads incoming events, transforms them as needed, and writes them to a text file in Blob or Data Lake (see the sketch after this list). Azure Functions supports multiple programming languages, including C#, Java, JavaScript, and Python.
    3. Use Azure Logic Apps: Azure Logic Apps is a cloud-based service for building workflows that integrate with Azure and third-party services. You can create a Logic App that reads data from the Event Hub, transforms it as needed, and writes it to a text file in Blob or Data Lake. Logic Apps provides a visual designer, so you can build the workflow without writing any code.
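    For illustration, here is a minimal Python sketch of option 2 using the Azure Functions Python v2 programming model. The event hub name, connection setting names, and blob path are placeholders to replace with your own values, and it assumes the event bodies are UTF-8 text such as JSON.

    import azure.functions as func

    app = func.FunctionApp()

    # Trigger on incoming Event Hubs events and write each event body to a new
    # blob. "myeventhub", the connection setting names, and the blob path are
    # placeholders for your own configuration.
    @app.event_hub_message_trigger(arg_name="event", event_hub_name="myeventhub",
                                   connection="EVENTHUB_CONNECTION")
    @app.blob_output(arg_name="outputblob", path="captured-events/{rand-guid}.json",
                     connection="STORAGE_CONNECTION")
    def capture_event(event: func.EventHubEvent, outputblob: func.Out[str]):
        # Assumes the producers send UTF-8 text (for example JSON) as the event body.
        outputblob.set(event.get_body().decode("utf-8"))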

    All of the above options have their own pros and cons, and the best option for you depends on your specific requirements and constraints. In terms of cost, Azure Functions and Logic Apps are generally more cost-effective than Stream Analytics, especially for small to medium workloads. However, Stream Analytics provides more advanced features and scalability for larger workloads.

    To convert .avro files to a readable text format, you can use Apache Avro Tools, a set of command-line utilities for working with Avro files. Its tojson command converts an .avro file to JSON, and you can then use a tool like Azure Data Factory to copy the JSON file to a text file in Blob or Data Lake.

    Here are the high-level steps to accomplish this:

    Create an Azure Function or Azure Logic App that reads data from Azure Event Hubs and writes it to an .avro file in Blob or Data Lake.

    Use Apache Avro Tools to convert the .avro file to a JSON file. You can do this by running the following command:

    java -jar avro-tools-1.10.2.jar tojson <input-file.avro> > <output-file.json>
    

    Replace <input-file.avro> with the name of the .avro file and <output-file.json> with the name of the JSON file.
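
    As an alternative to the Java-based avro-tools jar, the same conversion can be done in a short Python script using the fastavro package (a third-party library, installed with pip install fastavro). This is a minimal sketch, assuming the standard Event Hubs Capture schema, where the payload is stored as bytes in the Body field and the producers send UTF-8 text such as JSON; the file names are placeholders.

    from fastavro import reader

    # Convert an Event Hubs Capture .avro file to newline-delimited text.
    with open("input-file.avro", "rb") as src, open("output-file.json", "w") as dst:
        for record in reader(src):
            # Each record carries capture metadata (Offset, EnqueuedTimeUtc, ...)
            # plus the raw payload in "Body"; decode it, assuming UTF-8 text.
            dst.write(record["Body"].decode("utf-8") + "\n")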

    Use Azure Data Factory to copy the JSON file to a text file in Blob or Data Lake. You can create a pipeline in Azure Data Factory that reads the JSON file and writes it to a text file in Blob or Data Lake. Azure Data Factory provides built-in connectors for Blob and Data Lake, which makes it easy to copy data between these services.
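
    If the files are small and you want to avoid setting up a Data Factory pipeline, the same copy can be done with a short script using the azure-storage-blob SDK. This is a minimal sketch with placeholder connection string, container, and blob names, not a drop-in replacement for a full pipeline.

    from azure.storage.blob import BlobServiceClient

    # Placeholder connection string and names; replace them with your own values.
    service = BlobServiceClient.from_connection_string("<storage-connection-string>")
    source = service.get_blob_client(container="captures", blob="output-file.json")
    target = service.get_blob_client(container="reports", blob="output-file.txt")

    # Download the JSON file and re-upload it under the .txt name.
    data = source.download_blob().readall()
    target.upload_blob(data, overwrite=True)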

    This approach should be cost-effective and relatively straightforward to implement. However, keep in mind that the performance and scalability of this approach may be limited by the processing power and memory of the Azure Function or Logic App. If you need to process large volumes of data or require more advanced features, you may need to consider other options like Azure Stream Analytics or Azure Databricks.

    For more details, refer to https://learn.microsoft.com/en-us/azure/event-hubs/explore-captured-avro-files

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, do click Accept Answer and Yes for "Was this answer helpful". And, if you have any further queries, do let us know.

    1 person found this answer helpful.
