Event Hub REST API behavior does not match the docs

Jona 395 Reputation points
2024-05-07T00:29:30.93+00:00

Hi

I'm using the Event Hub REST API to send data to a stream. This happens in a Function with a Blob Storage trigger. I'm a little confused by some terminology, and I haven't found documentation that lets me fully understand it.

User's image

I built this dashboard to get insights from my experiment. Basically, I'm sending different data formats via Event Hub. This includes CSV, compressed CSV, JSON... and maybe later Parquet and Avro. The trigger on Blob Storage reacts to CSV file uploads. After Event Hubs, there is another Function that reads from it and sends the number of messages to the Power BI Service push dataset API.

The objective is to determine which orchestration format performs best, backed by good metrics.

So my questions are:

  • The Throttled Requests metric: does it mean the whole request is rejected, or are some of the events sent properly and others not? I'm thinking about both single-event and batch sends.
  • The Capture Backlog metric: what is it?

I've been reading the following docs:

I'm working with Event Hub Standard with 1 TU, and I uploaded two CSV files: one of 2.1 MB and another of 1.2 MB.

The response of the API is as follows:

User's image

The response code is 413, and the message says nothing about throttling. Moreover, the throttling metric doesn't show up in Azure Monitor; instead, the request looks like a successful API call.

User's image

If I upload a file smaller than 1 MB (as required for it to work), there is obviously no error message:

User's image

Key note: the main objective of my experiment is to demonstrate that if I send a serialized CSV (or a compressed one, which I also tested), I can send more events per unit of time. Notice that I was able to send 2,311 rows, far beyond the limit of 1,000 events per publication. This is because the entire serialized CSV file is treated as one single event.
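The size constraint described above can be checked before calling the REST API. What follows is a minimal sketch, not the code used in the experiment: `MAX_EVENT_BYTES` is an assumed conservative threshold for the roughly 1 MB per-event limit, and `prepare_payload` is a hypothetical helper. It optionally gzips the CSV text and reports whether the result would fit in a single event.

```python
import gzip

# Assumption: a conservative threshold below the ~1 MB per-event limit,
# leaving headroom for headers and properties. Not an official constant.
MAX_EVENT_BYTES = 1_000_000


def prepare_payload(csv_text: str, compress: bool = True) -> tuple[bytes, bool]:
    """Encode (and optionally gzip) a CSV string.

    Returns the payload bytes and a flag telling whether the payload
    is within the assumed single-event size threshold.
    """
    raw = csv_text.encode("utf-8")
    payload = gzip.compress(raw) if compress else raw
    return payload, len(payload) <= MAX_EVENT_BYTES


if __name__ == "__main__":
    # Build a synthetic CSV with 2,311 data rows, sent as one event.
    rows = "\n".join(f"{i},sensor-{i % 10},{i * 0.5}" for i in range(2311))
    payload, fits = prepare_payload("id,name,value\n" + rows)
    print(f"payload bytes: {len(payload)}, fits in one event: {fits}")
```

If `fits` is false, the file would need to be split or compressed further before sending, since the service answers oversized requests with 413 rather than counting them as throttled.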

Why doesn't the REST API respond with the proper code, as indicated in the docs above?

Besides, I haven't wanted to use Event Hub as an output binding, because I don't see any way to perform error handling. The output binding for Event Hub just waits for a value to be returned and nothing else. I don't know where error handling would go.

On the other hand, the SDK has performed poorly: sending a 10 KB CSV takes almost 3 seconds, whereas the REST API takes almost 2 seconds to deliver a file of almost 1 MB.

I know this is quite a lot. However, I hope you can help me understand this behaviour.

Regards

Azure Event Hubs
An Azure real-time data ingestion service.

2 answers

  1. PRADEEPCHEEKATLA-MSFT 79,536 Reputation points Microsoft Employee
    2024-05-08T06:21:16.3133333+00:00

    @Jona - Thanks for the question and using MS Q&A platform.

    I understand that you have some questions regarding the Event Hub REST API and some of the metrics you are seeing in your experiment. Let me try to help you with that.

    Regarding your first question about the Throttled Requests metric: it counts the requests that were rejected due to throttling. This means that some of your events may have been sent successfully while others were rejected. Throttling occurs when traffic exceeds the capacity of the Event Hub; on the Standard tier, each throughput unit allows ingress of up to 1 MB per second or 1,000 events per second, whichever comes first. This can happen with both single-event and batch sends.
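As a rough illustration of when throttling could be expected, here is a minimal sketch that compares a send rate against the Standard-tier ingress quota of 1 MB per second or 1,000 events per second per throughput unit. The function name and the choice to treat 1 MB as 10**6 bytes are assumptions made for the example; the service itself decides when to throttle.

```python
# Standard-tier ingress quota per throughput unit (TU).
# Assumption: 1 MB is treated as 10**6 bytes for this estimate.
INGRESS_BYTES_PER_TU = 1_000_000
INGRESS_EVENTS_PER_TU = 1_000


def may_exceed_ingress_quota(events_per_sec: float,
                             bytes_per_sec: float,
                             throughput_units: int = 1) -> bool:
    """Return True if either the event rate or the byte rate is above
    the ingress quota for the given number of throughput units."""
    return (events_per_sec > INGRESS_EVENTS_PER_TU * throughput_units
            or bytes_per_sec > INGRESS_BYTES_PER_TU * throughput_units)


if __name__ == "__main__":
    # 2,311 rows sent as individual events in one second, 1 TU.
    print(may_exceed_ingress_quota(2311, 200_000))
    # The same rows packed into one ~200 KB event in one second.
    print(may_exceed_ingress_quota(1, 200_000))
```

This also shows why packing many rows into one event changes the picture: the event-count quota is measured per event, not per row.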

    Regarding your second question about the Capture Backlog metric: it indicates the amount of data that is still waiting to be written to the chosen destination by the Capture feature. Capture lets you automatically store the events from an Event Hub in a storage account for further processing. A backlog builds up when Capture is not able to keep up with the incoming events.

    Regarding the response code you are seeing when uploading a file larger than 1 MB: the 413 response code indicates that the payload of the request is too large. This happens when the size of the event you are trying to send exceeds the maximum size allowed by the Event Hub. The throttling metric does not show up in Azure Monitor because the request was rejected due to the payload size, not due to throttling.

    Regarding your question about error handling with the Event Hub output binding: you can use a try-catch block to handle errors that occur when sending events to the Event Hub. If an error occurs, you can log it or retry sending the event.

    Regarding the performance of the SDK versus the REST API, it is possible that the REST API is faster because it allows you to send larger payloads in a single request. The SDK may be slower because it sends smaller payloads in multiple requests.

    Hope this helps. Do let us know if you have any further queries.




  2. PRADEEPCHEEKATLA-MSFT 79,536 Reputation points Microsoft Employee
    2024-05-14T04:11:26.07+00:00

    @Jona - When a request is throttled, the rejected events are not sent to the Event Hub. The event producer can retry sending the events after a certain amount of time has passed, or it can choose to discard the events if they are no longer needed.

    If the event producer chooses to retry sending the events, it should implement some form of backoff strategy: the time between retries should increase with each failed attempt, so that the retries themselves do not flood the Event Hub with requests.
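The retry approach described above can be sketched as exponential backoff with a little jitter. This is an illustrative sketch only: `ThrottledError` and the `send` callable are hypothetical stand-ins for whatever the producer uses to post events and detect a throttled response, and the delay values are arbitrary defaults.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical error raised by the caller's send function when throttled."""


def send_with_backoff(send, payload, max_attempts=5, base_delay=1.0,
                      max_delay=30.0, sleep=time.sleep):
    """Call send(payload), retrying on ThrottledError.

    The wait doubles after each failed attempt (capped at max_delay),
    plus up to 10% random jitter. The last failure is re-raised.
    """
    for attempt in range(max_attempts):
        try:
            return send(payload)
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
```

The `sleep` parameter is injectable so the function can be exercised in tests without actually waiting.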

    If the event consumer needs to perform message deduplication, it should use a unique identifier for each event, such as a message ID or a sequence number. The event consumer can then use this identifier to detect and discard duplicate events.
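One way a consumer might do this is to remember recently seen identifiers and skip repeats. The sketch below is a simple in-memory illustration under assumed names (`Deduplicator`, `is_new`); it keeps a bounded set of IDs and forgets the oldest ones, so it only detects duplicates within that window and does not survive a restart.

```python
from collections import OrderedDict


class Deduplicator:
    """Tracks recently seen message IDs in insertion order."""

    def __init__(self, max_ids=10_000):
        self._seen = OrderedDict()
        self._max_ids = max_ids

    def is_new(self, message_id):
        """Return True the first time an ID is seen, False for a repeat."""
        if message_id in self._seen:
            return False
        self._seen[message_id] = True
        if len(self._seen) > self._max_ids:
            # Evict the oldest ID to keep memory bounded.
            self._seen.popitem(last=False)
        return True


if __name__ == "__main__":
    dedup = Deduplicator()
    for message_id in ["a-1", "a-2", "a-1"]:
        if dedup.is_new(message_id):
            print("processing", message_id)
        else:
            print("skipping duplicate", message_id)
```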

    Hope this helps. Do let us know if you have any further queries.
