Using Correlation Filters in Azure Service Bus for Message Filtering

suvra jyoti 156 Reputation points
2024-02-27T04:24:33.33+00:00

In Azure Service Bus I want to use correlation filters on subscriptions. The filter should select incoming messages by the entity ID carried in each message, and I am thinking of passing the entity ID in the message's CorrelationId. There are 10,000 entities in total. The best practices suggested at https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-performance-improvements?tabs=net-standard-sdk-2#:~:text=Compute%20considerations,entity%20to%20another). include the two points below, which are the ones I am mainly concerned about:

  • Fanning out to multiple subscriptions on a single topic
  • Running many filters on a single subscription

Messaging through Service Bus must support near-real-time delivery, so I should use the minimum number of subscriptions and filters. The subscriptions and filters will be created by a subscription API that takes the necessary inputs, such as entity ID(s), entity name, and event type.

Should I use the approach below, where a set of entity IDs is listed in an IN expression?

Subscription 1:

  • Filter: CorrelationId IN ('1', '2', '3', '4', '5')

Subscription 2:

  • Filter: CorrelationId IN ('6', '7', '8', '9', '10')

and so on

Or should I define separate filters under a single subscription, as below?

Subscription:

  • Filter1: CorrelationId = '1'
  • Filter2: CorrelationId = '2'
  • Filter3: CorrelationId = '3'
  • Filter4: CorrelationId = '4'
  • Filter5: CorrelationId = '5'

and so on up to 10,000 entity IDs (which seems illogical).

Or should I use a combination of both approaches?

A few points I have considered:

  • For the IN option (CorrelationId IN ('1', '2', '3', '4', '5')): the cost of choosing complex filter rules is lower overall message throughput at the subscription, topic, and namespace level, since evaluating rules costs compute time (https://stackoverflow.com/questions/40265198/filtering-on-a-azure-service-bus-topic).
  • For the IN option: there are limits on the size of the filter expression.
  • For the IN option, performance impact: as the number of values increases, the cost of evaluating the filter might become significant.
  • For the separate-filter-per-entity-ID option: each filter consumes resources on the Service Bus service side, so having thousands of filters can increase resource consumption.

What is the correct way to create subscriptions and filters that select messages by entity ID, keeping in mind that message throughput is important for near-real-time delivery?

Azure Service Bus

1 answer

  1. JananiRamesh-MSFT 29,261 Reputation points
    2024-03-01T06:56:20.47+00:00

    @suvra jyoti Thanks for reaching out. When filtering messages by entity ID in Service Bus, there are a few factors to consider in order to achieve near-real-time delivery while minimizing the number of subscriptions and filters.

    You have 10,000 entities and want to filter incoming messages by entity ID. One approach is to use the IN operator in a filter on CorrelationId to group entity IDs into sets of 5 or 10, as in your first approach. Compared with one filter per entity, this lets you use fewer subscriptions and filters while still filtering messages by entity ID.

    However, as you noted, there are limitations on the size of the filter expression when using the IN operator, and performance impact may become significant as the number of values increases. Therefore, you may want to consider breaking down the entity IDs into smaller groups and using multiple subscriptions and filters to handle them.
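The size limit mentioned above can be respected mechanically. The following is a minimal sketch, not a definitive implementation: it greedily packs entity IDs into IN expressions that stay under a maximum length. The 1,024-character cap is an assumption that should be checked against the current Service Bus quotas documentation, and `pack_ids` is a hypothetical helper name. The `sys.` prefix is used because an IN expression is SQL filter syntax, where CorrelationId is addressed as a system property.

```python
from typing import List

# Assumption: maximum length of a SQL filter condition string.
# Verify this value against the current Service Bus quotas page.
MAX_FILTER_CHARS = 1024

PREFIX = "sys.CorrelationId IN ("
SUFFIX = ")"


def pack_ids(ids: List[str], max_chars: int = MAX_FILTER_CHARS) -> List[str]:
    """Greedily pack IDs into IN expressions no longer than max_chars."""
    expressions: List[str] = []
    current: List[str] = []
    current_len = len(PREFIX) + len(SUFFIX)

    for entity_id in ids:
        # Quote the value; an embedded single quote is doubled.
        literal = "'" + entity_id.replace("'", "''") + "'"
        extra = len(literal) + (2 if current else 0)  # 2 = len(", ")

        # Flush the current expression if this literal would not fit.
        if current and current_len + extra > max_chars:
            expressions.append(PREFIX + ", ".join(current) + SUFFIX)
            current = []
            current_len = len(PREFIX) + len(SUFFIX)
            extra = len(literal)

        if current_len + extra > max_chars:
            raise ValueError(f"ID too long for one expression: {entity_id!r}")

        current.append(literal)
        current_len += extra

    if current:
        expressions.append(PREFIX + ", ".join(current) + SUFFIX)
    return expressions


entity_ids = [str(n) for n in range(1, 10_001)]
expressions = pack_ids(entity_ids)
print(len(expressions), "expressions, longest is",
      max(len(e) for e in expressions), "characters")
```

Packing by length rather than by a fixed count keeps every expression valid even when entity IDs vary in length.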

    For example, you could group the entity IDs into sets of 100 or 200 and create multiple subscriptions and filters to handle each set. This would allow you to distribute the load across multiple subscriptions and filters while still being able to filter messages based on entity ID.
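The grouping described above can be sketched as follows, assuming fixed-size groups of 100 and one filter per subscription. The subscription names (`entities-001` and so on) and the helper names are hypothetical, and the expressions are only built as strings here; creating the actual rules is left to whichever management SDK or API you use. Note that an IN expression requires a SQL filter (a correlation filter matches on equality only), so the property is written as `sys.CorrelationId`.

```python
from typing import Dict, Iterable, List


def chunk(ids: List[str], size: int) -> List[List[str]]:
    """Split ids into consecutive groups of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def in_filter(ids: Iterable[str]) -> str:
    """Build a SQL filter expression matching any of the given IDs."""
    # Embedded single quotes are doubled inside string constants.
    quoted = ", ".join("'" + i.replace("'", "''") + "'" for i in ids)
    return f"sys.CorrelationId IN ({quoted})"


entity_ids = [str(n) for n in range(1, 10_001)]
groups = chunk(entity_ids, 100)

# Hypothetical naming scheme: one subscription per group of entities.
rules: Dict[str, str] = {
    f"entities-{k:03d}": in_filter(group)
    for k, group in enumerate(groups, start=1)
}

print(len(rules))  # 100 subscriptions for 10,000 entities
print(rules["entities-001"][:50] + "...")
```

Changing the group size trades the number of subscriptions against the length and evaluation cost of each expression, which is the trade-off to measure.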

    The best approach will depend on your specific requirements and constraints. You may want to consider testing different approaches and measuring their performance to determine which one works best for your use case.
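When comparing approaches, one simple measurement is end-to-end latency per message. The sketch below is only an illustration: it assumes you have already collected pairs of timestamps (when the message was enqueued and when the receiver got it) and summarizes them as percentiles. The helper name and the synthetic sample data are hypothetical, and clock skew between sender, broker, and receiver is not accounted for.

```python
from datetime import datetime, timedelta, timezone
from statistics import quantiles
from typing import Dict, List, Tuple


def latency_summary(samples: List[Tuple[datetime, datetime]]) -> Dict[str, float]:
    """Summarize (enqueued_at, received_at) pairs as latencies in milliseconds."""
    latencies_ms = [
        (received - enqueued) / timedelta(milliseconds=1)
        for enqueued, received in samples
    ]
    if len(latencies_ms) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) returns 99 cut points; index 49 is the median.
    cuts = quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
        "max": max(latencies_ms),
    }


# Synthetic data for illustration: latencies of 1 ms to 100 ms.
base = datetime(2024, 3, 1, tzinfo=timezone.utc)
samples = [(base, base + timedelta(milliseconds=ms)) for ms in range(1, 101)]
summary = latency_summary(samples)
print(summary)
```

Running the same summary for each candidate layout under a representative load gives a like-for-like comparison of tail latency, which matters more than the average for near-real-time delivery.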

    Do let me know in case of further queries; I would be happy to assist you.
