Hi @Andrew Syrov,
To answer your question, that's certainly a way to go. Think about it as the microservice orchestration pattern.
Your orchestration, triggered by a service bus message, will call a chain of activities. You can make the activity functions as broad or as granular as you want, but each one should modify the payload that's returned to the orchestrator function and passed to the next activity. If there's a failure within an activity, initiate a rollback and requeue that payload back to the service bus. Your payload can either carry something like a lastProcessStep marker, or it can simply be reprocessed as if it were the first time.
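A minimal sketch of that payload-passing idea in TypeScript. The `lastProcessStep` field, the step names, and the in-memory `requeue` array are illustrative assumptions, not part of the Durable Functions API:

```typescript
// Payload handed from one activity to the next; lastProcessStep
// records how far a previous run got (field and steps are illustrative).
interface TransferPayload {
  amount: number;
  lastProcessStep: number; // 0 = never processed
}

// Ordered activity steps; each one would enrich or modify the payload.
const steps: Array<(p: TransferPayload) => void> = [
  p => { if (p.amount <= 0) throw new Error("funds not available"); },
  _p => { /* e.g. verify the destination account */ },
  _p => { /* e.g. initiate the transfer */ },
];

// Stand-in for the service bus: failed payloads are requeued here.
const requeue: TransferPayload[] = [];

function runOrchestration(payload: TransferPayload): boolean {
  // Skip any steps a previous run already completed.
  for (let i = payload.lastProcessStep; i < steps.length; i++) {
    try {
      steps[i](payload);
      payload.lastProcessStep = i + 1; // checkpoint after each activity
    } catch {
      requeue.push(payload); // rollback and requeue for reprocessing
      return false;
    }
  }
  return true;
}
```

On a rerun, the loop starts at `lastProcessStep`, so previously completed activities are bypassed.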
That way, using the example in the referenced doc, when the catch (Exception) block runs and you refund the account, take context.GetInput&lt;TransferOperation&gt;(); and requeue it for reprocessing. When it's reprocessed, the payload can bypass the activities that already completed, because it's already been updated; alternatively, log the payload to a queue or table for manual investigation and processing.
Case 1. If I'm not using Service Bus (say, my durable function is triggered by an HTTP trigger, or some other method, such as a timer trigger).
If you're using an HTTP or timer trigger to tell the function "hey, there's data to process," then the same principles apply. Whatever data store you're using, the data being read should carry some sort of indicator that tells the function where it last left off, or that lets the data be reprocessed with no changes made for already-completed steps.
Case 2 (you somehow brought Service Bus into the picture, but let's discuss it anyway):
I mentioned Service Bus and TTLExpiredException because you can leverage that trigger type to easily rerun your workflow in the event of a transient error. Your durable function shouldn't be designed in such a way that it can run indefinitely. When the transfer from one bank to another is initiated, that state can be recorded and that operation considered complete. You then await confirmation from the other bank that the transfer has completed, which triggers a separate activity to finish the process on your end. This way you have a report of transactions that are completed versus awaiting confirmation, and you can take decisive action.
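The initiated-versus-confirmed bookkeeping can be sketched like this, using an in-memory map as a stand-in for whatever store you actually use (function and state names are illustrative):

```typescript
// Record each transfer as Initiated, flip it to Completed when the
// other bank confirms, and report anything still pending.
type TransferState = "Initiated" | "Completed";

const transfers = new Map<string, TransferState>();

function initiateTransfer(id: string): void {
  transfers.set(id, "Initiated"); // state recorded; this step is done
}

function confirmTransfer(id: string): void {
  // Triggered by the other bank's confirmation message.
  if (transfers.get(id) === "Initiated") transfers.set(id, "Completed");
}

function pendingTransfers(): string[] {
  // Report of transfers still awaiting confirmation.
  const pending: string[] = [];
  transfers.forEach((state, id) => {
    if (state === "Initiated") pending.push(id);
  });
  return pending;
}
```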
I do not see any value in using a durable function with Service Bus
The advantage to using a service bus is the ability to use topics that your durable functions can subscribe to.
Using the above image as a guide for our banking example, the first row could be a transfer-funds topic, and your df can have activities like fundAvailable, bankRegistered, accountVerified, etc., each of which adds a message to the second row. The second row can be regulatory, which holds the logic for all rules necessary for transferring funds between institutions. The third row can be institution-confirmation: once the external bank has confirmed receipt of the funds, a separate function (not part of your durable function) can take the message and add it to that service bus topic.
At any point of a transient failure, the message is placed back on the topic to be reprocessed. For instance, if it failed at bankRegistered, just rerun from the beginning. But let's say you made it to accountVerified: your data store will have that transfer marked as pending, and any reruns from there on will bypass those first activities. In the case of business logic failures, you can push those messages off to the dead-letter queue, which a separate function is subscribed to and can alert/email/etc. that something went wrong in the process.
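The routing decision at the end of that flow can be sketched in a few lines. The `FailureKind` split and the in-memory arrays standing in for the topic and dead-letter queue are assumptions for illustration:

```typescript
// Transient failures go back on the topic for a rerun; business-rule
// failures go to the dead-letter queue for a separate subscriber to
// alert on.
type FailureKind = "transient" | "business";

const topicMessages: string[] = [];
const deadLetterQueue: string[] = [];

function routeFailure(message: string, kind: FailureKind): void {
  if (kind === "transient") {
    topicMessages.push(message); // reprocess from the topic
  } else {
    deadLetterQueue.push(message); // manual investigation / alerting
  }
}
```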
Have a look at Manage instances in Durable Functions. There are two options for detecting unhandled exceptions:
- Use instance query APIs to query the status and check for any failures
- Set up Event Grid notifications to receive notifications about failures.
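With the first option, handling a query result boils down to filtering on the runtime status. The `InstanceStatus` shape below is a simplified stand-in for what the instance query APIs actually return (real responses carry more fields, such as input, timestamps, and custom status):

```typescript
// Simplified shape of an instance status record.
interface InstanceStatus {
  instanceId: string;
  runtimeStatus: "Running" | "Completed" | "Failed" | "Terminated";
}

// Pick out failed orchestrations so their inputs can be retrieved
// and requeued for reprocessing.
function failedInstances(statuses: InstanceStatus[]): string[] {
  return statuses
    .filter(s => s.runtimeStatus === "Failed")
    .map(s => s.instanceId);
}
```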
By default, durable function orchestration state is stored in Azure Storage and remains there unless you purge it. Therefore, in the event you need to reprocess a message due to a failure, you can retrieve the data using one of the two methods above.
If your activity hits an unhandled exception, a retry policy should be used to keep trying that activity until it succeeds, compensating for any errors encountered along the way. In the event of a process crash, the underlying framework retries automatically. And since durable functions (and functions in general) are queue-based, you must ensure that the code in each activity is idempotent, since an activity may run more than once if the process crashes after the activity started executing but before the result was persisted. Starting an orchestration returns an instance ID (a GUID) that you can use for tracking.
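One common way to make an activity idempotent is to record which instance IDs have already been applied and make re-runs no-ops. Here the `Set` is an in-memory stand-in for a durable store, and the names are illustrative:

```typescript
// Re-running the activity with the same instance id must not apply
// the side effect twice.
const appliedInstances = new Set<string>();
let ledgerBalance = 0;

function creditAccount(instanceId: string, amount: number): void {
  if (appliedInstances.has(instanceId)) return; // already applied: no-op
  ledgerBalance += amount;
  appliedInstances.add(instanceId);
}
```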