@Ritvij Sharma Thanks for your reply.
.
I'm glad to see you were able to resolve your issue. Thanks for posting your solution so that others experiencing the same thing can easily reference it. Since the Microsoft Q&A community has a policy that question authors cannot accept their own answers (they can only accept answers posted by others), I'll repost your solution in case you'd like to accept the answer.
.
Issue:
You are looking for a way to get word-by-word durations using the Azure Speech SDK. You mentioned that the word-level timestamps are not accurate and that you need exact word-by-word durations.
.
Resolution:
The way to do this with the JS SDK itself, without resorting to batch transcription, is to use pronunciation assessment. When pronunciation assessment is enabled, the results include the offset and duration of every word in the speech.
Documentation on enabling pronunciation assessment is here: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-pronunciation-assessment?pivots=programming-language-javascript
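.
For anyone landing here later, below is a minimal sketch of how the per-word offsets and durations might be read once pronunciation assessment is enabled. It is not the original poster's code. The helper name extractWordTimings and the sample payload are made up for illustration, and the field names (NBest, Words, Word, Offset, Duration) are assumed to match the detailed JSON result, where times are expressed in 100-nanosecond ticks. The SDK wiring is shown only in comments so that the snippet runs without a Speech resource; please verify the calls against the documentation linked above.

```javascript
// Assumed SDK wiring (commented out; needs a key, region, audio input and reference text):
//
//   const sdk = require("microsoft-cognitiveservices-speech-sdk");
//   const paConfig = new sdk.PronunciationAssessmentConfig(
//     referenceText,
//     sdk.PronunciationAssessmentGradingSystem.HundredMark,
//     sdk.PronunciationAssessmentGranularity.Word,
//     false
//   );
//   paConfig.applyTo(recognizer); // recognizer is an sdk.SpeechRecognizer
//   recognizer.recognizeOnceAsync((result) => {
//     const json = result.properties.getProperty(
//       sdk.PropertyId.SpeechServiceResponse_JsonResult
//     );
//     console.log(extractWordTimings(json));
//   });

const TICKS_PER_MS = 10000; // 1 tick = 100 ns, so 10,000 ticks = 1 ms

// Hypothetical helper: pulls word, start time and duration (in ms)
// out of the top recognition candidate of the detailed JSON result.
function extractWordTimings(jsonText) {
  const parsed = JSON.parse(jsonText);
  const best = (parsed.NBest || [])[0];
  if (!best || !Array.isArray(best.Words)) {
    return [];
  }
  return best.Words.map((w) => ({
    word: w.Word,
    startMs: w.Offset / TICKS_PER_MS,
    durationMs: w.Duration / TICKS_PER_MS,
  }));
}

// Made-up sample payload in the assumed shape, for illustration only.
const sample = JSON.stringify({
  RecognitionStatus: "Success",
  NBest: [
    {
      Words: [
        { Word: "hello", Offset: 5000000, Duration: 3000000 },
        { Word: "world", Offset: 9000000, Duration: 4000000 },
      ],
    },
  ],
});

const timings = extractWordTimings(sample);
console.log(timings);
```

Converting ticks to milliseconds in one place keeps the rest of the application free of unit mistakes. If the result has no NBest entry (for example, when nothing was recognized), the helper returns an empty array instead of throwing.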