Hello amit basu, welcome to Microsoft Q&A and thank you for posting your question here.
I understand that a dialog step in your bot is repeating randomly in production, but not locally.
Firstly, gather deterministic evidence to determine whether the same conversation is being handled by different instances, whether state was actually saved, and whether validation failed. Correlate instanceId and dialogState across turns to detect instance switching. (See the state management docs on Microsoft Learn for state semantics.)
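For example, a minimal logging sketch (JS), assuming an Azure App Service host (where WEBSITE_INSTANCE_ID identifies the instance), the adapter and convoState objects from the setup below, and a dialog state property named 'DialogState' (adjust the names and the telemetry sink to your bot):
// Log which instance handled each turn and a snapshot of the persisted dialog state,
// so repeated prompts can be correlated with instance switches or missing saves.
// 'DialogState' is an assumed property name - use whatever your bot registers.
const dialogStateAccessor = convoState.createProperty('DialogState');

adapter.use(async (context, next) => {
    const dialogState = await dialogStateAccessor.get(context, {});
    console.log(JSON.stringify({
        conversationId: context.activity.conversation.id,
        instanceId: process.env.WEBSITE_INSTANCE_ID || 'local',
        dialogStackDepth: (dialogState.dialogStack || []).length
    }));
    await next();
});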
Secondly, confirm and fix the storage configuration (this is the most common root cause): ensure ConversationState and UserState use a production-grade shared store (Cosmos DB, Azure Blob Storage, Redis), and that the storage objects are registered as singletons.
// index.js / app.js (Node)
const { ConversationState, UserState, AutoSaveStateMiddleware } = require('botbuilder');
const { CosmosDbPartitionedStorage } = require('botbuilder-azure');

const cosmosConfig = {
    cosmosDbEndpoint: process.env.COSMOS_ENDPOINT,
    authKey: process.env.COSMOS_KEY,
    databaseId: process.env.COSMOS_DB,
    containerId: process.env.COSMOS_CONTAINER
};

// create the shared storage and state objects once and reuse them (singleton-style)
const storage = new CosmosDbPartitionedStorage(cosmosConfig);
const convoState = new ConversationState(storage);
const userState = new UserState(storage);

// auto-save both state objects at the end of every turn
adapter.use(new AutoSaveStateMiddleware(convoState, userState));
- Confirm that process.env.* in production matches your secret store (Key Vault / CI pipeline); a fail-fast check is sketched below.
- Ensure the ConversationState object is singleton-scoped for your app.
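For example, a fail-fast check at startup (the setting names match the Node snippet above and are an assumption) makes a misconfigured deployment fail loudly instead of silently falling back to in-memory state:
// Fail fast if any production storage setting is missing.
for (const name of ['COSMOS_ENDPOINT', 'COSMOS_KEY', 'COSMOS_DB', 'COSMOS_CONTAINER']) {
    if (!process.env[name]) {
        throw new Error(`Missing required app setting: ${name}`);
    }
}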
Bot Framework (C#):
// Startup.cs
var cosmosOptions = new CosmosDbPartitionedStorageOptions
{
    CosmosDbEndpoint = Configuration["COSMOS_ENDPOINT"],
    AuthKey = Configuration["COSMOS_KEY"],
    DatabaseId = Configuration["COSMOS_DB"],
    ContainerId = Configuration["COSMOS_CONTAINER"]
};

// single shared storage and state instances for the whole app
IStorage storage = new CosmosDbPartitionedStorage(cosmosOptions);
var conversationState = new ConversationState(storage);
services.AddSingleton(conversationState);
services.AddSingleton(new UserState(storage));
After deploying, instrument the result of every state save (e.g., a storageSaveStatus telemetry field) so transient failures are visible.
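A sketch of that instrumentation (JS), assuming you call saveChanges explicitly at the end of each turn (instead of, or in addition to, the auto-save middleware) and that logger is your telemetry client:
// Record a storageSaveStatus field on every turn so transient save failures surface in telemetry.
try {
    await convoState.saveChanges(context, false);
    await userState.saveChanges(context, false);
    logger.info({ event: 'storageSaveStatus', status: 'success',
        conversationId: context.activity.conversation.id });
} catch (err) {
    logger.error({ event: 'storageSaveStatus', status: 'error',
        statusCode: err.statusCode || err.code, message: err.message });
    throw err; // let the adapter's onTurnError handle/notify as usual
}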
Thirdly, handle optimistic concurrency (ETag / HTTP 412) gracefully. If you see HTTP 412 or “Precondition Failed” in the logs, concurrent messages are colliding on the same state document.
Pattern:
- On a save failure with 412, read fresh state, reapply your update/merge, and retry the save with exponential backoff. Don’t blindly overwrite.
- Make updates idempotent where possible (use operation IDs, message IDs, or per-turn markers).
Pseudo algorithm:
attempt = 0
while attempt < maxRetries:
read state (includes ETag)
apply update (merge)
try save
success -> break
catch PreconditionFailed:
attempt += 1
sleep(backoff(attempt))
if not success -> log critical and surface to monitoring
Cosmos DB and other stores use ETags for optimistic concurrency; handling 412 prevents corrupt dialog stacks and repeated prompts.
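A minimal JS sketch of that retry loop against the SDK's IStorage interface (the storage object from the setup above is assumed; how a 412 surfaces on the error object - statusCode vs. code - depends on the store, so adjust the check):
// Read-merge-write with retry on ETag conflicts (HTTP 412 / Precondition Failed).
async function saveWithRetry(storage, key, applyUpdate, maxRetries = 3) {
    for (let attempt = 1; attempt <= maxRetries; attempt++) {
        const items = await storage.read([key]);          // fresh copy, including its eTag
        const item = items[key] || {};
        applyUpdate(item);                                // merge your change into the fresh copy
        try {
            await storage.write({ [key]: item });         // store checks the eTag on write
            return true;
        } catch (err) {
            const status = err.statusCode || err.code;
            if (status !== 412) throw err;                // only retry precondition failures
            await new Promise(r => setTimeout(r, 100 * 2 ** attempt)); // exponential backoff
        }
    }
    // all retries collided: log critical and surface to monitoring
    console.error(`State save for ${key} still failing with 412 after ${maxRetries} retries`);
    return false;
}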
Lastly, distinguish an OAuth token loop from a state loop:
Add these checks in telemetry to classify each repeated-prompt incident:
- If the prompt repeats but validationResult shows the user input was valid, storageSaveStatus is success, and dialogState advanced, the OAuth / token exchange likely failed (the prompt was reissued because no token was found). Check tokenStatus immediately after the prompt, and inspect the Bot Framework user token endpoint call (GET api/usertoken/GetToken); failures there indicate OAuth issues.
- If the prompt repeats and dialogState is unchanged, or storageSaveStatus was an error / 412, or instanceId changed between the prompt and the user's response, state was likely lost due to storage, instance routing, or concurrency.
Then implement a post-prompt validation step that logs the token result and a state snapshot. Example (JS):
// after receiving the user's submit; tokenClient is assumed to be a UserTokenClient
// (its getUserToken also takes the channelId and an optional magic code in the JS SDK)
const tokenResponse = await tokenClient.getUserToken(userId, connectionName, context.activity.channelId);
logger.info({ tokenFound: !!tokenResponse });
logger.info({ dialogStateBefore: prevDialogState, dialogStateAfter: currentDialogState });
If you're using OAuthPrompt:
- Verify the ConnectionName and AppId/AppPassword in your Azure Bot registration and in your app settings; mismatched names cause intermittent token exchange failures (see the sketch after this list).
- Channel differences: Web Chat / Direct Line may need the magic code flow if token exchange isn't supported; Teams has its own SSO path. Check the channel docs for token exchange support.
- Log the userToken/GetToken API call and its HTTP result for failed attempts; this pinpoints where the exchange broke. (Often you'll see a 401/404, or a 400 with a message.)
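For the first point, a short sketch (JS) of keeping the connection name in a single app setting; OAUTH_CONNECTION_NAME and the dialogs DialogSet are assumptions, and the value must match the OAuth connection configured on the Azure Bot resource:
// Drive the OAuthPrompt from one app setting so bot code and the Azure Bot resource can't drift.
// OAUTH_CONNECTION_NAME is a hypothetical setting name; 'dialogs' is an existing DialogSet.
const { OAuthPrompt } = require('botbuilder-dialogs');

dialogs.add(new OAuthPrompt('oauthPrompt', {
    connectionName: process.env.OAUTH_CONNECTION_NAME,
    title: 'Sign in',
    text: 'Please sign in to continue.',
    timeout: 300000 // ms the prompt waits for the token / magic code
}));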
I hope this is helpful! Do not hesitate to let me know if you have any other questions or clarifications.
Please don't forget to close the thread here by upvoting and accepting this as an answer if it is helpful.