Bot step repeats randomly in production, never locally

Question

amit basu 0

In my bot, a card asking for a security ID is shown to the user. Even when the user enters the correct security ID, the bot asks for it again. This happens randomly and only in production (sometimes it works fine), never locally.

Is there any setting I have to change?

Nikhil Jha (Accenture International Limited) 4,150 Reputation points Microsoft External Staff Moderator

2025-12-01T07:39:16.9533333+00:00

Hello amit basu,
Following up to know if you had a chance to review the response. Please "Accept the answer and Upvote" to help other community members looking for similar remediation.

2 answers

Answer 1

Hello amit basu, welcome to Microsoft Q&A and thank you for posting your question here.

I understand that a step in your bot repeats randomly in production but never locally.

First, collect deterministic evidence to determine whether the same conversation is handled by different instances, whether state was saved, and whether validation failed. Log instanceId and dialogState on every turn and correlate them to detect instance switching. (See the state management docs on Microsoft Learn for state semantics.)
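As a rough illustration of that correlation, here is a minimal sketch in plain Node.js. The helper names buildTurnRecord and findInstanceSwitches and the record shape are hypothetical. WEBSITE_INSTANCE_ID is the instance identifier that Azure App Service exposes as an environment variable, and the sketch falls back to the host name elsewhere.

```javascript
const os = require('os');

// Hypothetical helper: build one log record per turn.
function buildTurnRecord(conversationId, activityId, dialogState) {
  return {
    timestamp: new Date().toISOString(),
    // Set by Azure App Service; the host name is used when running elsewhere.
    instanceId: process.env.WEBSITE_INSTANCE_ID || os.hostname(),
    conversationId,
    activityId,
    dialogState, // for example, the id of the active dialog step
  };
}

// Return the ids of conversations whose consecutive turns were served
// by different instances. Records are expected in chronological order.
function findInstanceSwitches(records) {
  const lastInstance = new Map();
  const switched = new Set();
  for (const record of records) {
    const previous = lastInstance.get(record.conversationId);
    if (previous !== undefined && previous !== record.instanceId) {
      switched.add(record.conversationId);
    }
    lastInstance.set(record.conversationId, record.instanceId);
  }
  return [...switched];
}
```

An instance switch is only a problem when state is not in a shared store, so treat the result as a lead to cross-check against the save status, not as proof.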

Second, confirm and fix the storage configuration (this is the most common root cause). Make sure ConversationState and UserState use a production-grade shared store (Cosmos DB, Azure Blob Storage, or Redis) and that the storage and state objects are registered as singletons.

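// index.js / app.js (Node)
const { ConversationState, UserState, AutoSaveStateMiddleware } = require('botbuilder');
const { CosmosDbPartitionedStorage } = require('botbuilder-azure');

// Every value comes from app settings; a missing one in production breaks state persistence.
const cosmosConfig = {
  cosmosDbEndpoint: process.env.COSMOS_ENDPOINT,
  authKey: process.env.COSMOS_KEY,
  databaseId: process.env.COSMOS_DB,
  containerId: process.env.COSMOS_CONTAINER
};
// Create the storage and state objects once at startup and reuse them for every turn.
const storage = new CosmosDbPartitionedStorage(cosmosConfig);
const convoState = new ConversationState(storage);
const userState = new UserState(storage);
// `adapter` is the bot adapter created elsewhere in this file.
// BotStateSet is not middleware in SDK v4; AutoSaveStateMiddleware saves both states at the end of each turn.
adapter.use(new AutoSaveStateMiddleware(convoState, userState));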

Confirm that the process.env.* values in production match your secret store (Key Vault / CI pipeline).
Ensure the ConversationState object is singleton-scoped for your app.

Bot Framework (C#):

// Startup.cs (inside ConfigureServices)
// Requires: using Microsoft.Bot.Builder; using Microsoft.Bot.Builder.Azure;
var cosmosOptions = new CosmosDbPartitionedStorageOptions {
   CosmosDbEndpoint = Configuration["COSMOS_ENDPOINT"],
   AuthKey = Configuration["COSMOS_KEY"],
   DatabaseId = Configuration["COSMOS_DB"],
   ContainerId = Configuration["COSMOS_CONTAINER"]
};
IStorage storage = new CosmosDbPartitionedStorage(cosmosOptions);
// Register one shared instance of each state object for the whole app.
var conversationState = new ConversationState(storage);
services.AddSingleton(conversationState);
services.AddSingleton<UserState>(new UserState(storage));

After deploying, log the outcome of every state save (for example, as a custom storageSaveStatus telemetry field) to confirm there are no transient failures.

Third, handle optimistic concurrency (ETag / 412) gracefully. If you see HTTP 412 or "Precondition Failed" in the logs, concurrent messages are colliding.

Pattern:

On save failure with 412 → read fresh state, reapply your update/merge, retry save (with exponential backoff). Don’t blindly overwrite.

Make updates idempotent where possible (use operation ids, message ids, or per-turn markers).

Pseudo algorithm:

attempt = 0
success = false
while attempt < maxRetries:
  read state (includes ETag)
  apply update (merge)
  try save
    success = true
    break
  catch PreconditionFailed:
    attempt += 1
    sleep(backoff(attempt))
if not success -> log critical and surface to monitoring

Cosmos DB and other stores use ETags for optimistic concurrency, so handling 412 prevents corrupt dialog stacks and repeated prompts.
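The retry pattern above could look roughly like this in JavaScript. This is only a sketch under stated assumptions: FakeEtagStore is a hypothetical in-memory stand-in for a real ETag-checking store, and saveWithRetry, maxRetries and baseDelayMs are illustrative names, not Bot Framework APIs.

```javascript
// Hypothetical in-memory stand-in for a store that enforces ETags.
class FakeEtagStore {
  constructor() {
    this.value = { step: 0 };
    this.etag = 1;
  }
  read() {
    return { value: { ...this.value }, etag: this.etag };
  }
  write(value, etag) {
    if (etag !== this.etag) {
      const err = new Error('Precondition Failed');
      err.statusCode = 412;
      throw err;
    }
    this.value = { ...value };
    this.etag += 1;
  }
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Read fresh state, re-apply the update, and save.
// Only a 412 is retried, with exponential backoff between attempts.
async function saveWithRetry(store, applyUpdate, { maxRetries = 3, baseDelayMs = 10 } = {}) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const { value, etag } = store.read(); // fresh state and ETag on every attempt
    const updated = applyUpdate(value);   // merge instead of blindly overwriting
    try {
      store.write(updated, etag);
      return updated;
    } catch (err) {
      if (err.statusCode !== 412) throw err; // unrelated failures are not retried
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
  // The caller should log this as critical and surface it to monitoring.
  throw new Error(`State save failed after ${maxRetries} attempts`);
}
```

Because applyUpdate runs again on every attempt, it should be idempotent with respect to the incoming message, as recommended above.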

Lastly, distinguish an OAuth token loop from a state loop.

Add these checks in telemetry to classify each repeated-prompt incident:

If the prompt repeats but validationResult shows the user input was valid, storageSaveStatus is success, and dialogState advanced, the likely cause is a failed OAuth / token exchange (the prompt is reissued because no token was found). Check tokenStatus immediately after the prompt, and inspect the Bot Framework user token endpoint call (GET api/usertoken/GetToken); failures there indicate OAuth issues.
If the prompt repeats and dialogState is unchanged, storageSaveStatus was an error or 412, or instanceId changed between the prompt and the user response, the likely cause is state lost through storage, instance routing, or concurrency.
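These two rules can be sketched as a small classifier over one telemetry record per incident. The record shape (validationResult, storageSaveStatus, dialogStateBefore, dialogStateAfter, instanceIdAtPrompt, instanceIdAtResponse) is an assumption modelled on the fields named above; none of these are SDK properties, so adapt the names to whatever you actually log.

```javascript
// Classify one repeated-prompt incident from a hypothetical telemetry record.
function classifyRepeatedPrompt(t) {
  const dialogAdvanced = t.dialogStateBefore !== t.dialogStateAfter;
  const saveSucceeded = t.storageSaveStatus === 'success';
  const instanceChanged = t.instanceIdAtPrompt !== t.instanceIdAtResponse;

  // State loop: state did not advance, the save failed (error or 412),
  // or a different instance handled the user's response.
  if (!dialogAdvanced || !saveSucceeded || instanceChanged) {
    return 'state-loop';
  }
  // Input was valid and state was saved and advanced, yet the prompt
  // repeated: suspect the OAuth / token exchange.
  if (t.validationResult === 'valid') {
    return 'oauth-loop';
  }
  return 'unclassified';
}
```

Counting incidents per class over a few days of production logs should show which of the two paths deserves attention first.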

Then implement a post-prompt validation step that logs the token result and a state snapshot. Example (JS):

// After receiving the user's submit (inside an async turn handler).
// Assumes tokenClient is the SDK's UserTokenClient; '' means no magic code.
const token = await tokenClient.getUserToken(userId, connectionName, channelId, '');
logger.info({ tokenFound: !!token });
logger.info({ dialogStateBefore: prevDialogState, dialogStateAfter: currentDialogState });

If you're using OAuthPrompt:

Verify ConnectionName and AppId/AppPassword in your Azure Bot registration and in your app settings. Mismatched names cause intermittent token exchange failures.
Channel differences: Web Chat/Direct Line may need the magic code flow if token exchange isn't supported; Teams has its own SSO path. Check the channel docs for token exchange support.
Log the usertoken/GetToken API call and its HTTP result for failed attempts; this pinpoints where the exchange broke. (Often you'll see 401, 404, or 400 with a message.)

I hope this is helpful! Do not hesitate to let me know if you have any other questions or clarifications.

If this was helpful, please close the thread by upvoting and accepting it as an answer.

Answer 2

Hello amit basu,

I understand you are facing a critical issue where your bot gets stuck in a loop asking for a "security ID" in your production environment, despite working perfectly in your local environment.

This specific symptom—"works locally, loops randomly in production"—is almost always caused by how your bot manages State.

In the Bot Framework, the bot must "remember" which step of the dialog it is currently on. It does this by saving the state at the end of every turn.

Locally: You are likely using MemoryStorage. Since your local machine is a single server that never restarts or scales during testing, the state is always preserved in the process memory.
Production: If you use MemoryStorage in Azure, or if your persistent storage is misconfigured, state can be lost between turns:
1. User sends Security ID.
2. Bot processes it, moves the dialog pointer to the next step, and tries to save this new state.
3. Failure: If the state is stored in memory and the App Service restarts (or scales to a different instance, or the load balancer routes the next request to a different node), that memory is wiped or inaccessible.
4. Result: On the next turn, the bot loads the old state (where it was waiting for the ID) and asks the question again.
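The sequence above can be reproduced with a toy model in plain Node.js. It does not use the Bot Framework SDK: createInstance is a hypothetical stand-in for one App Service instance that keeps dialog state in its own process memory, which is how MemoryStorage behaves.

```javascript
// Each "instance" holds dialog state in its own process memory.
function createInstance() {
  const memory = new Map(); // conversationId -> dialog step
  return function handleTurn(conversationId, text) {
    if (memory.get(conversationId) === 'waitingForId') {
      memory.set(conversationId, 'done');
      return `Security ID ${text} accepted`;
    }
    // No state found for this conversation: start the prompt (again).
    memory.set(conversationId, 'waitingForId');
    return 'Please enter your security ID';
  };
}

const instanceA = createInstance();
const instanceB = createInstance();

// Local: one process serves both turns, so the dialog advances.
const local = [instanceA('conv-1', 'hi'), instanceA('conv-1', '12345')];

// Production: the second turn lands on another instance, so the prompt repeats.
const scaled = [instanceA('conv-2', 'hi'), instanceB('conv-2', '12345')];

console.log(local);
console.log(scaled);
```

With a shared persistent store, both instances would read the same 'waitingForId' state and the second run would behave like the first.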

I recommend trying the steps below, starting with the most likely cause:

1: Verify You Are NOT Using MemoryStorage in Production

Check your startup code (Startup.cs for .NET or index.js/adapter.js for Node.js).

If you see new MemoryStorage(), this is the problem.
In-memory storage is volatile and not intended for production workloads. In production, you must use persistent storage like Azure Blob Storage or Cosmos DB.
Action: Update your production configuration to use AzureBlobStorage or CosmosDbPartitionedStorage.
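One way to make this check fail fast is a small startup guard. The following is a sketch, not part of the SDK: chooseStorageKind is a hypothetical helper, COSMOS_ENDPOINT mirrors the setting name used in the other answer, and BLOB_CONNECTION_STRING is an assumed setting name.

```javascript
// Decide which storage backend to construct from the app settings,
// and refuse to fall back to in-memory storage in production.
function chooseStorageKind(env) {
  if (env.COSMOS_ENDPOINT) return 'cosmos';
  if (env.BLOB_CONNECTION_STRING) return 'blob';
  if (env.NODE_ENV === 'production') {
    throw new Error('No persistent bot state storage configured for production');
  }
  return 'memory'; // acceptable for local development only
}
```

Calling chooseStorageKind(process.env) once at startup, before any state objects are created, turns a silent fallback into a visible deployment error.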

2: Check for Concurrency/Race Conditions

If the user (or the client channel) sends the Security ID twice quickly (e.g., a retry due to network lag), the bot might process both messages in parallel.

Request A reads state (Step 1).
Request B reads state (Step 1).
Request A saves state (Step 2).
Request B saves state (Step 2 overwriting Request A).
Action: Check your Application Insights logs for 412 Precondition Failed errors. This indicates an ETag conflict where the bot tried to update state that had already changed.

If this answer helps, kindly "Accept the answer and upvote" to help other community members.

amit basu 0 Reputation points

2025-12-01T13:30:36.36+00:00

I am not using MemoryStorage. Is it possible that this package version is causing the problem?

Nikhil Jha (Accenture International Limited) 4,150 Reputation points Microsoft External Staff Moderator

2025-12-02T04:00:51.96+00:00

If possible, try a different package version.
