Develop reasoning apps with DeepSeek models on Azure AI Foundry using the OpenAI SDK

Learn how to use reasoning models like DeepSeek in Azure OpenAI with the OpenAI SDK for Python.

This article shows several best practices for integrating reasoning models:

  • Keyless authentication: Use managed identities or developer credentials instead of API keys.
  • Asynchronous operations: Use async features for better performance.
  • Streaming responses: Provide immediate feedback to users.
  • Reasoning separation: Separate reasoning steps from the final output.
  • Resource management: Clean up resources after use.

The DeepSeek building block

Explore the DeepSeek building block sample. It shows how to use the OpenAI client library to call the DeepSeek-R1 model and generate responses to user messages.

Architectural overview

The following diagram shows the simple architecture of the sample app:

Diagram showing architecture from client to backend app.

The chat app runs as an Azure Container App. The app uses managed identity with Microsoft Entra ID to authenticate with Azure OpenAI instead of an API key. The app uses Azure OpenAI to generate responses to user messages.

The app relies on these services and components:

  • A Python Quart app that uses the OpenAI client library package to generate responses to user messages
  • A basic HTML/JS frontend that streams responses from the backend using JSON Lines over a ReadableStream
  • Bicep files for provisioning Azure resources, including Azure AI Services, Azure Container Apps, Azure Container Registry, Azure Log Analytics, and RBAC roles.

Cost

To keep costs low, this sample uses basic or consumption pricing tiers for most resources. Adjust the tier as needed, and delete resources when you're done to avoid charges.

Learn more about cost in the sample repo.

Prerequisites

A development container includes all the dependencies you need for this article. You can run it in GitHub Codespaces (in a browser) or locally using Visual Studio Code.

To follow this article, make sure you meet these prerequisites:

  • An Azure subscription – Create one for free
  • Azure account permissions – Your Azure account must have Microsoft.Authorization/roleAssignments/write permissions, such as Role Based Access Control Administrator, User Access Administrator, or Owner. If you don't have subscription-level permissions, you must be granted RBAC for an existing resource group and deploy to that group.
    • Your Azure account also needs Microsoft.Resources/deployments/write permissions at the subscription level.
  • GitHub account

Open development environment

Follow these steps to set up a preconfigured development environment with all the required dependencies.

GitHub Codespaces runs a development container managed by GitHub with Visual Studio Code for the Web as the interface. Use GitHub Codespaces for the simplest setup, as it comes with the necessary tools and dependencies preinstalled for this article.

Important

All GitHub accounts can use Codespaces for up to 60 hours free each month when using a 2-core instance. For more information, see GitHub Codespaces monthly included storage and core hours.

Use the following steps to create a new GitHub Codespace on the main branch of the Azure-Samples/deepseek-python GitHub repository.

  1. Right-click the following button and select Open link in new window. This action lets you have the development environment and the documentation open side by side.

    Open in GitHub Codespaces

  2. On the Create codespace page, review and then select Create new codespace.

  3. Wait for the codespace to start. It might take a few minutes.

  4. Sign in to Azure with the Azure Developer CLI in the terminal at the bottom of the screen.

    azd auth login
    
  5. Copy the code from the terminal and then paste it into a browser. Follow the instructions to authenticate with your Azure account.

You do the rest of the tasks in this development container.

Deploy and run

The sample repository has all the code and configuration files you need to deploy the chat app to Azure. Follow these steps to deploy the chat app to Azure.

Deploy chat app to Azure

Important

Azure resources created in this section start costing money immediately. These resources might still incur costs even if you stop the command before it finishes.

  1. Run the following Azure Developer CLI command for Azure resource provisioning and source code deployment:

    azd up
    
  2. Use the following table to answer the prompts:

    | Prompt | Answer |
    |--------|--------|
    | Environment name | Keep it short and lowercase. Add your name or alias, for example, chat-app. It's used as part of the resource group name. |
    | Subscription | Select the subscription to create the resources in. |
    | Location (for hosting) | Select a location near you from the list. |
    | Location for the DeepSeek model | Select a location near you from the list. If your first location is also listed for the model, select it. |
  3. Wait for the app to deploy. Deployment usually takes 5 to 10 minutes.

Use the chat app to ask questions of the large language model

  1. After deployment, the terminal shows a URL.

  2. Select the URL labeled Deploying service web to open the chat app in your browser.

    Screenshot of chat app in browser with a question in the chat text box along with the response.

  3. In the browser, ask a question such as "Who painted the Mona Lisa?"

  4. Azure OpenAI provides the answer through model inference, and the result appears in the app.

Exploring the sample code

OpenAI and Azure OpenAI Service share a common Python client library, but Azure OpenAI endpoints require a few small code changes. This sample uses a DeepSeek-R1 reasoning model to generate responses in a simple chat app.
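To make the differences concrete, the following sketch contrasts the two clients. It's illustrative only: it assumes the openai and azure-identity packages are installed, and DefaultAzureCredential stands in for the credential logic this sample configures later.

import os

from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AsyncAzureOpenAI, AsyncOpenAI

# openai.com endpoint: an API key is enough.
openai_client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Azure OpenAI endpoint: an endpoint URL, token-based credentials,
# and an explicit API version.
azure_client = AsyncAzureOpenAI(
    azure_endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    azure_ad_token_provider=get_bearer_token_provider(
        DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
    ),
    api_version="2024-10-21",
)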

Setup and authentication

The src/quartapp/chat.py file starts with module setup and keyless authentication configuration.

Infrastructure setup

The script uses Quart, an async web framework, to create a Blueprint named chat. This Blueprint defines the app's routes and manages its lifecycle hooks.

bp = Blueprint("chat", __name__, template_folder="templates", static_folder="static")

The Blueprint defines the / and /chat/stream routes and the @bp.before_app_serving and @bp.after_app_serving lifecycle hooks.
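A Blueprint takes effect only after it's registered on a Quart app. The following sketch shows that wiring; the sample's actual app factory may differ, and the import path is assumed from the file layout above.

from quart import Quart

from quartapp.chat import bp  # the Blueprint defined above


def create_app() -> Quart:
    app = Quart(__name__)
    # Registering the Blueprint activates its routes and lifecycle hooks.
    app.register_blueprint(bp)
    return app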

Initialization with keyless authentication

The following code snippet handles authentication.

Note

The @bp.before_app_serving hook initializes the OpenAI client and handles authentication. This approach is critical for securely accessing Azure-hosted DeepSeek-R1 models.

The authentication strategy adapts to the environment:

  • In production: Uses Managed Identity Credential with an Azure client ID to avoid storing sensitive keys. This method is secure and scalable for cloud-native apps.
  • In development: Uses Azure Developer CLI Credential with an Azure tenant ID to simplify local testing by using the developer's Azure CLI sign-in session.
@bp.before_app_serving
async def configure_openai():
    if os.getenv("RUNNING_IN_PRODUCTION"):
        client_id = os.environ["AZURE_CLIENT_ID"]
        bp.azure_credential = ManagedIdentityCredential(client_id=client_id)
    else:
        tenant_id = os.environ["AZURE_TENANT_ID"]
        bp.azure_credential = AzureDeveloperCliCredential(tenant_id=tenant_id)

This keyless authentication approach provides:

  • Better security: No API keys stored in code or environment variables.
  • Easier management: No need to rotate keys or manage secrets.
  • Smooth transitions: The same code works in both development and production.

Token provider setup

In the following code snippet, the token provider creates a bearer token to authenticate requests to Azure OpenAI services. It automatically generates and refreshes these tokens using the configured credential.

bp.openai_token_provider = get_bearer_token_provider(
    bp.azure_credential, "https://cognitiveservices.azure.com/.default"
)
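For illustration, the callable that get_bearer_token_provider returns behaves roughly like this simplified version; the real helper, together with the credential, also handles caching tokens and refreshing them before they expire.

def token_provider() -> str:
    # Fetch an Entra ID access token scoped to Azure AI services and
    # return its bearer string, as the OpenAI client expects.
    return bp.azure_credential.get_token(
        "https://cognitiveservices.azure.com/.default"
    ).token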

Azure OpenAI client configuration

There are two possible clients, AzureOpenAI and AsyncAzureOpenAI. The following code snippet uses AsyncAzureOpenAI along with the asynchronous Quart framework for better performance with concurrent users:

bp.openai_client = AsyncAzureOpenAI(
    azure_endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    azure_ad_token_provider=bp.openai_token_provider,
    api_version="2024-10-21",
)

  • azure_endpoint: Points to the Azure-hosted DeepSeek inference endpoint.
  • azure_ad_token_provider: Supplies dynamically generated bearer tokens from the token provider instead of a static API key.
  • api_version: Specifies an API version that supports DeepSeek models.

Model deployment name configuration

The following code snippet sets the DeepSeek model version by getting the deployment name from your environment configuration. It assigns the name to the bp.model_deployment_name variable, making it accessible throughout the app. This approach lets you change the model deployment without updating the code.

bp.model_deployment_name = os.getenv("AZURE_DEEPSEEK_DEPLOYMENT")

Note

In Azure OpenAI, you don't directly use model names like gpt-4o or deepseek-r1. Instead, you create deployments, which are named instances of models in your Azure OpenAI resource. This approach offers the following benefits:

  • Abstraction: Keeps deployment names out of the code by using environment variables.
  • Flexibility: Lets you switch between different DeepSeek deployments without changing the code.
  • Environment-specific configuration: Allows using different deployments for development, testing, and production.
  • Resource management: Each Azure deployment has its own quota, throttling, and monitoring.
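As a sketch, switching models then becomes a configuration change rather than a code change. The deployment names below are hypothetical:

import os

# Hypothetical per-environment settings, set outside the code:
#   development: AZURE_DEEPSEEK_DEPLOYMENT=deepseek-r1-dev
#   production:  AZURE_DEEPSEEK_DEPLOYMENT=deepseek-r1
deployment = os.getenv("AZURE_DEEPSEEK_DEPLOYMENT")

# The same completion call then targets whichever deployment is configured:
# bp.openai_client.chat.completions.create(model=deployment, ...)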

Lifecycle management

The following code snippet prevents resource leaks by closing the asynchronous Azure OpenAI client when the application shuts down. The @bp.after_app_serving hook ensures proper cleanup of resources.

@bp.after_app_serving
async def shutdown_openai():
    await bp.openai_client.close()

Chat handler streaming function

The chat_handler() function manages user interactions with the DeepSeek-R1 model through the /chat/stream route. It extracts messages from the JSON payload and streams responses back to the client in real time.

Streaming implementation

  1. The chat_handler() route starts by accepting messages from the client.

    • request_messages: The route expects a JSON payload containing user messages.
    @bp.post("/chat/stream")
    async def chat_handler():
        request_messages = (await request.get_json())["messages"]
    
  2. Next, the function streams responses from the OpenAI API. It prepends a system message ("You are a helpful assistant.") to the user-provided messages.

    @stream_with_context
    async def response_stream():
        all_messages = [
            {"role": "system", "content": "You are a helpful assistant."},
        ] + request_messages
    
  3. Next, the function creates a streaming chat completion request.

    The chat.completions.create method sends the messages to the DeepSeek-R1 model. The stream=True parameter enables real-time response streaming.

      chat_coroutine = bp.openai_client.chat.completions.create(
          model=bp.model_deployment_name,
          messages=all_messages,
          stream=True,
      )
    
  4. The following code snippet processes streaming responses from the DeepSeek-R1 model and handles errors. It iterates through updates, checks for valid choices, and sends each response chunk as JSON Lines. If an error occurs, it logs the error and sends a JSON error message to the client over the same stream.

    try:
        async for update in await chat_coroutine:
            if update.choices:
                yield update.choices[0].model_dump_json() + "\n"
    except Exception as e:
        current_app.logger.error(e)
        yield json.dumps({"error": str(e)}, ensure_ascii=False) + "\n"
    
    return Response(response_stream())
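To see the JSON Lines contract from a client's perspective, here's a minimal Python consumer. It's a sketch only: it assumes the httpx package is installed and the app is running locally, and the URL and port are hypothetical.

import asyncio
import json

import httpx


async def ask(question: str) -> None:
    async with httpx.AsyncClient(timeout=None) as http:
        async with http.stream(
            "POST",
            "http://localhost:50505/chat/stream",  # hypothetical local URL
            json={"messages": [{"role": "user", "content": question}]},
        ) as response:
            # Each nonempty line is one JSON-encoded chunk (JSON Lines).
            async for line in response.aiter_lines():
                if line.strip():
                    print(json.loads(line))


asyncio.run(ask("Who painted the Mona Lisa?"))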
    

Reasoning content handling

While traditional language models only provide final outputs, reasoning models like DeepSeek-R1 show their intermediate reasoning steps. These steps make them useful for:

  • Solving complex problems
  • Performing mathematical calculations
  • Handling multi-step logical reasoning
  • Making transparent decisions

The submit event handler in index.html processes the streaming response on the frontend. This approach lets you access and display the model's reasoning steps alongside the final output.

The frontend uses a ReadableStream to process streaming responses from the backend. It separates reasoning content from regular content, showing reasoning in an expandable section, and the final answer in the main chat area.

Step-by-step breakdown

  1. Initiate streaming request

    This code snippet opens a streaming connection from the JavaScript frontend to the Python backend's /chat/stream route.

    const response = await fetch("/chat/stream", {
        method: "POST",
        headers: {"Content-Type": "application/json"},
        body: JSON.stringify({messages: messages})
    });
    
  2. Initialize variables

    The following code snippet initializes variables to store the answer and thoughts separately. This separation helps handle reasoning content effectively.

    let answer = "";
    let thoughts = "";    
    
  3. Process each update

    The following code snippet asynchronously iterates through chunks of the model's response.

    for await (const event of readNDJSONStream(response.body)) {
    
  4. Detect and route content type

    The script checks if the event contains a delta field. If it does, it processes the content based on whether it's reasoning content or regular content.

    if (!event.delta) {
         continue;
    }
    if (event.delta.reasoning_content) {
         thoughts += event.delta.reasoning_content;
         if (thoughts.trim().length > 0) {
             // Only show thoughts if they are more than just whitespace
             messageDiv.querySelector(".loading-bar").style.display = "none";
             messageDiv.querySelector(".thoughts").style.display = "block";
             messageDiv.querySelector(".thoughts-content").innerHTML = converter.makeHtml(thoughts);
         }
     } else if (event.delta.content) {
         messageDiv.querySelector(".loading-bar").style.display = "none";
         answer += event.delta.content;
         messageDiv.querySelector(".answer-content").innerHTML = converter.makeHtml(answer);
     }
    
    • If the content type is reasoning_content, the content is added to thoughts and displayed in the .thoughts-content section.
    • If the content type is content, the content is added to answer and displayed in the .answer-content section.
    • The .loading-bar is hidden once content starts streaming, and the .thoughts section is displayed if there are any thoughts. This routing is mirrored in a Python sketch after this breakdown.
  5. Handle errors

    Errors are logged in the backend and returned to the client in JSON format.

    except Exception as e:
        current_app.logger.error(e)
        yield json.dumps({"error": str(e)}, ensure_ascii=False) + "\n"
    

    This frontend code snippet displays the error message in the chat interface.

    messageDiv.scrollIntoView();
    if (event.error) {
        messageDiv.innerHTML = "Error: " + event.error;
    }
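Here's the delta routing from step 4, mirrored in Python for clarity. The chunk shapes below are illustrative, assuming the deployment emits reasoning in a reasoning_content delta field as the frontend expects.

import json

# Two illustrative JSON Lines chunks, shaped the way the frontend expects:
lines = [
    '{"delta": {"reasoning_content": "First, identify the artist..."}}',
    '{"delta": {"content": "Leonardo da Vinci painted the Mona Lisa."}}',
]

thoughts, answer = "", ""
for line in lines:
    delta = json.loads(line).get("delta") or {}
    # Reasoning steps and the final answer accumulate separately, just as
    # the frontend fills .thoughts-content and .answer-content.
    thoughts += delta.get("reasoning_content") or ""
    answer += delta.get("content") or ""

print(thoughts)
print(answer)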
    

Clean up GitHub Codespaces

Delete the GitHub Codespaces environment to maximize your free per-core hours.

Important

For more information about your GitHub account's free storage and core hours, see GitHub Codespaces monthly included storage and core hours.

  1. Sign into the GitHub Codespaces dashboard.

  2. Find your active Codespaces created from the Azure-Samples/deepseek-python GitHub repository.

  3. Open the context menu for the codespace and select Delete.

Get help

Log your issue to the repository's Issues.

Next steps