Load testing a Python chat app using RAG with Locust
This article describes how to load test a Python chat application that uses the Retrieval Augmented Generation (RAG) pattern with Locust, a popular open-source load testing tool. The primary objective of load testing is to make sure that the expected load on your chat application doesn't exceed your current Azure OpenAI Tokens Per Minute (TPM) quota. By simulating user behavior under heavy load, you can identify potential bottlenecks and scalability issues in your application. This process is crucial for ensuring that your chat application remains responsive and reliable, even when it faces a high volume of user requests.
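Before running a test, a back-of-the-envelope estimate can show whether the planned load is likely to approach the quota. The following minimal sketch multiplies concurrent users by an assumed request rate per user and an assumed token count per request. The function names, the 80% headroom, and all numbers are hypothetical; measure actual token usage for your own app, and count every model call the app makes for a single user question.

```python
def estimate_tokens_per_minute(
    concurrent_users: int,
    requests_per_user_per_minute: float,
    tokens_per_request: int,
) -> float:
    """Rough expected Azure OpenAI token usage (prompt + completion) per minute."""
    return concurrent_users * requests_per_user_per_minute * tokens_per_request


def fits_quota(expected_tpm: float, quota_tpm: int, headroom: float = 0.8) -> bool:
    """True if the expected load stays under a fraction (headroom) of the TPM quota."""
    return expected_tpm <= quota_tpm * headroom


# Hypothetical numbers: 20 users, each asking about 4 questions a minute,
# with roughly 1,500 prompt + completion tokens per question.
expected = estimate_tokens_per_minute(20, 4, 1500)
print(expected)  # 120000
print(fits_quota(expected, quota_tpm=120_000))  # False: no headroom left
```

The headroom factor is an arbitrary safety margin; it leaves room for longer-than-average prompts and retrieved context.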
Watch the demonstration video to understand more about load testing the chat app.
Prerequisites
An Azure subscription. Create one for free.
Access granted to Azure OpenAI in the desired Azure subscription. Currently, access to this service is granted only by application. You can apply for access to Azure OpenAI by completing the form at https://aka.ms/oai/access.
Dev containers are available for both samples, with all dependencies required to complete this article. You can run the dev containers in GitHub Codespaces (in a browser) or locally using Visual Studio Code.
To use GitHub Codespaces, you only need a GitHub account.
Python chat app with RAG: if you configured your chat app to use one of the load balancing solutions, this article helps you test the load balancing. The load balancing solutions include Azure Container Apps.
Open the load test sample app
The load test is in the Python chat app repository. Return to that dev container to complete these steps.
Run the test
Install the dependencies for the load test.
python3 -m pip install -r requirements-dev.txt
Start Locust. It uses the Locust test file, locustfile.py, found at the root of the repository.
locust
Open the running Locust website, such as http://localhost:8089.
Enter the following values in the Locust website:
Property                              Value
Number of users                       20
Ramp up (users started per second)    1
Host                                  https://<YOUR-CHAT-APP-URL>.azurewebsites.net
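Locust can also apply these same settings without the web UI through its `--headless`, `--users`, `--spawn-rate`, `--host`, and `--run-time` command-line options. The following sketch builds that command from the values in the table. The `build_headless_command` helper and the five-minute run time are hypothetical choices for illustration.

```python
import shlex


def build_headless_command(
    host: str, users: int = 20, spawn_rate: float = 1, run_time: str = "5m"
) -> list[str]:
    """Build a Locust command line equivalent to the web form values, without the UI."""
    return [
        "locust",
        "--headless",  # skip the web UI and start the test immediately
        "--users", str(users),  # peak number of concurrent users
        "--spawn-rate", str(spawn_rate),  # users started per second ("Ramp up")
        "--host", host,
        "--run-time", run_time,  # stop automatically after this duration
    ]


cmd = build_headless_command("https://<YOUR-CHAT-APP-URL>.azurewebsites.net")
print(shlex.join(cmd))
```

You could pass the resulting list to `subprocess.run(cmd, check=True)` from the repository root. With 20 users and a ramp-up of 1 user per second, Locust reaches the full load after about 20 seconds.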
Select Start Swarm to start the test.
Select Charts to watch the test progress.
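Besides watching the charts, you might want to check results programmatically. When Locust is started with its `--csv` option (for example, `locust --csv results`), it writes summary statistics to files such as `results_stats.csv`. The following minimal sketch computes the overall failure ratio from the "Aggregated" row. The column names are an assumption based on that file's format, so verify them against a file from your own run. A failure rate that climbs as users ramp up can be a sign that the app is hitting its Azure OpenAI quota, which returns HTTP 429 responses when rate limits are exceeded.

```python
import csv
import io


def aggregated_failure_ratio(stats_csv_text: str) -> float:
    """Return failures / requests from the 'Aggregated' row of a Locust stats CSV."""
    for row in csv.DictReader(io.StringIO(stats_csv_text)):
        if row["Name"] == "Aggregated":
            requests = int(row["Request Count"])  # assumed column name
            failures = int(row["Failure Count"])  # assumed column name
            return failures / requests if requests else 0.0
    raise ValueError("No 'Aggregated' row found")


# Hypothetical excerpt of a stats file; real files contain more columns.
sample = """Type,Name,Request Count,Failure Count
POST,/chat,200,10
,Aggregated,200,10
"""
ratio = aggregated_failure_ratio(sample)
print(f"{ratio:.1%} of requests failed")  # 5.0% of requests failed
```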
Clean up resources
When you're done with load testing, clean up the resources. The Azure resources created in this article are billed to your Azure subscription. If you don't expect to need these resources in the future, delete them to avoid incurring more charges. After you delete the resources specific to this article, remember to return to the other chat app tutorial and follow its clean-up steps.
Return to the chat app article to clean up those resources.
Get help
If you have trouble using this load tester, log an issue in the repository's Issues.