Frequently asked questions about using Azure AI services for startups.
Getting started
What is the best way to get started using Azure OpenAI Service for my startup?
Check out the Generative AI for beginners course on GitHub. It's an 18-lesson instruction set that introduces all of the main Azure OpenAI features and shows you how to build applications with them.
How can I test out Azure AI capabilities quickly with a low/no-code approach?
Use Azure AI Studio to test a variety of AI capabilities, including deploying Azure OpenAI models and applying content moderation services.
Regional availability and data residency
In which Azure regions is Azure OpenAI Service available?
Different Azure OpenAI models are restricted to different regions. See the model availability table for a complete list.
How does region selection impact the latency and performance of Azure OpenAI services?
The impact is minimal, unless you're using the streaming feature. The model's own response time has a much greater effect on overall latency than region differences.
The choice between a dedicated Azure OpenAI server and the pay-as-you-go plan also has a larger impact on performance than region selection does.
Rate limits and resource management
How can I ensure my application can scale its Azure OpenAI quota?
See Manage Azure OpenAI Service quota to understand how quota limits work and how to manage them.
What are the rate limits for Azure OpenAI Service and how can I manage them?
For customers using the pay-as-you-go model (most common), see the Manage Azure OpenAI Service quota page. For customers using a dedicated Azure OpenAI server, see the quota section of the related guide.
How do I handle token-per-minute restrictions in Azure OpenAI Service?
Consider combining multiple Azure OpenAI deployments in an advanced architecture to build a system that delivers more tokens-per-minute to more users.
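As a rough illustration of the idea, the following minimal Python sketch rotates requests across several deployments and temporarily skips any deployment that has just reported a rate limit. The DeploymentPool class, the RateLimited exception, the send callable, and the cooldown value are all hypothetical placeholders, not part of any Azure SDK. In a real application, send would wrap your actual client call and raise RateLimited when the service responds with HTTP 429.

```python
import itertools
import time
from typing import Callable, Dict, List


class RateLimited(Exception):
    """Hypothetical signal that a deployment returned a rate-limit response."""


class DeploymentPool:
    """Round-robin over deployments, skipping ones that are cooling down."""

    def __init__(self, deployments: List[str], cooldown_seconds: float = 10.0,
                 clock: Callable[[], float] = time.monotonic):
        if not deployments:
            raise ValueError("at least one deployment is required")
        self._deployments = list(deployments)
        self._cycle = itertools.cycle(self._deployments)
        self._cooldown = cooldown_seconds
        self._clock = clock
        # Maps deployment name -> time before which it should not be used.
        self._blocked_until: Dict[str, float] = {}

    def call(self, send: Callable[[str], str]) -> str:
        """Try each deployment at most once; return the first successful response."""
        for _ in range(len(self._deployments)):
            name = next(self._cycle)
            if self._clock() < self._blocked_until.get(name, float("-inf")):
                continue  # still cooling down after an earlier rate limit
            try:
                return send(name)
            except RateLimited:
                # Remember the rate limit and move on to the next deployment.
                self._blocked_until[name] = self._clock() + self._cooldown
        raise RateLimited("all deployments are currently rate limited")
```

Round-robin with a fixed cooldown is only one possible policy, chosen here because it's short. Spreading traffic this way raises the combined tokens-per-minute ceiling only if each deployment has its own quota.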
When should I use a dedicated Azure OpenAI server (PTU) instead of the pay-as-you-go model?
You should consider switching from pay-as-you-go to provisioned throughput when you have well-defined, predictable throughput requirements. Typically, this is the case when the application is ready for production or has already been deployed in production, and you understand the expected traffic. This allows you to accurately forecast the required capacity and avoid unexpected billing.
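One way to put a number on expected traffic is a back-of-the-envelope throughput estimate from the peak request rate and average token counts. The sketch below is only that: the function name and the sample traffic figures are hypothetical, and it doesn't reflect how Azure actually sizes provisioned throughput.

```python
def estimate_tokens_per_minute(peak_requests_per_minute: float,
                               avg_prompt_tokens: float,
                               avg_completion_tokens: float) -> float:
    """Rough peak throughput: requests per minute times average tokens per request."""
    return peak_requests_per_minute * (avg_prompt_tokens + avg_completion_tokens)


# Hypothetical traffic: 120 requests/minute, 800 prompt and 200 completion tokens each.
print(estimate_tokens_per_minute(120, 800, 200))  # 120000
```

If an estimate like this stays stable from week to week, that's a sign the workload is predictable enough to forecast capacity.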
Load balancing and scaling
How do I manage high traffic and ensure my Azure OpenAI application remains responsive?
Create a load balancer for your application.
See the Load balancing sample if you're using the pay-as-you-go model. If you're using a dedicated Azure OpenAI server, see the PTU guide for information on load balancing.
Development and testing
How do I set up a development environment to test Azure OpenAI applications?
Create an online deployment using prompt flow in Azure AI Studio. Then, test it out by inputting values in the form editor or JSON editor.
Monitoring and metrics
How can I track and evaluate usage metrics of my AI application?
See the Evaluation and monitoring metrics guide for information on tracking risk and safety metrics as well as a number of response quality metrics.
What tools can I use to monitor the performance of my Azure OpenAI endpoints?
Use the monitoring feature of Azure OpenAI Studio. It provides dashboards that track the performance metrics of your models over time.
Production implementation and best practices
What are some best practices for deploying OpenAI applications on Azure to production?
See the Azure OpenAI chat reference architecture for best practices for deploying a standard chat application.
Can you provide examples or case studies of successful implementations of Azure OpenAI Service?
See the Artificial Intelligence and Machine Learning tech community forum.
Related content
To learn more, see Microsoft for Startups.