First API call is slower than subsequent calls

Nguyen Huy Tuan 20 Reputation points
2024-10-28T07:59:54.3933333+00:00

Here is my code in Python:

import time
from openai import AzureOpenAI

# Initialize separate clients, each with its own endpoint and API key
client_text = AzureOpenAI(
    azure_endpoint="",
    api_key="",
    api_version="2024-08-01-preview"
)

client_embedding = AzureOpenAI(
    azure_endpoint="",  # Replace with the embedding endpoint
    api_key="",
    api_version="2024-08-01-preview"
)

# Synchronous timing function for text completion
def sync_text_completion(model: str, prompt: str):
    start_time = time.time()
    completion = client_text.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "helpful assistant"},
            {"role": "user", "content": prompt},
        ],
    )
    response_text = completion.choices[0].message.content
    end_time = time.time()
    print(response_text)
    print(f"Sync Text Completion Time: {end_time - start_time:.2f} seconds\n")


# Synchronous timing function for embedding
def sync_embedding(model: str, input_text: str):
    start_time = time.time()
    embedding = client_embedding.embeddings.create(
        model=model,
        input=input_text
    )
    embedding_vector = embedding.data[0].embedding
    end_time = time.time()
    print(f"Embedding Vector (first 10 elements): {embedding_vector[:10]}...")
    print(f"Sync Embedding Time: {end_time - start_time:.2f} seconds\n")




# Run all the synchronous functions
text_model_name = "gpt-4o-mini"  # Deployment name of the text model
embedding_model_name = "Etext-embedding-3-small"  # Deployment name of the embedding model


sync_text_completion(text_model_name, "my name is quang")
sync_embedding(embedding_model_name, "tool learning for llm")



# Synchronous calls
sync_text_completion(text_model_name, "hi bro")
sync_embedding(embedding_model_name, "Example text for embedding 123")

sync_text_completion(text_model_name, "xin chao")
sync_embedding(embedding_model_name, "tran quoc viet quang")

Here is my result:

Nice to meet you, Quang! How can I assist you today?
Sync Text Completion Time: 1.71 seconds

Embedding Vector (first 10 elements): [-0.016174498945474625, 0.02521318942308426, 0.03261658921837807, -0.05655127763748169, 0.017036741599440575, -0.009373181499540806, -0.010718578472733498, 0.036927808076143265, -0.01228696946054697, -0.000676879717502743]...
Sync Embedding Time: 1.08 seconds

Hey! How’s it going? What can I help you with today?
Sync Text Completion Time: 0.54 seconds

Embedding Vector (first 10 elements): [0.01610065996646881, -0.01671152561903, 0.034237537533044815, -0.0027888903860002756, 0.0018125969218090177, -0.04758930578827858, 0.04092796519398689, -0.0022580192890018225, 0.00559596111997962, -0.01332267839461565]...
Sync Embedding Time: 0.29 seconds

Xin chào! Bạn cần giúp đỡ gì hôm nay?
Sync Text Completion Time: 0.60 seconds

Embedding Vector (first 10 elements): [-0.02606993541121483, 0.037195321172475815, -0.02804976888000965, 0.005387063138186932, 0.028203045949339867, -0.0021682369988411665, 0.004837818909436464, 0.048614490777254105, -0.082156702876091, -0.048307936638593674]...
Sync Embedding Time: 0.28 seconds
Azure OpenAI Service

1 answer

  1. santoshkc 9,405 Reputation points Microsoft Vendor
    2024-10-28T14:56:22.0566667+00:00

    Hi @Nguyen Huy Tuan, thank you for reaching out to the Microsoft Q&A forum!

    The slower response time for the first Azure API call compared to subsequent calls is often due to a "cold start" phenomenon. Cold starts are an important consideration when using Azure OpenAI, particularly in dynamic scaling scenarios. A cold start occurs when a model is invoked for the first time or after a period of inactivity, resulting in increased latency as the necessary resources are initialized and the model is loaded.

    To mitigate cold starts, consider scheduling periodic warm-up requests to keep the model initialized, thereby reducing latency for subsequent calls. Microsoft Azure is also continuously working to reduce cold-start latency and improve performance, so users should see a more seamless experience over time.
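    A minimal sketch of such a warm-up loop, assuming an `AzureOpenAI` client and deployment name like those in the question (the `keep_warm` helper and the five-minute interval are illustrative, not a prescribed API):

```python
import threading
import time

def keep_warm(client, model, interval_seconds=300):
    """Periodically send a tiny request so the deployment stays initialized.

    `client` is assumed to be an AzureOpenAI client and `model` a deployment
    name, as in the question. Runs in a daemon thread so it never blocks
    program exit.
    """
    def ping():
        while True:
            try:
                # A 1-token request is enough to keep the model loaded
                client.chat.completions.create(
                    model=model,
                    messages=[{"role": "user", "content": "ping"}],
                    max_tokens=1,
                )
            except Exception as exc:
                print(f"warm-up ping failed: {exc}")
            time.sleep(interval_seconds)

    thread = threading.Thread(target=ping, daemon=True)
    thread.start()
    return thread
```

    Started once at application startup (e.g. `keep_warm(client_text, "gpt-4o-mini")`), this keeps a trickle of traffic flowing so the first real user request is less likely to hit a cold deployment; the trade-off is a small amount of extra token usage.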

    I hope this helps. If you have any further questions, do let us know.


    If this answers your query, do click Accept Answer and Yes for was this answer helpful.

