Work with the GPT-3.5-Turbo, GPT-4, and GPT-4o models

Article
09/05/2024

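The GPT-3.5-Turbo, GPT-4, and GPT-4o model series are language models optimized for conversational interfaces. These models behave differently from the older GPT-3 models. Previous models were text-in and text-out: they accepted a prompt string and returned a completion to append to the prompt. The latest models are conversation-in and message-out. They expect input formatted in a specific chat-like transcript format, and they return a completion that represents a model-written message in the chat. This format was designed specifically for multi-turn conversations, but it also works well for nonchat scenarios.

This article walks you through getting started with chat completion models. To get the best results, use the techniques described here. Don't try to interact with the new models the same way you did with the older model series, because the models are then often verbose and provide less useful responses.

Work with chat completion models

The following code snippet shows the most basic way to interact with models that use the Chat Completion API. If this is your first time using these models programmatically, we recommend that you start with the chat completions quickstart.

Note

In the Azure OpenAI documentation, we refer to GPT-3.5-Turbo and GPT-35-Turbo interchangeably. The official name of the model on OpenAI is gpt-3.5-turbo. For Azure OpenAI, because of Azure-specific character constraints, the underlying model name is gpt-35-turbo.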

OpenAI Python 1.x

import os
from openai import AzureOpenAI

client = AzureOpenAI(
  api_key = os.getenv("AZURE_OPENAI_API_KEY"),  
  api_version = "2024-02-01",
  azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
)

response = client.chat.completions.create(
    model="gpt-35-turbo", # model = "deployment_name".
    messages=[
        {"role": "system", "content": "Assistant is a large language model trained by OpenAI."},
        {"role": "user", "content": "Who were the founders of Microsoft?"}
    ]
)

#print(response)
print(response.model_dump_json(indent=2))
print(response.choices[0].message.content)

{
  "id": "chatcmpl-8GHoQAJ3zN2DJYqOFiVysrMQJfe1P",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "Microsoft was founded by Bill Gates and Paul Allen. They established the company on April 4, 1975. Bill Gates served as the CEO of Microsoft until 2000 and later as Chairman and Chief Software Architect until his retirement in 2008, while Paul Allen left the company in 1983 but remained on the board of directors until 2000.",
        "role": "assistant",
        "function_call": null
      },
      "content_filter_results": {
        "hate": {
          "filtered": false,
          "severity": "safe"
        },
        "self_harm": {
          "filtered": false,
          "severity": "safe"
        },
        "sexual": {
          "filtered": false,
          "severity": "safe"
        },
        "violence": {
          "filtered": false,
          "severity": "safe"
        }
      }
    }
  ],
  "created": 1698892410,
  "model": "gpt-35-turbo",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 73,
    "prompt_tokens": 29,
    "total_tokens": 102
  },
  "prompt_filter_results": [
    {
      "prompt_index": 0,
      "content_filter_results": {
        "hate": {
          "filtered": false,
          "severity": "safe"
        },
        "self_harm": {
          "filtered": false,
          "severity": "safe"
        },
        "sexual": {
          "filtered": false,
          "severity": "safe"
        },
        "violence": {
          "filtered": false,
          "severity": "safe"
        }
      }
    }
  ]
}
Microsoft was founded by Bill Gates and Paul Allen. They established the company on April 4, 1975. Bill Gates served as the CEO of Microsoft until 2000 and later as Chairman and Chief Software Architect until his retirement in 2008, while Paul Allen left the company in 1983 but remained on the board of directors until 2000.

OpenAI Python 0.28.1

Note

The OpenAI Python library version 0.28.1 is deprecated. We recommend using 1.x. For information on moving from 0.28.1 to 1.x, see our migration guide.

import os
import openai
openai.api_type = "azure"
openai.api_version = "2024-02-01" 
openai.api_base = os.getenv("AZURE_OPENAI_ENDPOINT")  # Your Azure OpenAI resource's endpoint value.
openai.api_key = os.getenv("AZURE_OPENAI_API_KEY")

response = openai.ChatCompletion.create(
    engine="gpt-35-turbo", # The deployment name you chose when you deployed the GPT-3.5-Turbo or GPT-4 model.
    messages=[
        {"role": "system", "content": "Assistant is a large language model trained by OpenAI."},
        {"role": "user", "content": "Who were the founders of Microsoft?"}
    ]
)

print(response)

# To print only the response content text:
# print(response['choices'][0]['message']['content'])

Output

The JSON is formatted here for readability.

{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "The founders of Microsoft are Bill Gates and Paul Allen. They co-founded the company in 1975.",
        "role": "assistant"
      }
    }
  ],
  "created": 1679014551,
  "id": "chatcmpl-6usfn2yyjkbmESe3G4jaQR6bsScO1",
  "model": "gpt-3.5-turbo-0301",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 86,
    "prompt_tokens": 37,
    "total_tokens": 123
  }
}

Note

The following parameters aren't available with the new GPT-35-Turbo and GPT-4 models: logprobs, best_of, and echo. If you set any of these parameters, you get an error.
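
If you're porting request settings from an older completions call, one way to avoid that error is to filter the settings first. The following is a minimal sketch, not part of any SDK: the helper name drop_unsupported_params and the sample legacy_params dictionary are hypothetical, and the set of blocked names comes only from the note above.

```python
# Parameter names that the note above lists as unavailable for the chat models.
UNSUPPORTED_CHAT_PARAMS = {"logprobs", "best_of", "echo"}


def drop_unsupported_params(params):
    """Return (cleaned, dropped).

    cleaned is a copy of params without the unsupported keys;
    dropped is a sorted list of the keys that were removed.
    """
    cleaned = {k: v for k, v in params.items() if k not in UNSUPPORTED_CHAT_PARAMS}
    dropped = sorted(k for k in params if k in UNSUPPORTED_CHAT_PARAMS)
    return cleaned, dropped


# Hypothetical settings carried over from an older completions call.
legacy_params = {"temperature": 0.7, "max_tokens": 100, "echo": True, "best_of": 3}
cleaned, dropped = drop_unsupported_params(legacy_params)
print("Sending:", cleaned)
print("Dropped:", dropped)
```

Returning the dropped names alongside the cleaned dictionary makes it easy to log what was removed instead of silently changing the request.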

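Every response includes finish_reason. The possible values for finish_reason are:

stop: The API returned complete model output.
length: The model output is incomplete because of the max_tokens parameter or the token limit.
content_filter: Content was omitted because of a flag from the content filters.
null: The API response is still in progress or incomplete.

Consider setting max_tokens to a slightly higher value than normal, such as 300 or 500. A higher value reduces the chance that the model stops generating text before it reaches the end of the message.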
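
To act on finish_reason in application code, a small lookup is often enough. This sketch only encodes the four values listed above; the function names explain_finish_reason and is_truncated are hypothetical helpers, and Python's None stands in for the JSON null value.

```python
# Meanings of the finish_reason values described above.
# None corresponds to the JSON null value.
FINISH_REASON_MEANINGS = {
    "stop": "complete model output",
    "length": "output cut off by max_tokens or the token limit",
    "content_filter": "content omitted by the content filter",
    None: "response still in progress or incomplete",
}


def explain_finish_reason(finish_reason):
    """Return a short description for a finish_reason value."""
    if finish_reason not in FINISH_REASON_MEANINGS:
        return f"unrecognized finish_reason: {finish_reason!r}"
    return FINISH_REASON_MEANINGS[finish_reason]


def is_truncated(finish_reason):
    """True when the reply was cut off and a larger max_tokens might help."""
    return finish_reason == "length"


for reason in ("stop", "length", "content_filter", None):
    print(repr(reason), "->", explain_finish_reason(reason))
```

With the 1.x client shown earlier, the value to pass in would be response.choices[0].finish_reason.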

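Model versioning

Note

The version gpt-35-turbo is equivalent to the gpt-3.5-turbo model from OpenAI.

Unlike previous GPT-3 and GPT-3.5 models, the gpt-35-turbo model and the gpt-4 and gpt-4-32k models will continue to be updated. When you create a deployment of these models, you also need to specify a model version.

You can find the model retirement dates for these models on the models page.

Work with the Chat Completion API

OpenAI trained the GPT-35-Turbo and GPT-4 models to accept input formatted as a conversation. The messages parameter takes an array of message objects, with the conversation organized by role. When you use the Python API, you pass a list of dictionaries.

The format of a basic chat completion is: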

{"role": "system", "content": "Provide some context and/or instructions to the model"},
{"role": "user", "content": "The user's message goes here"}

A conversation with one example answer followed by a question looks like:

{"role": "system", "content": "Provide some context and/or instructions to the model."},
{"role": "user", "content": "Example question goes here."},
{"role": "assistant", "content": "Example answer goes here."},
{"role": "user", "content": "First question/message for the model to actually respond to."}

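System role

The system role, also known as the system message, is included at the beginning of the array. This message provides the initial instructions to the model. You can provide various information in the system role, such as:

A brief description of the assistant.
Personality traits of the assistant.
Instructions or rules you want the assistant to follow.
Data or information needed for the model, such as relevant questions from an FAQ.

You can customize the system role for your use case or include only basic instructions. The system role/message is optional, but we recommend that you at least include a basic one to get the best results.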
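
The kinds of information listed above can be assembled into one system message programmatically. The following is a rough sketch under stated assumptions: build_system_message is a hypothetical helper, and the "Personality:", "Instructions:", and "Context:" labels are just one possible layout, not a format the API requires.

```python
def build_system_message(description, personality=None, rules=None, data=None):
    """Assemble a system message dict from the optional pieces described above."""
    parts = [description.strip()]
    if personality:
        parts.append(f"Personality: {personality.strip()}")
    if rules:
        parts.append("Instructions:\n" + "\n".join(f"- {rule}" for rule in rules))
    if data:
        parts.append("Context:\n" + "\n".join(f"- {item}" for item in data))
    return {"role": "system", "content": "\n\n".join(parts)}


system_message = build_system_message(
    "Assistant is an intelligent chatbot designed to help users answer their tax related questions.",
    rules=["Only answer questions related to taxes."],
)
print(system_message["content"])
```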

Messages

After the system role, you can include a series of messages between the user and the assistant.

 {"role": "user", "content": "What is thermodynamics?"}

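To trigger a response from the model, end with a user message to indicate that it's the assistant's turn to respond. You can also include a series of example messages between the user and the assistant as a way to do few-shot learning.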
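
A quick sanity check before sending a request can catch a messages list that doesn't end with a user message. This is a simplified sketch: check_messages is a hypothetical helper, and it only knows the three roles used in this article (system, user, assistant), so treat the role set as an assumption rather than the full list the service accepts.

```python
# The three roles used in this article; an assumption, not an exhaustive list.
VALID_ROLES = {"system", "user", "assistant"}


def check_messages(messages):
    """Return a list of problems found in a messages list (empty means OK)."""
    if not messages:
        return ["messages list is empty"]
    problems = []
    for index, message in enumerate(messages):
        if message.get("role") not in VALID_ROLES:
            problems.append(f"message {index} has unexpected role {message.get('role')!r}")
        if not isinstance(message.get("content"), str):
            problems.append(f"message {index} has no string content")
    # The model replies as the assistant, so the last message should be the user's.
    if messages[-1].get("role") != "user":
        problems.append("last message is not from the user")
    return problems


conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is thermodynamics?"},
]
print(check_messages(conversation))
```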

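Message prompt examples

The following section shows examples of different styles of prompts that you can use with the GPT-35-Turbo and GPT-4 models. These examples are only a starting point. You can experiment with different prompts to customize the behavior for your own use cases.

Basic example

If you want the GPT-35-Turbo model to behave similarly to chat.openai.com, you can use a basic system message like "Assistant is a large language model trained by OpenAI."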

{"role": "system", "content": "Assistant is a large language model trained by OpenAI."},
{"role": "user", "content": "Who were the founders of Microsoft?"}

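Example with instructions

For some scenarios, you might want to give more instructions to the model to define guardrails for what the model is able to do.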

{"role": "system", "content": "Assistant is an intelligent chatbot designed to help users answer their tax related questions.
Instructions: 
- Only answer questions related to taxes. 
- If you're unsure of an answer, you can say "I don't know" or "I'm not sure" and recommend users go to the IRS website for more information. "},
{"role": "user", "content": "When are my taxes due?"}

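Use data for grounding

You can also include relevant data or information in the system message to give the model extra context for the conversation. If you need to include only a small amount of information, you can hard code it in the system message. If you have a large amount of data that the model should be aware of, you can use embeddings or a product like Azure AI Search to retrieve the most relevant information at query time.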

{"role": "system", "content": "Assistant is an intelligent chatbot designed to help users answer technical questions about Azure OpenAI Service. Only answer questions using the context below and if you're not sure of an answer, you can say 'I don't know'.

Context:
- Azure OpenAI Service provides REST API access to OpenAI's powerful language models including the GPT-3, Codex and Embeddings model series.
- Azure OpenAI Service gives customers advanced language AI with OpenAI GPT-3, Codex, and DALL-E models with the security and enterprise promise of Azure. Azure OpenAI co-develops the APIs with OpenAI, ensuring compatibility and a smooth transition from one to the other.
- At Microsoft, we're committed to the advancement of AI driven by principles that put people first. Microsoft has made significant investments to help guard against abuse and unintended harm, which includes requiring applicants to show well-defined use cases, incorporating Microsoft’s principles for responsible AI use."
},
{"role": "user", "content": "What is Azure OpenAI Service?"}
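
To illustrate the "retrieve the most relevant information at query time" idea without any external service, here is a toy sketch. It ranks facts by simple keyword overlap with the question, which is a deliberately crude stand-in for embeddings or Azure AI Search, not an implementation of either. The helper names and the sample facts are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_k_facts(question, facts, k=2):
    """Rank facts by how many words they share with the question.

    Keyword overlap is a toy stand-in for embedding similarity.
    sorted() is stable, so ties keep their original order.
    """
    question_words = tokenize(question)
    ranked = sorted(facts, key=lambda fact: len(question_words & tokenize(fact)), reverse=True)
    return ranked[:k]


def grounded_system_message(instructions, facts):
    """Build a system message that carries the selected facts as context."""
    context = "\n".join(f"- {fact}" for fact in facts)
    return {"role": "system", "content": f"{instructions}\n\nContext:\n{context}"}


facts = [
    "Azure OpenAI Service provides REST API access to OpenAI's language models.",
    "The IRS website lists tax filing deadlines.",
    "Azure OpenAI Service co-develops the APIs with OpenAI.",
]
question = "What is Azure OpenAI Service?"
selected = top_k_facts(question, facts, k=2)
system_message = grounded_system_message("Only answer questions using the context below.", selected)
print(system_message["content"])
```

In a real application, the ranking step would be replaced by a proper retrieval system; only the shape of the final system message carries over.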

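Few-shot learning with chat completion

You can also give few-shot examples to the model. The approach for few-shot learning has changed slightly because of the new prompt format. You can now include a series of messages between the user and the assistant in the prompt as few-shot examples. By using these examples, you can seed answers to common questions to prime the model or teach particular behaviors to the model.

This example shows how you can use few-shot learning with GPT-35-Turbo and GPT-4. You can experiment with different approaches to see what works best for your use case.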

{"role": "system", "content": "Assistant is an intelligent chatbot designed to help users answer their tax related questions. "},
{"role": "user", "content": "When do I need to file my taxes by?"},
{"role": "assistant", "content": "In 2023, you will need to file your taxes by April 18th. The date falls after the usual April 15th deadline because April 15th falls on a Saturday in 2023. For more details, see https://www.irs.gov/filing/individuals/when-to-file."},
{"role": "user", "content": "How can I check the status of my tax refund?"},
{"role": "assistant", "content": "You can check the status of your tax refund by visiting https://www.irs.gov/refunds"}
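
The example above ends with an assistant message, so in practice you append the user's real question after the example pairs. A minimal sketch of that assembly step follows; build_few_shot_messages is a hypothetical helper, and the final question is made up for illustration.

```python
def build_few_shot_messages(system_prompt, examples, question):
    """Build a messages list: system prompt, example Q/A pairs, then the real question.

    examples is a list of (example_question, example_answer) tuples.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for example_question, example_answer in examples:
        messages.append({"role": "user", "content": example_question})
        messages.append({"role": "assistant", "content": example_answer})
    # End with the user's actual question so the model responds to it.
    messages.append({"role": "user", "content": question})
    return messages


examples = [
    ("When do I need to file my taxes by?",
     "In 2023, you will need to file your taxes by April 18th."),
    ("How can I check the status of my tax refund?",
     "You can check the status of your tax refund by visiting https://www.irs.gov/refunds"),
]
messages = build_few_shot_messages(
    "Assistant is an intelligent chatbot designed to help users answer their tax related questions.",
    examples,
    "Can I file my taxes online?",  # hypothetical user question
)
for message in messages:
    print(message["role"])
```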

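Use chat completion for nonchat scenarios

The Chat Completion API is designed to work with multi-turn conversations, but it also works well for nonchat scenarios.

For example, for an entity extraction scenario, you might use the following prompt: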

{"role": "system", "content": "You are an assistant designed to extract entities from text. Users will paste in a string of text and you will respond with entities you've extracted from the text as a JSON object. Here's an example of your output format:
{
   "name": "",
   "company": "",
   "phone_number": ""
}"},
{"role": "user", "content": "Hello. My name is Robert Smith. I'm calling from Contoso Insurance, Delaware. My colleague mentioned that you are interested in learning about our comprehensive benefits policy. Could you give me a call back at (555) 346-9322 when you get a chance so we can go over the benefits?"}
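
Because the prompt asks for a JSON object, the calling code usually needs to parse and validate the reply. The sketch below assumes the model returns only the JSON object with no surrounding text, which isn't guaranteed; parse_entities and sample_reply are hypothetical, and the sample reply is hand-written for illustration rather than real model output.

```python
import json

# Keys requested in the system message's example output format.
EXPECTED_KEYS = ("name", "company", "phone_number")


def parse_entities(reply_text):
    """Parse the model's reply as JSON and return a dict with the expected keys.

    Missing keys become empty strings; a reply that isn't a JSON object
    raises ValueError.
    """
    try:
        data = json.loads(reply_text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply JSON is not an object")
    return {key: str(data.get(key, "")) for key in EXPECTED_KEYS}


# Hand-written stand-in for a model reply (not real output).
sample_reply = '{"name": "Robert Smith", "company": "Contoso Insurance", "phone_number": "(555) 346-9322"}'
print(parse_entities(sample_reply))
```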

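Create a basic conversation loop

The examples so far show the basic mechanics of interacting with the Chat Completion API. This example shows you how to create a conversation loop that performs the following actions:

Continuously takes console input and properly formats it as part of the messages list as user role content.
Outputs responses that are printed to the console and formatted and added to the messages list as assistant role content.

Every time a new question is asked, a running transcript of the conversation so far is sent along with the latest question. Because the model has no memory, you need to send an updated transcript with each new question or the model will lose the context of the previous questions and answers.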

OpenAI Python 1.x

import os
from openai import AzureOpenAI

client = AzureOpenAI(
  api_key = os.getenv("AZURE_OPENAI_API_KEY"),  
  api_version = "2024-02-01",
  azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")  # Your Azure OpenAI resource's endpoint value.
)

conversation=[{"role": "system", "content": "You are a helpful assistant."}]

while True:
    user_input = input("Q:")      
    conversation.append({"role": "user", "content": user_input})

    response = client.chat.completions.create(
        model="gpt-35-turbo", # model = "deployment_name".
        messages=conversation
    )

    conversation.append({"role": "assistant", "content": response.choices[0].message.content})
    print("\n" + response.choices[0].message.content + "\n")

OpenAI Python 0.28.1

import os
import openai
openai.api_type = "azure"
openai.api_version = "2024-02-01" 
openai.api_base = os.getenv("AZURE_OPENAI_ENDPOINT")  # Your Azure OpenAI resource's endpoint value.
openai.api_key = os.getenv("AZURE_OPENAI_API_KEY")

conversation=[{"role": "system", "content": "You are a helpful assistant."}]

while True:
    user_input = input()      
    conversation.append({"role": "user", "content": user_input})

    response = openai.ChatCompletion.create(
        engine="gpt-35-turbo", # The deployment name you chose when you deployed the GPT-35-turbo or GPT-4 model.
        messages=conversation
    )

    conversation.append({"role": "assistant", "content": response["choices"][0]["message"]["content"]})
    print("\n" + response['choices'][0]['message']['content'] + "\n")

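When you run the preceding code, you get a blank console window. Enter your first question in the window, and then select the Enter key. After the response is returned, you can repeat the process and keep asking questions.

Manage conversations

The previous example runs until you hit the model's token limit. With each question asked and answer received, the messages list grows in size. The token limit for gpt-35-turbo is 4,096 tokens. The token limits for gpt-4 and gpt-4-32k are 8,192 and 32,768, respectively. These limits include the token count from both the message list sent and the model response. The number of tokens in the messages list combined with the value of the max_tokens parameter must stay under these limits or you receive an error.

It's your responsibility to ensure that the prompt and completion fall within the token limit. For longer conversations, you need to keep track of the token count and only send the model a prompt that falls within the limit.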
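
The budget rule above is simple arithmetic: prompt tokens plus max_tokens must stay under the model's limit. A minimal sketch of that check follows, using only the three limits quoted in this section. The helper names are hypothetical, and the prompt token counts are example numbers, not measured values.

```python
# Token limits quoted in this section.
TOKEN_LIMITS = {"gpt-35-turbo": 4096, "gpt-4": 8192, "gpt-4-32k": 32768}


def remaining_budget(model, prompt_tokens, max_tokens):
    """Tokens left after reserving room for the prompt and the reply."""
    return TOKEN_LIMITS[model] - (prompt_tokens + max_tokens)


def fits_in_limit(model, prompt_tokens, max_tokens):
    """True when prompt tokens plus max_tokens stay strictly under the limit."""
    return remaining_budget(model, prompt_tokens, max_tokens) > 0


# Example numbers: a 3,800-token conversation with 250 tokens reserved for the reply.
print(remaining_budget("gpt-35-turbo", 3800, 250))
print(fits_in_limit("gpt-35-turbo", 3800, 250))
print(fits_in_limit("gpt-35-turbo", 3900, 250))
```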

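Note

We strongly recommend that you stay within the documented input token limit for all models, even if you discover that you can exceed that limit.

The following code sample shows a simple chat loop with a technique for handling a 4,096-token count by using OpenAI's tiktoken library.

The code uses tiktoken 0.5.1. If you have an older version, run pip install tiktoken --upgrade.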

OpenAI Python 1.x

import tiktoken
import os
from openai import AzureOpenAI

client = AzureOpenAI(
  api_key = os.getenv("AZURE_OPENAI_API_KEY"),  
  api_version = "2024-02-01",
  azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")  # Your Azure OpenAI resource's endpoint value.
)

system_message = {"role": "system", "content": "You are a helpful assistant."}
max_response_tokens = 250
token_limit = 4096
conversation = []
conversation.append(system_message)

def num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613"):
    """Return the number of tokens used by a list of messages."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        print("Warning: model not found. Using cl100k_base encoding.")
        encoding = tiktoken.get_encoding("cl100k_base")
    if model in {
        "gpt-3.5-turbo-0613",
        "gpt-3.5-turbo-16k-0613",
        "gpt-4-0314",
        "gpt-4-32k-0314",
        "gpt-4-0613",
        "gpt-4-32k-0613",
        }:
        tokens_per_message = 3
        tokens_per_name = 1
    elif model == "gpt-3.5-turbo-0301":
        tokens_per_message = 4  # every message follows <|start|>{role/name}\n{content}<|end|>\n
        tokens_per_name = -1  # if there's a name, the role is omitted
    elif "gpt-3.5-turbo" in model:
        print("Warning: gpt-3.5-turbo may update over time. Returning num tokens assuming gpt-3.5-turbo-0613.")
        return num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613")
    elif "gpt-4" in model:
        print("Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613.")
        return num_tokens_from_messages(messages, model="gpt-4-0613")
    else:
        raise NotImplementedError(
            f"""num_tokens_from_messages() is not implemented for model {model}."""
        )
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":
                num_tokens += tokens_per_name
    num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
    return num_tokens

while True:
    user_input = input("Q:")      
    conversation.append({"role": "user", "content": user_input})
    conv_history_tokens = num_tokens_from_messages(conversation)

    while conv_history_tokens + max_response_tokens >= token_limit:
        del conversation[1] 
        conv_history_tokens = num_tokens_from_messages(conversation)

    response = client.chat.completions.create(
        model="gpt-35-turbo", # model = "deployment_name".
        messages=conversation,
        temperature=0.7,
        max_tokens=max_response_tokens
    )


    conversation.append({"role": "assistant", "content": response.choices[0].message.content})
    print("\n" + response.choices[0].message.content + "\n")

OpenAI Python 0.28.1

import tiktoken
import openai
import os

openai.api_type = "azure"
openai.api_version = "2024-02-01" 
openai.api_base = os.getenv("AZURE_OPENAI_ENDPOINT")  # Your Azure OpenAI resource's endpoint value.
openai.api_key = os.getenv("AZURE_OPENAI_API_KEY")

system_message = {"role": "system", "content": "You are a helpful assistant."}
max_response_tokens = 250
token_limit = 4096
conversation = []
conversation.append(system_message)

def num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613"):
    """Return the number of tokens used by a list of messages."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        print("Warning: model not found. Using cl100k_base encoding.")
        encoding = tiktoken.get_encoding("cl100k_base")
    if model in {
        "gpt-3.5-turbo-0613",
        "gpt-3.5-turbo-16k-0613",
        "gpt-4-0314",
        "gpt-4-32k-0314",
        "gpt-4-0613",
        "gpt-4-32k-0613",
        }:
        tokens_per_message = 3
        tokens_per_name = 1
    elif model == "gpt-3.5-turbo-0301":
        tokens_per_message = 4  # every message follows <|start|>{role/name}\n{content}<|end|>\n
        tokens_per_name = -1  # if there's a name, the role is omitted
    elif "gpt-3.5-turbo" in model:
        print("Warning: gpt-3.5-turbo may update over time. Returning num tokens assuming gpt-3.5-turbo-0613.")
        return num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613")
    elif "gpt-4" in model:
        print("Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613.")
        return num_tokens_from_messages(messages, model="gpt-4-0613")
    else:
        raise NotImplementedError(
            f"""num_tokens_from_messages() is not implemented for model {model}."""
        )
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":
                num_tokens += tokens_per_name
    num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
    return num_tokens

while True:
    user_input = input("")     
    conversation.append({"role": "user", "content": user_input})
    conv_history_tokens = num_tokens_from_messages(conversation)

    while conv_history_tokens + max_response_tokens >= token_limit:
        del conversation[1] 
        conv_history_tokens = num_tokens_from_messages(conversation)

    response = openai.ChatCompletion.create(
        engine="gpt-35-turbo", # The deployment name you chose when you deployed the GPT-35-Turbo or GPT-4 model.
        messages=conversation,
        temperature=0.7,
        max_tokens=max_response_tokens,
    )

    conversation.append({"role": "assistant", "content": response['choices'][0]['message']['content']})
    print("\n" + response['choices'][0]['message']['content'] + "\n")

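In this example, after the token count is reached, the oldest messages in the conversation transcript are removed. For efficiency, del is used instead of pop(). We start at index 1 to always preserve the system message and only remove user or assistant messages. Over time, this method of managing the conversation can cause the conversation quality to degrade as the model gradually loses the context of the earlier portions of the conversation.

An alternative approach is to limit the conversation duration to the maximum token length or a specific number of turns. After the maximum token limit is reached, the model would lose context if you were to allow the conversation to continue. You can prompt the user to begin a new conversation and clear the messages list to start a new conversation with the full token limit available.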
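
The turn-limit alternative can be sketched as a small wrapper around the messages list. This is one possible shape under stated assumptions: the class name TurnLimitedConversation and its methods are hypothetical, a "turn" here means one user message plus one assistant reply, and no API call is made.

```python
class TurnLimitedConversation:
    """Keep a messages list and refuse new turns after max_turns is reached."""

    def __init__(self, system_prompt, max_turns=10):
        self.system_message = {"role": "system", "content": system_prompt}
        self.max_turns = max_turns
        self.reset()

    def reset(self):
        """Clear the history but keep the system message."""
        self.messages = [self.system_message]
        self.turns = 0

    def is_full(self):
        return self.turns >= self.max_turns

    def add_turn(self, user_text, assistant_text):
        """Record one user message and the assistant reply that followed it."""
        if self.is_full():
            raise RuntimeError("turn limit reached; call reset() to start a new conversation")
        self.messages.append({"role": "user", "content": user_text})
        self.messages.append({"role": "assistant", "content": assistant_text})
        self.turns += 1


chat = TurnLimitedConversation("You are a helpful assistant.", max_turns=2)
chat.add_turn("Hi", "Hello! How can I help?")
chat.add_turn("What is Azure?", "Azure is Microsoft's cloud platform.")
if chat.is_full():
    print("Turn limit reached. Starting a new conversation.")
    chat.reset()
print(len(chat.messages))
```

In a chat loop, the is_full() check would run before each new question so the user can be prompted to start over.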

The token counting portion of the code demonstrated previously is a simplified version of one of OpenAI's cookbook examples.

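Troubleshooting

Here are troubleshooting tips.

Don't use ChatML syntax or special tokens with the chat completion endpoint

Some customers try to use the legacy ChatML syntax with the chat completion endpoints and newer models. ChatML was a preview capability that only worked with the legacy completions endpoint and the gpt-35-turbo version 0301 model. This model is slated for retirement. If you attempt to use ChatML syntax with newer models and the chat completion endpoint, it can result in errors and unexpected model response behavior. We don't recommend this use. The same issue can occur when you use common special tokens.

Error code	Error message	Solution
400	400 - "Failed to generate output due to special tokens in the input."	Your prompt contains special tokens or legacy ChatML tokens that the model/endpoint doesn't recognize or support. Ensure that your prompt/messages array doesn't contain any legacy ChatML tokens or special tokens. If you're upgrading from a legacy model, exclude all special tokens before you submit an API request to the model.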
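
When upgrading prompts written for a legacy model, one way to exclude such tokens is a small cleanup pass over the messages. The sketch below rests on an assumption that isn't stated in this article: that the tokens to remove all have the form <|name|>, as the ChatML markers <|im_start|> and <|im_end|> do. The helper names are hypothetical.

```python
import re

# Assumption: special tokens look like <|name|>, for example <|im_start|> or <|im_end|>.
SPECIAL_TOKEN_PATTERN = re.compile(r"<\|[^|<>]*\|>")


def strip_special_tokens(text):
    """Remove every <|...|> marker from a string."""
    return SPECIAL_TOKEN_PATTERN.sub("", text)


def sanitize_messages(messages):
    """Return new message dicts with special tokens removed from the content."""
    return [{**message, "content": strip_special_tokens(message["content"])} for message in messages]


legacy_prompt = "<|im_start|>user\nHello<|im_end|>"
print(repr(strip_special_tokens(legacy_prompt)))
```

Note that this only deletes the markers. Role text that followed a marker in a ChatML prompt (such as "user" above) stays in the string and would still need to be moved into the role field by hand.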

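Failed to create completion as the model generated invalid Unicode output

Error code	Error message	Workaround
500	500 - InternalServerError: Error code: 500 - {'error': {'message': 'Failed to create completion as the model generated invalid Unicode output}}.	You can minimize the occurrence of these errors by reducing the temperature of your prompts to less than 1 and ensuring that you're using a client with retry logic. Reattempting the request often results in a successful response.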
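
The retry logic mentioned in the workaround can be as simple as a loop with exponential backoff. The following is a generic sketch, not a feature of any SDK: call_with_retries is a hypothetical helper, make_request is any zero-argument callable you supply (for example, a lambda that wraps your chat completion call), and flaky_request is a fake used only to show the behavior.

```python
import time


def call_with_retries(make_request, max_attempts=3, base_delay=1.0, retry_on=(Exception,)):
    """Call make_request(), retrying with exponential backoff on failure.

    The last failure is re-raised once max_attempts is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return make_request()
        except retry_on:
            if attempt == max_attempts:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            time.sleep(base_delay * 2 ** (attempt - 1))


# Fake request that fails twice before succeeding, for demonstration only.
attempts_seen = []

def flaky_request():
    attempts_seen.append(1)
    if len(attempts_seen) < 3:
        raise RuntimeError("simulated 500 error")
    return "ok"


result = call_with_retries(flaky_request, max_attempts=3, base_delay=0.0)
print(result, "after", len(attempts_seen), "attempts")
```

In real use, you would narrow retry_on to the server-error exception type your client raises so that errors such as invalid requests aren't retried.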

Next steps

Learn more about Azure OpenAI.
Get started with the GPT-35-Turbo model with the GPT-35-Turbo quickstart.
For more examples, see the Azure OpenAI Samples GitHub repository.
