Create and log AI agents

Article
07/27/2024

Important

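This feature is in Public Preview.

This article shows how to create and log AI agents, such as RAG applications, using the Mosaic AI Agent Framework.

What are chains and agents?

AI systems often have many components. For example, an AI system might retrieve documents from a vector index, use those documents to supplement the prompt text, and use a foundation model to summarize the response. The code that links these components, also called steps, is called a chain.

An agent is a much more advanced AI system that uses a large language model to decide which steps to take based on the input. In contrast, a chain is a hardcoded sequence of steps intended to achieve a specific outcome.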
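The distinction can be sketched in plain Python. In this minimal illustration, `retrieve`, `summarize`, and `fake_llm_choose_step` are hypothetical stand-ins (no real vector index or language model is called); the only point is that a chain runs a fixed sequence of steps, while an agent lets a model-like decision pick the steps.

```python
# Hypothetical stand-ins for real components; nothing here calls an actual
# vector index or large language model.
def retrieve(query: str) -> str:
    return f"documents about {query}"


def summarize(text: str) -> str:
    return f"summary of {text}"


def run_chain(query: str) -> str:
    """A chain: the sequence of steps is hardcoded."""
    return summarize(retrieve(query))


def fake_llm_choose_step(query: str) -> str:
    """Stand-in for an LLM deciding which steps to take for this input."""
    return "retrieve_then_summarize" if "?" in query else "summarize_only"


def run_agent(query: str) -> str:
    """An agent: the steps depend on a decision made from the input."""
    decision = fake_llm_choose_step(query)
    if decision == "retrieve_then_summarize":
        return summarize(retrieve(query))
    return summarize(query)


print(run_chain("What is RAG?"))
print(run_agent("What is RAG?"))
print(run_agent("RAG notes"))
```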

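With Agent Framework, you can use any library or package to write your code. Agent Framework also makes it easy to iterate on your code as you develop and test it. You can set up configuration files that let you change code parameters in a traceable way without modifying the code itself.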

Requirements

For agents that use a Databricks-managed vector search index, mlflow version 2.13.1 or above is required to use automatic authorization with the vector index.
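A quick way to check this requirement in a notebook is sketched below. The helpers `version_tuple` and `meets_minimum` are hypothetical names introduced for illustration, and the comparison is deliberately simplified (it looks only at the leading digits of the first three version components, so pre-release suffixes are ignored).

```python
from importlib import metadata
from itertools import takewhile

MINIMUM_MLFLOW = "2.13.1"  # minimum version stated in the requirement above


def version_tuple(version: str) -> tuple:
    """Turn '2.13.1' into (2, 13, 1), keeping only leading digits per component."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:  # pad short versions such as '2.13'
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = MINIMUM_MLFLOW) -> bool:
    """Simplified check that the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


try:
    installed = metadata.version("mlflow")
    print(f"mlflow {installed} meets minimum: {meets_minimum(installed)}")
except metadata.PackageNotFoundError:
    print("mlflow is not installed in this environment")
```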

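Input schema for the RAG agent

The following are the supported input formats for your chain.

(Recommended) Queries using the OpenAI chat completion schema. The request must include an array of objects as the messages parameter. This format is best suited to RAG applications.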

question = {
    "messages": [
        {
            "role": "user",
            "content": "What is Retrieval-Augmented Generation?",
        },
        {
            "role": "assistant",
            "content": "RAG, or Retrieval Augmented Generation, is a generative AI design pattern that combines a large language model (LLM) with external knowledge retrieval. This approach allows for real-time data connection to generative AI applications, improving their accuracy and quality by providing context from your data to the LLM during inference. Databricks offers integrated tools that support various RAG scenarios, such as unstructured data, structured data, tools & function calling, and agents.",
        },
        {
            "role": "user",
            "content": "How to build RAG for unstructured data",
        },
    ]
}

SplitChatMessagesRequest. Recommended for multi-turn chat applications, especially when you want to manage the current query and the history separately.

question = {
    "query": "What is MLflow",
    "history": [
        {
            "role": "user",
            "content": "What is Retrieval-augmented Generation?"
        },
        {
            "role": "assistant",
            "content": "RAG is"
        }
    ]
}

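For LangChain, Databricks recommends writing your chain in LangChain Expression Language. In your chain definition code, you can use an itemgetter to get the messages object or the query and history objects, depending on which input format you are using.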
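As a minimal sketch of that lookup, the snippet below applies Python's standard `operator.itemgetter` to plain dictionaries shaped like the two input formats. The sample requests are shortened copies of the examples above; no LangChain objects are involved here.

```python
from operator import itemgetter

chat_request = {
    "messages": [
        {"role": "user", "content": "What is Retrieval-Augmented Generation?"},
        {"role": "assistant", "content": "RAG is"},
        {"role": "user", "content": "How to build RAG for unstructured data"},
    ]
}
split_request = {
    "query": "What is MLflow",
    "history": [
        {"role": "user", "content": "What is Retrieval-augmented Generation?"},
        {"role": "assistant", "content": "RAG is"},
    ],
}

# OpenAI chat completion schema: fetch the whole messages array.
get_messages = itemgetter("messages")
messages = get_messages(chat_request)

# SplitChatMessagesRequest: fetch query and history in one call (returns a tuple).
get_query_and_history = itemgetter("query", "history")
query, history = get_query_and_history(split_request)

print(len(messages))
print(query)
```

In a LangChain Expression Language chain, the same `itemgetter("messages")` callable is typically placed at the start of a pipe so that later steps receive only the field they need.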

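Output schema for the RAG agent

Your code must comply with one of the following supported output formats:

(Recommended) ChatCompletionResponse. This format is recommended for customers who need interoperability with the OpenAI response format.
StringObjectResponse. This format is the simplest and easiest to interpret.

For LangChain, use StringResponseOutputParser() or ChatCompletionsOutputParser() from MLflow as the final step of your chain. Doing so formats the LangChain AI message into an agent-compatible format.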


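  from operator import itemgetter

  from langchain_core.runnables import RunnableLambda
  from mlflow.langchain.output_parsers import StringResponseOutputParser, ChatCompletionsOutputParser

  # extract_user_query_string, extract_chat_history, and fake_model are assumed to be defined elsewhere in the chain file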

  chain = (
      {
          "user_query": itemgetter("messages")
          | RunnableLambda(extract_user_query_string),
          "chat_history": itemgetter("messages") | RunnableLambda(extract_chat_history),
      }
      | RunnableLambda(fake_model)
      | StringResponseOutputParser() # use this for StringObjectResponse
      # ChatCompletionsOutputParser() # or use this for ChatCompletionResponse
  )
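The chain above references `extract_user_query_string` and `extract_chat_history` without defining them. One plausible implementation is sketched below, assuming the messages array follows the OpenAI chat completion schema and that its last element is the current user query; the article itself does not specify these helpers.

```python
def extract_user_query_string(messages: list) -> str:
    """Return the content of the most recent message (the current user query)."""
    return messages[-1]["content"]


def extract_chat_history(messages: list) -> list:
    """Return every message except the most recent one."""
    return messages[:-1]


messages = [
    {"role": "user", "content": "What is Retrieval-Augmented Generation?"},
    {"role": "assistant", "content": "RAG is"},
    {"role": "user", "content": "How to build RAG for unstructured data"},
]

print(extract_user_query_string(messages))
print(len(extract_chat_history(messages)))
```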

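If you use PyFunc, Databricks recommends using type hints to annotate the predict() function with input and output data classes that are subclasses of the classes defined in mlflow.models.rag_signatures.

You can construct the output object from a data class inside predict() to make sure the format is followed. The returned object must be converted to a dictionary representation so that it can be serialized.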


  from dataclasses import asdict

  from mlflow.pyfunc import PythonModel
  from mlflow.models.rag_signatures import ChatCompletionRequest, ChatCompletionResponse, ChainCompletionChoice, Message

  class RAGModel(PythonModel):
      ...
      def predict(self, context, model_input: ChatCompletionRequest) -> ChatCompletionResponse:
          ...
          # Convert the data class to a dictionary so the result can be serialized
          return asdict(ChatCompletionResponse(
              choices=[ChainCompletionChoice(message=Message(content=text))]
          ))
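To see what the dictionary conversion produces, the sketch below runs `dataclasses.asdict` on stand-in data classes. `DemoMessage`, `DemoChoice`, and `DemoResponse` are hypothetical and only mimic the nesting of a chat completion response; they are not the classes from mlflow.models.rag_signatures, which may carry additional fields.

```python
import json
from dataclasses import asdict, dataclass, field


# Stand-in data classes that only mimic the nesting of a chat completion
# response; they are NOT the MLflow classes.
@dataclass
class DemoMessage:
    role: str = "assistant"
    content: str = ""


@dataclass
class DemoChoice:
    message: DemoMessage = field(default_factory=DemoMessage)


@dataclass
class DemoResponse:
    choices: list = field(default_factory=list)


response = DemoResponse(choices=[DemoChoice(message=DemoMessage(content="Hello"))])

# asdict recurses into nested data classes and lists, yielding plain dicts.
as_dictionary = asdict(response)
print(as_dictionary)

# A plain dictionary can be serialized directly, unlike the data class instance.
print(json.dumps(as_dictionary))
```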

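Use parameters to control quality iteration

In Agent Framework, you can use parameters to control how agents are executed. This lets you quickly iterate on different characteristics of your agent without changing the code. Parameters are key-value pairs that you define in a Python dictionary or a .yaml file.

To configure the code, create a ModelConfig, which is a set of key-value parameters. A ModelConfig is either a Python dictionary or a .yaml file. For example, you can use a dictionary during development and then convert it to a .yaml file for production deployment and CI/CD. For details about ModelConfig, see the MLflow documentation.

An example ModelConfig is shown below.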

llm_parameters:
  max_tokens: 500
  temperature: 0.01
model_serving_endpoint: databricks-dbrx-instruct
vector_search_index: ml.docs.databricks_docs_index
prompt_template: 'You are a hello world bot. Respond with a reply to the user''s
  question that indicates your prompt template came from a YAML file. Your response
  must use the word "YAML" somewhere. User''s question: {question}'
prompt_template_input_vars:
- question

To call the configuration from your code, use one of the following:

import mlflow

# Example for loading from a .yml file
config_file = "configs/hello_world_config.yml"
model_config = mlflow.models.ModelConfig(development_config=config_file)

# Example of using a dictionary
config_dict = {
    "prompt_template": "You are a hello world bot. Respond with a reply to the user's question that is fun and interesting to the user. User's question: {question}",
    "prompt_template_input_vars": ["question"],
    "model_serving_endpoint": "databricks-dbrx-instruct",
    "llm_parameters": {"temperature": 0.01, "max_tokens": 500},
}

model_config = mlflow.models.ModelConfig(development_config=config_dict)

# Use model_config.get() to retrieve a parameter value
value = model_config.get('sample_param')
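Once the parameters are loaded, chain code typically reads them to build the prompt. The sketch below shows one way this might look, using a plain dictionary in place of a ModelConfig so that it runs without MLflow; `build_prompt` is a hypothetical helper and the template text is shortened for illustration.

```python
# Plain dictionary standing in for a ModelConfig; with MLflow, the same values
# would be read through model_config.get(...).
config_dict = {
    "prompt_template": "Answer briefly. User's question: {question}",
    "prompt_template_input_vars": ["question"],
    "llm_parameters": {"temperature": 0.01, "max_tokens": 500},
}


def build_prompt(config: dict, **values) -> str:
    """Fill the template, checking that every declared input variable is supplied."""
    missing = [
        name for name in config["prompt_template_input_vars"] if name not in values
    ]
    if missing:
        raise KeyError(f"missing prompt variables: {missing}")
    return config["prompt_template"].format(**values)


prompt = build_prompt(config_dict, question="What is MLflow?")
print(prompt)
```

Because the template and its variables live in the configuration, changing the prompt wording means editing the dictionary or .yaml file rather than the chain code.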

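Log the agent

Logging an agent is the foundation of the development process. Logging captures a "point in time" of the agent's code and configuration so that you can evaluate the quality of the configuration. When developing agents, Databricks recommends code-based logging instead of serialization-based logging. For more information about the pros and cons of each type of logging, see Code-based vs. serialization-based logging.

This section describes how to use code-based logging.

Code-based logging workflow

For code-based logging, the code that logs your agent or chain must be in a separate notebook from your chain code. This notebook is called a driver notebook. For an example notebook, see Example notebooks.

Code-based logging workflow with LangChain

Create a notebook or Python file with your code. In this example, the notebook or file is named chain.py. The notebook or file must contain a LangChain chain, referred to here as lc_chain.
Include mlflow.models.set_model(lc_chain) in the notebook or file.
Create a new notebook to serve as the driver notebook (called driver.py in this example).
In the driver notebook, include the call mlflow.langchain.log_model(lc_model="/path/to/chain.py"). This call runs chain.py and logs the results to an MLflow model.
Deploy the model.
When the serving environment is loaded, chain.py is executed.
When a serving request comes in, lc_chain.invoke(...) is called.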

Code-based logging workflow with PyFunc

Create a notebook or Python file with your code. In this example, the notebook or file is named chain.py. The notebook or file must contain a PyFunc class, referred to here as PyFuncClass.
Include mlflow.models.set_model(PyFuncClass) in the notebook or file.
Create a new notebook to serve as the driver notebook (called driver.py in this example).
In the driver notebook, include the call mlflow.pyfunc.log_model(python_model="/path/to/chain.py", resources="/path/to/resources.yaml"). This call runs chain.py and logs the results to an MLflow model. Use the resources parameter to declare any resources needed to serve the model, such as a vector search index or a serving endpoint that serves a foundation model. For an example, see Example resources file for PyFunc.
Deploy the model.
When the serving environment is loaded, chain.py is executed.
When a serving request comes in, PyFuncClass.predict(...) is called.
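Both workflows rest on the same idea: the chain file is executed, and the object it registers is what later handles requests. The sketch below imitates that idea with the standard library only. It is a conceptual illustration, not MLflow's implementation: `load_model_from_code` and the injected `set_model` function are hypothetical stand-ins for what mlflow.models.set_model and the serving environment do, and the chain is a plain function rather than a LangChain chain or PyFunc class.

```python
import runpy
import tempfile
from pathlib import Path

# Hypothetical stand-in for the chain file. A real chain.py would build a
# LangChain chain or PyFunc class and call mlflow.models.set_model(...).
CHAIN_SOURCE = '''
def lc_chain(request):
    # Echo the last user message; a real chain would call an LLM here.
    return "You asked: " + request["messages"][-1]["content"]

set_model(lc_chain)  # stand-in for mlflow.models.set_model
'''


def load_model_from_code(path):
    """Execute the file at `path` and return the object it registered."""
    registered = {}

    def set_model(model):
        registered["model"] = model

    # Running the file is what brings the model object into memory.
    runpy.run_path(str(path), init_globals={"set_model": set_model})
    return registered["model"]


with tempfile.TemporaryDirectory() as tmp:
    chain_file = Path(tmp) / "chain.py"
    chain_file.write_text(CHAIN_SOURCE)
    model = load_model_from_code(chain_file)

# Once loaded, each incoming request is passed to the registered object.
response = model({"messages": [{"role": "user", "content": "What is MLflow?"}]})
print(response)
```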

Example code for logging chains

import mlflow

code_path = "/Workspace/Users/first.last/chain.py"
config_path = "/Workspace/Users/first.last/config.yml"

input_example = {
    "messages": [
        {
            "role": "user",
            "content": "What is Retrieval-augmented Generation?",
        }
    ]
}

# example using LangChain
with mlflow.start_run():
  logged_chain_info = mlflow.langchain.log_model(
    lc_model=code_path,
    model_config=config_path, # If you specify this parameter, this is the configuration that is used for training the model. The development_config is overwritten.
    artifact_path="chain", # This string is used as the path inside the MLflow model where artifacts are stored
    input_example=input_example, # Must be a valid input to your chain
    example_no_conversion=True, # Required
  )

# or use a PyFunc model

# resources_path = "/Workspace/Users/first.last/resources.yml"

# with mlflow.start_run():
#   logged_chain_info = mlflow.pyfunc.log_model(
#     python_model=code_path,
#     artifact_path="chain",
#     input_example=input_example,
#     resources=resources_path,
#     example_no_conversion=True,
#   )

print(f"MLflow Run: {logged_chain_info.run_id}")
print(f"Model URI: {logged_chain_info.model_uri}")

To verify that the model has been logged correctly, load the chain and invoke it:

# Using LangChain
model = mlflow.langchain.load_model(logged_chain_info.model_uri)
model.invoke(input_example)

# Using PyFunc (a loaded PyFunc model is called through predict)
model = mlflow.pyfunc.load_model(logged_chain_info.model_uri)
model.predict(input_example)

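Example resources file for PyFunc

You can declare the resources needed to serve the model, such as vector search indexes and serving endpoints. For LangChain, resources are picked up automatically and logged along with the model.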

api_version: "1"
databricks:
  vector_search_index:
    - name: "catalog.schema.my_vs_index"
  serving_endpoint:
    - name: databricks-dbrx-instruct
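Register the chain to Unity Catalog

Before you deploy the chain, you must register it to Unity Catalog. When you register the chain, it is packaged as a model in Unity Catalog, and you can use Unity Catalog permissions to authorize access to the resources in the chain.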

import mlflow

mlflow.set_registry_uri("databricks-uc")

catalog_name = "test_catalog"
schema_name = "schema"
model_name = "chain_name"

model_name = catalog_name + "." + schema_name + "." + model_name
uc_model_info = mlflow.register_model(model_uri=logged_chain_info.model_uri, name=model_name)

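Example notebooks

These notebooks create a simple "Hello, world" chain to illustrate how to create a chain application in Databricks. The first example creates a simple chain. The second example notebook shows how to use parameters to minimize code changes during development.

Code-based vs. serialization-based logging

To create and log a chain, you can use code-based MLflow logging or serialization-based MLflow logging. Databricks recommends that you use code-based logging.

With code-based MLflow logging, the chain's code is captured as a Python file. The Python environment is captured as a list of packages. When the chain is deployed, the Python environment is restored, and the chain's code is executed to load the chain into memory so that it can be invoked when the endpoint is called.

With serialization-based MLflow logging, the chain's code and its current state in the Python environment are serialized to disk, often using libraries such as pickle or joblib. When the chain is deployed, the Python environment is restored, and the serialized object is loaded into memory so that it can be invoked when the endpoint is called.

The table shows the advantages and disadvantages of each method.

Method	Advantages	Disadvantages
Code-based MLflow logging	* Overcomes the inherent limitations of serialization, which many popular GenAI libraries do not support. * Saves a copy of the original code for later reference. * No need to restructure your code into a single object that can be serialized.	`log_model(...)` must be called from a different notebook than the chain's code (called a driver notebook).
Serialization-based MLflow logging	`log_model(...)` can be called from the same notebook where the model is defined.	* The original code is not available. * All libraries and objects used in the chain must support serialization.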
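The serialization limitation listed in the table can be reproduced with the standard library alone. In this minimal sketch, `make_chain` and `try_pickle` are hypothetical helpers: the "chain" is just a closure, a pattern that is common in chain code and that pickle cannot serialize, whereas plain data such as a parameter dictionary serializes without trouble.

```python
import pickle


def make_chain(prefix: str):
    """Build a chain-like callable as a closure."""
    def chain(question: str) -> str:
        return f"{prefix}: {question}"
    return chain


def try_pickle(obj) -> bool:
    """Report whether pickle can serialize the object."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        # The exact exception type for local objects varies by Python version.
        return False


chain = make_chain("Answer")

print(chain("What is MLflow?"))            # the closure itself works fine
print(try_pickle({"temperature": 0.01}))   # plain data can be pickled
print(try_pickle(chain))                   # the closure cannot
```

Code-based logging sidesteps this by storing the file that builds the object and re-running it at load time, instead of storing the object itself.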
