Run multiple ML models in a chain

Article
2023-07-11

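Windows ML supports high-performance loading and execution of model chains by carefully optimizing its GPU path. A model chain is two or more models that execute sequentially, where the output of one model becomes the input to the next model in the chain.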

To explain how to chain models efficiently with Windows ML, let's use the FNS-Candy Style Transfer ONNX model as an example. You can find this type of model in the FNS-Candy Style Transfer sample folder on GitHub.

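Suppose we want to run a chain made up of two instances of the same FNS-Candy model, here called mosaic.onnx. The application code passes an image to the first model in the chain, waits for its output to be computed, and then passes the transformed image to another instance of FNS-Candy to produce the final image.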
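Before turning to the Windows ML calls, the flow described above can be sketched in plain C++. This is only an illustration of the chaining idea, not Windows ML code: `Tensor`, `Stage`, `RunChain`, and `Brighten` are hypothetical names invented for this sketch, and a "model" is modelled as any callable that maps one float buffer to another.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-ins: a "tensor" is a flat float buffer, and a "model"
// is any callable that maps one tensor to another.
using Tensor = std::vector<float>;
using Stage = std::function<Tensor(const Tensor&)>;

// Runs the stages in order; each stage's output becomes the next stage's input.
Tensor RunChain(const std::vector<Stage>& stages, Tensor input)
{
    Tensor current = std::move(input);
    for (const Stage& stage : stages)
    {
        assert(stage && "every stage must be callable");
        current = stage(current);
    }
    return current;
}

// Toy "style transfer" that simply adds 1 to every value.
Tensor Brighten(const Tensor& in)
{
    Tensor out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        out[i] = in[i] + 1.0f;
    }
    return out;
}
```

Calling `RunChain` with two copies of `Brighten` on the buffer {0, 1, 2} yields {2, 3, 4}, which mirrors running the same model twice in a row. The sketch copies data between stages on the CPU; avoiding that kind of round trip is what the GPU-side chaining in this article is for.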

The following steps show how to do this with Windows ML.

Note

In a real scenario you would most likely use two different models, but this example is sufficient to illustrate the concepts.

First, let's load the mosaic.onnx model so that we can use it.

std::wstring filePath = L"path\\to\\mosaic.onnx"; 
LearningModel model = LearningModel::LoadFromFilePath(filePath);

string filePath = "path\\to\\mosaic.onnx";
LearningModel model = LearningModel.LoadFromFilePath(filePath);

Then, let's create two identical sessions on the device's default GPU, passing the same model as the input parameter to each.

LearningModelSession session1(model, LearningModelDevice(LearningModelDeviceKind::DirectX));
LearningModelSession session2(model, LearningModelDevice(LearningModelDeviceKind::DirectX));

LearningModelSession session1 = 
  new LearningModelSession(model, new LearningModelDevice(LearningModelDeviceKind.DirectX));
LearningModelSession session2 = 
  new LearningModelSession(model, new LearningModelDevice(LearningModelDeviceKind.DirectX));

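Note

To get the performance benefits of chaining, you need to create identical GPU sessions for all of your models. Otherwise, additional data is moved out of the GPU and into the CPU, which reduces performance.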

The following lines of code create a binding for each session.

LearningModelBinding binding1(session1);
LearningModelBinding binding2(session2);

LearningModelBinding binding1 = new LearningModelBinding(session1);
LearningModelBinding binding2 = new LearningModelBinding(session2);

Next, we bind an input for the first model. We pass in an image that is located in the same path as the model. In this example, the image is called "fish_720.png".

// Get the input descriptor
ILearningModelFeatureDescriptor input = model.InputFeatures().GetAt(0);
// Path to the image that will be loaded as a SoftwareBitmap
hstring imagePath = L"path\\to\\fish_720.png";

// Get the image and bind it to the model's input
try
{
  StorageFile file = StorageFile::GetFileFromPathAsync(imagePath).get();
  IRandomAccessStream stream = file.OpenAsync(FileAccessMode::Read).get();
  BitmapDecoder decoder = BitmapDecoder::CreateAsync(stream).get();
  SoftwareBitmap softwareBitmap = decoder.GetSoftwareBitmapAsync().get();
  VideoFrame videoFrame = VideoFrame::CreateWithSoftwareBitmap(softwareBitmap);
  ImageFeatureValue image = ImageFeatureValue::CreateFromVideoFrame(videoFrame);
  binding1.Bind(input.Name(), image);
}
catch (...)
{
  printf("Failed to load/bind image\n");
}

// Get the input descriptor
ILearningModelFeatureDescriptor input = model.InputFeatures[0];
// Path to the image that will be loaded as a SoftwareBitmap
string imagePath = "path\\to\\fish_720.png";

// Get the image and bind it to the model's input
try
{
    StorageFile file = await StorageFile.GetFileFromPathAsync(imagePath);
    IRandomAccessStream stream = await file.OpenAsync(FileAccessMode.Read);
    BitmapDecoder decoder = await BitmapDecoder.CreateAsync(stream);
    SoftwareBitmap softwareBitmap = await decoder.GetSoftwareBitmapAsync();
    VideoFrame videoFrame = VideoFrame.CreateWithSoftwareBitmap(softwareBitmap);
    ImageFeatureValue image = ImageFeatureValue.CreateFromVideoFrame(videoFrame);
    binding1.Bind(input.Name, image);
}
catch
{
    Console.WriteLine("Failed to load/bind image");
}

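For the next model in the chain to use the output of the first model's evaluation, we need to create an empty output tensor and bind it as the output. This gives us a marker to chain with.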

//get the output descriptor
ILearningModelFeatureDescriptor output = model.OutputFeatures().GetAt(0);
//create an empty output tensor 
std::vector<int64_t> shape = {1, 3, 720, 720};
TensorFloat outputValue = TensorFloat::Create(shape); 
//bind the (empty) output
binding1.Bind(output.Name(), outputValue);

//get the output descriptor
ILearningModelFeatureDescriptor output = model.OutputFeatures[0];
//create an empty output tensor 
List<long> shape = new List<long> { 1, 3, 720, 720 };
TensorFloat outputValue = TensorFloat.Create(shape);
//bind the (empty) output
binding1.Bind(output.Name, outputValue);
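To make the shape {1, 3, 720, 720} more concrete, the following standalone C++ sketch computes how many values such a tensor holds and where a given element sits in a flat buffer. It assumes the dimensions are read as batch, channels, height, width (NCHW) in a dense row-major layout; that interpretation is common for ONNX image tensors but is an assumption here, and `ElementCount` and `OffsetNchw` are hypothetical helpers, not Windows ML APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Number of elements a tensor of the given shape holds (product of all dimensions).
std::int64_t ElementCount(const std::vector<std::int64_t>& shape)
{
    std::int64_t count = 1;
    for (std::int64_t dim : shape)
    {
        assert(dim > 0 && "this sketch only handles fixed, positive dimensions");
        count *= dim;
    }
    return count;
}

// Flat offset of element (n, c, h, w), assuming a dense row-major NCHW layout.
std::int64_t OffsetNchw(const std::vector<std::int64_t>& shape,
                        std::int64_t n, std::int64_t c, std::int64_t h, std::int64_t w)
{
    assert(shape.size() == 4);
    assert(n >= 0 && n < shape[0] && c >= 0 && c < shape[1]);
    assert(h >= 0 && h < shape[2] && w >= 0 && w < shape[3]);
    return ((n * shape[1] + c) * shape[2] + h) * shape[3] + w;
}
```

For the shape used in this article, `ElementCount` returns 1,555,200 values, which is about 6.2 MB when each value is a 4-byte float. Under the NCHW assumption, the second colour channel starts at offset 518,400 (720 × 720).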

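Note

You must use the TensorFloat data type when binding the output. This prevents de-tensorization from occurring once evaluation of the first model completes, and therefore also avoids additional GPU queuing for the load and bind operations of the second model.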

Now, we run the evaluation of the first model and bind its output to the next model's input.

//run session1 evaluation
session1.EvaluateAsync(binding1, L"");
//bind the output to the next model input
binding2.Bind(input.Name(), outputValue);
//run session2 evaluation
auto session2AsyncOp = session2.EvaluateAsync(binding2, L"");

//run session1 evaluation
await session1.EvaluateAsync(binding1, "");
//bind the output to the next model input
binding2.Bind(input.Name, outputValue);
//run session2 evaluation
LearningModelEvaluationResult results = await session2.EvaluateAsync(binding2, "");
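The key idea in this step is that the same tensor object is bound as the output of the first evaluation and as the input of the second. The standalone C++ sketch below imitates that pattern with toy types so it can be compiled without the Windows SDK. `ToyTensor`, `ToyBinding`, and `ToyEvaluate` are hypothetical stand-ins and do not reflect how Windows ML is implemented; unlike the snippets above, the toy also requires an explicitly bound output for the second stage.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins only; these are NOT the Windows ML types.
struct ToyTensor
{
    std::vector<float> data;
};

// Maps feature names to tensors, loosely mirroring "bind by name".
class ToyBinding
{
public:
    void Bind(const std::string& name, std::shared_ptr<ToyTensor> tensor)
    {
        bound_[name] = std::move(tensor);
    }

    // Returns the tensor bound under the given name.
    ToyTensor& Get(const std::string& name) const
    {
        auto it = bound_.find(name);
        assert(it != bound_.end() && "feature was never bound");
        return *it->second;
    }

private:
    std::map<std::string, std::shared_ptr<ToyTensor>> bound_;
};

// Toy "model": writes input + 1 into whatever tensor is bound as the output.
void ToyEvaluate(const ToyBinding& binding)
{
    const ToyTensor& in = binding.Get("input");
    ToyTensor& out = binding.Get("output");
    out.data.resize(in.data.size());
    for (std::size_t i = 0; i < in.data.size(); ++i)
    {
        out.data[i] = in.data[i] + 1.0f;
    }
}
```

In use, one tensor is bound as "output" in the first binding and as "input" in the second, so the second evaluation reads exactly the object the first one wrote to, with no intermediate copy made by the application.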

Finally, let's retrieve the final output produced after running both models, using the following line of code.

auto finalOutput = session2AsyncOp.get().Outputs().First().Current().Value();

var finalOutput = results.Outputs.First().Value;

That's it! Both of your models can now execute sequentially while making the most of the available GPU resources.

Note

Use the following resources for help with Windows ML:

To ask or answer technical questions about Windows ML, use the windows-machine-learning tag on Stack Overflow.
To report a bug, file an issue on GitHub.
