Azure Cognitive Services Computer Vision SDK for Python

04/05/2023

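The Computer Vision service gives developers access to advanced algorithms that process images and return information. Computer Vision algorithms analyze the content of an image in different ways, depending on the visual features you're interested in.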

In your application, you can use Computer Vision to:

Analyze images for insights
Extract text from images
Generate thumbnails


Prerequisites

If you need a Computer Vision API account, you can create one with the following Azure CLI command:

RES_REGION=westeurope
RES_GROUP=<resourcegroup-name>
ACCT_NAME=<computervision-account-name>

az cognitiveservices account create \
    --resource-group $RES_GROUP \
    --name $ACCT_NAME \
    --location $RES_REGION \
    --kind ComputerVision \
    --sku S1 \
    --yes

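Installation

Install the Azure Cognitive Services Computer Vision SDK with pip, optionally within a virtual environment.

Configure a virtual environment (optional)

Although not required, a virtual environment keeps your base system and Azure SDK environments isolated from one another. Run the following commands to configure a virtual environment with venv and then enter it, using a name such as cogsrv-vision-env: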

python3 -m venv cogsrv-vision-env
source cogsrv-vision-env/bin/activate
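To double-check from Python that the virtual environment is active, one option is to compare sys.prefix with sys.base_prefix; inside a venv they differ. This is a minimal sketch, and the helper name in_virtualenv is hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # venv points sys.prefix at the environment directory, while
    # sys.base_prefix keeps pointing at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```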

Install the SDK

Install the Azure Cognitive Services Computer Vision SDK package for Python with pip:

pip install azure-cognitiveservices-vision-computervision
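After installation, you can confirm which version of the package pip installed by querying the standard library's importlib.metadata module (Python 3.8 or later). A small sketch; installed_version is a hypothetical helper name.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("azure-cognitiveservices-vision-computervision")
    print(version or "SDK not installed")
```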

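Authentication

Once you create your Computer Vision resource, you need its region and one of its account keys to instantiate the client object.

Use these values when you create an instance of the ComputerVisionClient client object.

Get credentials

Use the Azure CLI snippet below to populate two environment variables with the Computer Vision account region and one of its keys. (You can also find these values in the Azure portal.) The snippet is formatted for the Bash shell.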

RES_GROUP=<resourcegroup-name>
ACCT_NAME=<computervision-account-name>

export ACCOUNT_REGION=$(az cognitiveservices account show \
    --resource-group $RES_GROUP \
    --name $ACCT_NAME \
    --query location \
    --output tsv)

export ACCOUNT_KEY=$(az cognitiveservices account keys list \
    --resource-group $RES_GROUP \
    --name $ACCT_NAME \
    --query key1 \
    --output tsv)
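If either variable is missing (for example, because the az commands ran in a different shell session), the os.environ lookups in the client code raise a bare KeyError; if a variable is set but empty, the problem only surfaces later as a failed request. A minimal sketch of a friendlier check, assuming the same ACCOUNT_REGION and ACCOUNT_KEY names; load_credentials is a hypothetical helper.

```python
import os
from typing import Mapping, Tuple

REQUIRED = ("ACCOUNT_REGION", "ACCOUNT_KEY")


def load_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (region, key), raising a descriptive error if either is unset or empty."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing environment variable(s): " + ", ".join(missing)
            + ". Export them with the Azure CLI snippet before creating the client."
        )
    return env["ACCOUNT_REGION"], env["ACCOUNT_KEY"]


# Usage: region, key = load_credentials()
```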

Create client

Once you've populated the ACCOUNT_REGION and ACCOUNT_KEY environment variables, you can create the ComputerVisionClient client object.

from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import VisualFeatureTypes
from msrest.authentication import CognitiveServicesCredentials

import os

# Read the values exported by the Azure CLI snippet
region = os.environ['ACCOUNT_REGION']
key = os.environ['ACCOUNT_KEY']

# Authenticate with the account key against the regional endpoint
credentials = CognitiveServicesCredentials(key)
client = ComputerVisionClient(
    endpoint="https://" + region + ".api.cognitive.microsoft.com/",
    credentials=credentials
)
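The endpoint is built by string concatenation, so stray whitespace or capital letters in the region value end up in the URL. A hypothetical helper that normalizes the region first, assuming region names are plain alphanumeric identifiers such as westeurope. If the Azure portal shows a different endpoint for your resource (for example, a custom subdomain), use that value instead.

```python
def regional_endpoint(region: str) -> str:
    """Return the regional Computer Vision endpoint URL for a region name."""
    # Normalize values such as " WestEurope\n" to "westeurope".
    name = region.strip().lower()
    # Region names like "westeurope" or "eastus2" are alphanumeric; anything else
    # (spaces, slashes, an empty string) would produce a malformed host name.
    if not name.isalnum():
        raise ValueError(f"unexpected region name: {region!r}")
    return f"https://{name}.api.cognitive.microsoft.com/"


# Usage: ComputerVisionClient(endpoint=regional_endpoint(region), credentials=credentials)
```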

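Usage

Once you've initialized a ComputerVisionClient client object, you can:

Analyze an image: You can analyze an image for certain features such as faces, colors, and tags.
Generate thumbnails: Create a custom JPEG image to use as a thumbnail of the original image.
Get a description of an image: Get a description of an image based on its subject domain.

For more information about this service, see "What is Computer Vision?"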

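Examples

The following sections provide several code snippets covering some of the most common Computer Vision tasks, including:

Analyze an image
Get a list of subject domains
Analyze an image by domain
Get a text description of an image
Get text from an image
Generate a thumbnail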

Analyze an image

You can analyze an image for certain features with analyze_image. Use the visual_features parameter to set the types of analysis to perform on the image. Common values are VisualFeatureTypes.tags and VisualFeatureTypes.description.

url = "https://upload.wikimedia.org/wikipedia/commons/thumb/1/12/Broadway_and_Times_Square_by_night.jpg/450px-Broadway_and_Times_Square_by_night.jpg"

image_analysis = client.analyze_image(url, visual_features=[VisualFeatureTypes.tags])

# Each tag has a name and a confidence score between 0 and 1
for tag in image_analysis.tags:
    print(tag.name, tag.confidence)
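Tag results often include low-confidence guesses. One way to keep only the confident ones is to filter on each tag's confidence and sort the rest. In this sketch, Tag is a stand-in for the SDK's tag objects so the code runs offline, and the 0.8 threshold is an arbitrary assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Tag:
    """Offline stand-in for the SDK's tag objects; only name and confidence are used."""
    name: str
    confidence: float


def confident_tags(tags: Iterable[Tag], threshold: float = 0.8) -> List[Tuple[str, float]]:
    """Return (name, confidence) pairs at or above the threshold, most confident first."""
    kept = [(tag.name, tag.confidence) for tag in tags if tag.confidence >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up values for an offline run; with a live client,
    # call confident_tags(image_analysis.tags) instead.
    sample = [Tag("street", 0.91), Tag("building", 0.97), Tag("crowd", 0.42)]
    print(confident_tags(sample))
```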

Get a list of subject domains

Review the subject domains used to analyze your image with list_models. These domain names are used when analyzing an image by domain. An example of a domain is landmarks.

models = client.list_models()

# models_property holds one description per domain-specific model
for x in models.models_property:
    print(x.name)
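Because analyze_image_by_domain expects a valid domain name, you might check the name against the list_models result before calling it. This sketch uses stand-in objects with a name attribute in place of the entries in models.models_property; domain_names and check_domain are hypothetical helpers.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List


def domain_names(models: Iterable[Any]) -> List[str]:
    """Collect the name of each model description, sorted alphabetically."""
    return sorted(model.name for model in models)


def check_domain(domain: str, models: Iterable[Any]) -> str:
    """Return domain if it is listed; otherwise raise with the valid choices."""
    names = domain_names(models)
    if domain not in names:
        raise ValueError(f"unsupported domain {domain!r}; choose one of {names}")
    return domain


if __name__ == "__main__":
    # Stand-ins for client.list_models().models_property
    models = [SimpleNamespace(name="landmarks"), SimpleNamespace(name="celebrities")]
    print(check_domain("landmarks", models))
```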

Analyze an image by domain

You can analyze an image by subject domain with analyze_image_by_domain. Get the list of supported subject domains to make sure you use the correct domain name.

domain = "landmarks"
url = "https://images.pexels.com/photos/338515/pexels-photo-338515.jpeg"
language = "en"

analysis = client.analyze_image_by_domain(domain, url, language)

for landmark in analysis.result["landmarks"]:
    print(landmark["name"])
    print(landmark["confidence"])
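In the example above, analysis.result is a plain dictionary, so picking the most confident landmark needs only the standard library. A minimal sketch; the helper name top_landmark and the 0.5 cutoff are assumptions.

```python
from typing import Any, Dict, Optional, Tuple


def top_landmark(result: Dict[str, Any], min_confidence: float = 0.5) -> Optional[Tuple[str, float]]:
    """Return (name, confidence) of the most confident landmark, or None if none qualify."""
    # result has the shape {"landmarks": [{"name": ..., "confidence": ...}, ...]};
    # .get() guards against a response without a "landmarks" key.
    candidates = [
        (landmark["name"], landmark["confidence"])
        for landmark in result.get("landmarks", [])
        if landmark["confidence"] >= min_confidence
    ]
    return max(candidates, key=lambda c: c[1], default=None)


# Usage: print(top_landmark(analysis.result))
```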

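Get a text description of an image

You can get a language-based text description of an image with describe_image. If you're doing text analysis for keywords associated with the image, request several descriptions with the max_candidates parameter. Examples of text descriptions for the following image include "a train crossing a bridge over a body of water", "a large bridge over a body of water", and "a train crossing a bridge over a large body of water".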

url = "http://www.public-domain-photos.com/free-stock-photos-4/travel/san-francisco/golden-gate-bridge-in-san-francisco.jpg"
language = "en"
max_descriptions = 3

# Request up to three candidate captions for the image
analysis = client.describe_image(url, max_candidates=max_descriptions, language=language)

for caption in analysis.captions:
    print(caption.text)
    print(caption.confidence)
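When you request several descriptions for keyword analysis, words that recur across the candidate captions are a simple signal of what the image shows. This sketch counts them with collections.Counter; the stop-word list and the helper name caption_keywords are illustrative assumptions, and the sample input reuses the three example captions quoted above.

```python
import re
from collections import Counter
from typing import Iterable, List

# Illustrative stop-word list (an assumption); extend it for real captions.
STOP_WORDS = {"a", "an", "the", "of", "over", "on", "in", "with", "and"}


def caption_keywords(captions: Iterable[str], top_n: int = 5) -> List[str]:
    """Return the words that occur most often across the captions, most frequent first."""
    counts = Counter(
        word
        for text in captions
        for word in re.findall(r"[a-z]+", text.lower())
        if word not in STOP_WORDS
    )
    # Ties keep the order in which words were first seen.
    return [word for word, _ in counts.most_common(top_n)]


if __name__ == "__main__":
    # With a live client, pass (caption.text for caption in analysis.captions) instead.
    captions = [
        "a train crossing a bridge over a body of water",
        "a large bridge over a body of water",
        "a train crossing a bridge over a large body of water",
    ]
    print(caption_keywords(captions, top_n=3))
```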

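Get text from an image

You can get any handwritten or printed text from an image. This requires two calls to the SDK: read and get_read_result. The call to read is asynchronous, so before extracting text data from the result of get_read_result, check its status against OperationStatusCodes to confirm that the read operation has completed. The results include the text as well as bounding box coordinates for the text.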

import time

from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes

url = "https://github.com/Azure-Samples/cognitive-services-python-sdk-samples/raw/master/samples/vision/images/make_things_happen.jpg"

# SDK call: start the asynchronous read operation (raw=True exposes the response headers)
rawHttpResponse = client.read(url, language="en", raw=True)

# Get the operation ID, the last segment of the Operation-Location header
operationLocation = rawHttpResponse.headers["Operation-Location"]
operationId = operationLocation.split("/")[-1]

# SDK call: poll until the operation is no longer queued or running
while True:
    result = client.get_read_result(operationId)
    if result.status not in (OperationStatusCodes.not_started, OperationStatusCodes.running):
        break
    time.sleep(1)

# Get data: each page of results contains lines of text with bounding boxes
if result.status == OperationStatusCodes.succeeded:
    for page in result.analyze_result.read_results:
        for line in page.lines:
            print(line.text)
            print(line.bounding_box)
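Rather than checking the read status a single time, you can poll get_read_result until the operation leaves the not-started and running states, with a deadline so that a stuck operation doesn't hang the program. A generic sketch, assuming you pass in the fetch call and a predicate for the pending states; poll_until is a hypothetical helper, and sleep is injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_until(
    fetch: Callable[[], T],
    is_pending: Callable[[T], bool],
    interval: float = 1.0,
    timeout: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() until is_pending(result) is False; raise TimeoutError past the deadline."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if not is_pending(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"operation still pending after {timeout} s")
        sleep(interval)


# Usage with the client and operationId from the read example:
# result = poll_until(
#     lambda: client.get_read_result(operationId),
#     lambda r: r.status in (OperationStatusCodes.not_started, OperationStatusCodes.running),
# )
```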

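Generate a thumbnail

You can generate a thumbnail (JPG) of an image with generate_thumbnail. The thumbnail does not need to have the same proportions as the original image.

This example uses the Pillow package to save the new thumbnail image locally.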

from PIL import Image
import io

width = 50
height = 50
url = "http://www.public-domain-photos.com/free-stock-photos-4/travel/san-francisco/golden-gate-bridge-in-san-francisco.jpg"

# generate_thumbnail streams the JPEG back in chunks; join them into one buffer
thumbnail = client.generate_thumbnail(width, height, url)
data = b"".join(thumbnail)

image = Image.open(io.BytesIO(data))
image.save('thumbnail.jpg')
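Because the thumbnail need not keep the original proportions, a square 50 x 50 request for a wide image won't preserve its shape. If you'd rather keep the proportions, one approach is to compute the width and height from the original size (for example, Pillow's Image.size) before calling generate_thumbnail. fit_within is a hypothetical helper in this sketch.

```python
from typing import Tuple


def fit_within(orig_width: int, orig_height: int, max_side: int) -> Tuple[int, int]:
    """Scale (width, height) so the longer side equals max_side, keeping proportions."""
    if orig_width <= 0 or orig_height <= 0 or max_side <= 0:
        raise ValueError("dimensions must be positive")
    scale = max_side / max(orig_width, orig_height)
    # Round to whole pixels and keep at least 1 pixel so very thin images stay valid.
    return max(1, round(orig_width * scale)), max(1, round(orig_height * scale))


# Usage: width, height = fit_within(*original_image.size, max_side=50)
```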

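Troubleshooting

General

When you interact with the ComputerVisionClient client object using the Python SDK, errors are returned through the ComputerVisionErrorException class. Errors returned by the service correspond to the same HTTP status codes returned for REST API requests.

For example, if you try to analyze an image with an invalid key, a 401 error is returned. The following snippet handles the error gracefully by catching the exception and displaying additional information about the error.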


from msrest.exceptions import HttpOperationError

url = "http://www.public-domain-photos.com/free-stock-photos-4/travel/san-francisco/golden-gate-bridge-in-san-francisco.jpg"
language = "en"
max_descriptions = 3

try:
    analysis = client.describe_image(url, max_candidates=max_descriptions, language=language)

    for caption in analysis.captions:
        print(caption.text)
        print(caption.confidence)
except HttpOperationError as e:
    # The SDK's error classes derive from msrest's HttpOperationError,
    # which exposes the HTTP response and its status code
    if e.response is not None and e.response.status_code == 401:
        print("Error unauthorized. Make sure your key and region are correct.")
    else:
        raise

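Handle transient errors with retries

While working with the ComputerVisionClient client, you might encounter transient failures caused by rate limits enforced by the service, or other transient problems such as network outages. For information about handling these types of failures, see the Retry pattern in the Cloud Design Patterns guide, and the related Circuit Breaker pattern.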
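The retry pattern can be sketched as a small wrapper with exponential backoff and jitter. This is a generic illustration, not part of the SDK: retry_with_backoff is hypothetical, the caller decides which exceptions count as transient, and the attempt count and delays are arbitrary defaults.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    is_transient: Callable[[Exception], bool],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(), retrying transient failures with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:
            # Give up on the last attempt or on errors that retrying won't fix.
            if attempt == max_attempts or not is_transient(exc):
                raise
            # Delays grow as base, 2*base, 4*base, ... plus up to 10% random jitter
            # so that many clients don't retry in lockstep.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("unreachable")  # the loop always returns or raises


# Possible use with the client (429 and 5xx treated as transient):
# from msrest.exceptions import HttpOperationError
#
# def is_transient_http(exc):
#     response = getattr(exc, "response", None)
#     return isinstance(exc, HttpOperationError) and response is not None and (
#         response.status_code == 429 or response.status_code >= 500
#     )
#
# analysis = retry_with_backoff(lambda: client.describe_image(url), is_transient_http)
```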

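Next steps

More sample code

Several Computer Vision Python SDK samples are available in the SDK's GitHub repository. These samples provide example code for additional scenarios commonly encountered while working with Computer Vision.

See the samples repository

Additional documentation

For more extensive documentation on the Computer Vision service, see the Azure Computer Vision documentation on docs.microsoft.com.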
