Tutorial: Extract, chunk, and embed multimodal content

In this tutorial, you build a multimodal indexer pipeline that performs these tasks:

Extract and chunk text and images
Vectorize text and images for similarity search
Send cropped images to a knowledge store for retrieval by your app

This tutorial shows several skillsets side by side to illustrate different ways to extract, chunk, and vectorize multimodal content.

Prerequisites

Azure AI Search, on the Basic pricing tier or higher if you want to use the sample data. Configure a managed identity for role-based access to models and data.
Azure Storage, used for storing the sample data and for creating a knowledge store.
A Microsoft Foundry resource that provides Foundry models and APIs. If you use Azure AI Vision multimodal, choose one of the supported regions for your Microsoft Foundry resource.
Visual Studio Code with a REST client or the Python extension. If you haven't installed a suitable version of Python, follow the instructions in the Visual Studio Code Python tutorial.

Multimodal indexing is implemented through skills that call AI models and APIs in an indexer pipeline. Model prerequisites vary depending on the skills you choose for each task.

Tip

To complete this tutorial on the free tier, use a smaller document with fewer images. This tutorial uses Foundry models only, but you can create custom skills to use other models.

Configure access

Before you begin, make sure you have permissions to access content and operations in Azure AI Search. This tutorial uses Microsoft Entra ID for authentication and role-based access for authorization. You must be an Owner or User Access Administrator to assign roles. If roles aren't feasible, use key-based authentication instead.

To configure the recommended role-based access:

Enable role-based access for your search service.
Assign the following roles to your user account.
- Search Service Contributor
- Search Index Data Contributor
- Search Index Data Reader

Get the endpoint

Each Azure AI Search service has an endpoint, which is a unique URL that identifies the service and provides network access to it. In a later section, you specify this endpoint to connect to your search service programmatically.

To get the endpoint:

Go to your search service in the Azure portal.
From the left pane, select Overview.
Make a note of the endpoint, which should look like https://my-service.search.windows.net.

Prepare data

The sample data is a 36-page PDF document that combines rich visual content, such as charts, infographics, and scanned pages, with raw text. Azure Storage provides the sample data and hosts the knowledge store. The managed identity of the search service needs:

Read access to Azure Storage to retrieve the sample data.
Write access to create the knowledge store. The search service creates the container for cropped images during skillset processing, using the name you provide in an environment variable.

Follow these steps to set up the sample data.

Download the following sample PDF: sustainable-ai-pdf
Sign in to the Azure portal.
In Azure Storage, create a new container named sustainable-ai-pdf.
Upload the sample data file.
Assign roles to the search service managed identity:
- Storage Blob Data Reader for data retrieval
- Storage Blob Data Contributor and Storage Table Data Contributor for creating the knowledge store.

While you have the Azure Storage pages open in the Azure portal, get a connection string for the environment variable.

Under Settings > Endpoints, select the endpoint for Resource ID. It should look similar to the following example: /subscriptions/00000000-0000-0000-0000-00000000/resourceGroups/rg-mydemo/providers/Microsoft.Storage/storageAccounts/mydemostorage/blobServices/default.
Add the prefix ResourceId= to this value to form the connection string. Use this version for your environment variable.

ResourceId=/subscriptions/00000000-0000-0000-0000-00000000/resourceGroups/rg-mydemo/providers/Microsoft.Storage/storageAccounts/mydemostorage/blobServices/default
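
As a minimal sketch of the prefixing step, the following Python helper turns a resource ID copied from the portal into the connection string format shown above. The function name to_connection_string is hypothetical and isn't part of any Azure SDK. It also strips stray quotation marks and trailing semicolons, which is an assumption about common copy-and-paste mistakes rather than documented behavior.

```python
def to_connection_string(resource_id: str) -> str:
    """Return 'ResourceId=<id>' for a storage resource ID copied from the portal."""
    # Remove surrounding whitespace, quotation marks, and trailing semicolons.
    cleaned = resource_id.strip().strip("\"'").rstrip(";").strip()
    # Tolerate input that already carries the prefix.
    if cleaned.startswith("ResourceId="):
        cleaned = cleaned[len("ResourceId="):]
    if not cleaned.startswith("/subscriptions/"):
        raise ValueError("expected a resource ID that starts with /subscriptions/")
    return "ResourceId=" + cleaned


example_id = (
    "/subscriptions/00000000-0000-0000-0000-00000000/resourceGroups/rg-mydemo"
    "/providers/Microsoft.Storage/storageAccounts/mydemostorage/blobServices/default"
)
print(to_connection_string(example_id))
```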

For connections made using a user-assigned managed identity, use the same connection string and provide an identity property that's set to a predefined user-assigned managed identity.

"credentials" : { 
    "connectionString" : "ResourceId=/subscriptions/00000000-0000-0000-0000-00000000/resourceGroups/MY-DEMO-RESOURCE-GROUP/providers/Microsoft.Storage/storageAccounts/MY-DEMO-STORAGE-ACCOUNT/;" 
},
"identity" : { 
    "@odata.type": "#Microsoft.Azure.Search.DataUserAssignedIdentity",
    "userAssignedIdentity" : "/subscriptions/00000000-0000-0000-0000-00000000/resourcegroups/MY-DEMO-RESOURCE-GROUP/providers/Microsoft.ManagedIdentity/userAssignedIdentities/MY-DEMO-USER-MANAGED-IDENTITY" 
}

Choose skills for multimodal indexing

The index, data source, and indexer definitions are mostly the same for all scenarios, but the skillset can include different skill combinations depending on how you want to extract, chunk, and vectorize text and images.

Choose skills for extraction and chunking:
- Document Extraction, Text Split
- Document Layout
Choose skills for vectorization:
- GenAI Prompt, Azure OpenAI Embedding
- Azure AI Vision multimodal embeddings

Most of these skills depend on a deployed model or a Microsoft Foundry resource. The following table identifies the model behind each skill, plus the resource and permissions that provide model access.

Skill	Usage	Model	Resource	Permissions
Document Extraction skill, Text Split skill	Extract and chunk based on fixed size. Text extraction is free. Image extraction is billable.	None (built in)	Azure AI Search	See Configure access
Document Layout skill	Extract and chunk based on document layout.	Document Intelligence 4.0	Microsoft Foundry	Cognitive Services User
Azure AI Vision multimodal embeddings skill	Vectorize text and image content.	Azure AI Vision multimodal 4.0	Microsoft Foundry	Cognitive Services User
GenAI Prompt skill	Call an LLM to generate text descriptions of image content.	GPT-5 or GPT-4	Microsoft Foundry	Cognitive Services User
Azure OpenAI Embedding skill	Vectorize text and generated textual image descriptions.	Text-embedding-3 or text-embedding-ada-002	Microsoft Foundry	Cognitive Services User

Model usage is billable, except for text extraction and text splitting.

Model deployments can be in any supported region if the search service connects over a public endpoint or a private connection, or if the billing connection is keyless. Otherwise, if the connection is key-based, attach a Microsoft Foundry resource from the same region as Azure AI Search.

Set up your environment

For this tutorial, your local REST client connection to Azure AI Search requires an endpoint and either an access token or an API key. You can get the endpoint and API keys in the Azure portal. For other connection methods, see Connect to a search service.

For authenticated connections that occur during indexer and skillset processing, the search service uses the role assignments you defined earlier.

Start Visual Studio Code and create a new file.
Provide values for the variables used in the requests:
```
@searchUrl = PUT-YOUR-SEARCH-SERVICE-ENDPOINT-HERE
@storageConnection = PUT-YOUR-STORAGE-CONNECTION-STRING-HERE
@imageProjectionContainer = sustainable-ai-pdf-images
@token = PUT-YOUR-PERSONAL-IDENTITY-TOKEN-HERE
```
For @storageConnection, make sure your connection string doesn't have a trailing semicolon or quotation marks. See Prepare data for the connection string syntax.

For @imageProjectionContainer, provide a container name that's unique in blob storage. Azure AI Search creates this container during skillset processing.

For help getting an access token, see Connect to Azure AI Search. If you can't use roles, see Connect with keys.
Add these variables if you're using the Document Layout skill or the Azure AI Vision skill (which uses model version 2023-04-15):
```
@foundryUrl = PUT-YOUR-MULTISERVICE-AZURE-AI-FOUNDRY-ENDPOINT-HERE
@azureAiVisionModelVersion = 2023-04-15
```

Add these variables if you're using the GenAI Prompt skill and the Azure OpenAI Embedding skill:

 @chatCompletionModelUri = PUT-YOUR-DEPLOYED-MODEL-URI-HERE
 @chatCompletionModelKey = PUT-YOUR-MODEL-KEY-HERE
 @textEmbeddingModelUri = PUT-YOUR-DEPLOYED-MODEL-URI-HERE
 @textEmbeddingModelKey = PUT-YOUR-MODEL-KEY-HERE

Save the file using a .rest or .http file extension. For help with the REST client, see Quickstart: Full-text search using REST.
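
The REST client replaces each {{name}} placeholder in a request with the value of the matching @name variable. The Python sketch below imitates that substitution and lists which variables each skill choice needs, based on the steps above. The names required_variables and render are hypothetical helpers for illustration, not part of the REST client or any Azure SDK.

```python
import re

# Variable groups taken from the steps above.
BASE_VARS = {"searchUrl", "storageConnection", "imageProjectionContainer", "token"}
FOUNDRY_VARS = {"foundryUrl", "azureAiVisionModelVersion"}
OPENAI_VARS = {
    "chatCompletionModelUri",
    "chatCompletionModelKey",
    "textEmbeddingModelUri",
    "textEmbeddingModelKey",
}

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def required_variables(uses_layout_or_vision: bool, uses_genai_and_openai: bool) -> set[str]:
    """Return the variable names a request file needs for the chosen skills."""
    needed = set(BASE_VARS)
    if uses_layout_or_vision:
        needed |= FOUNDRY_VARS
    if uses_genai_and_openai:
        needed |= OPENAI_VARS
    return needed


def render(template: str, variables: dict[str, str]) -> str:
    """Replace {{name}} placeholders, failing loudly on undefined names."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"undefined variable: {name}")
        return variables[name]

    return PLACEHOLDER.sub(substitute, template)


values = {"searchUrl": "https://my-service.search.windows.net"}
print(render("POST {{searchUrl}}/datasources", values))
```

Failing on an undefined name is a deliberate choice in this sketch: a placeholder that's never defined would otherwise be sent to the service as literal text.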

The same Foundry resource can provide Azure AI Vision, Document Intelligence, a chat completion model, and a text embedding model. Make sure the region supports the models you need. If a region is at capacity, you might need to create a new resource to deploy the required models.

Set up the pipeline

An indexer pipeline consists of four components: a data source, an index, a skillset, and an indexer.

Create a data source
Create an index
Create a skillset for extraction, chunking, and vectorization
Create (and run) an indexer
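
The creation order matters because the indexer refers to the other three objects by name. As a rough sketch, the Python snippet below lists the four REST collections in the order this tutorial creates them and builds the request URL for each. The helper name collection_urls and the placeholder endpoint are assumptions for illustration; the API version is the one used in this tutorial's requests.

```python
# REST collections in the order this tutorial creates the pipeline objects.
PIPELINE_COLLECTIONS = ["datasources", "indexes", "skillsets", "indexers"]


def collection_urls(search_url: str, api_version: str = "2025-11-01-preview") -> list[str]:
    """Build one collection URL per pipeline component, in creation order."""
    base = search_url.rstrip("/")
    return [f"{base}/{name}?api-version={api_version}" for name in PIPELINE_COLLECTIONS]


for url in collection_urls("https://my-service.search.windows.net"):
    print(url)
```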

Download REST files

The azure-search-rest-samples GitHub repository has .REST files that create the pipeline and query the index.

Tip

See the azure-ai-search-multimodal-sample GitHub repository for a Python example.

Create a data source

Create Data Source (REST) creates a data source connection that specifies what data to index.

POST {{searchUrl}}/datasources?api-version=2025-11-01-preview HTTP/1.1
Content-Type: application/json
Authorization: Bearer {{token}}

{
   "name":"demo-multimodal-ds",
   "description":null,
   "type":"azureblob",
   "subtype":null,
   "credentials":{
      "connectionString":"{{storageConnection}}"
   },
   "container":{
      "name":"sustainable-ai-pdf",
      "query":null
   },
   "dataChangeDetectionPolicy":null,
   "dataDeletionDetectionPolicy":null,
   "encryptionKey":null,
   "identity":null
}
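
If you prefer Python to the REST client, the same request can be prepared with the standard library. This is a minimal sketch, not the tutorial's official sample: build_datasource_request is a hypothetical helper, the token is a placeholder, and the body leaves out the properties that are null in the request above on the assumption that omitting an optional property is equivalent to sending null. The request is built but not sent.

```python
import json
import urllib.request


def build_datasource_request(
    search_url: str,
    token: str,
    storage_connection: str,
    api_version: str = "2025-11-01-preview",
) -> urllib.request.Request:
    """Prepare (but don't send) the Create Data Source request."""
    body = {
        "name": "demo-multimodal-ds",
        "type": "azureblob",
        "credentials": {"connectionString": storage_connection},
        "container": {"name": "sustainable-ai-pdf"},
    }
    return urllib.request.Request(
        url=f"{search_url.rstrip('/')}/datasources?api-version={api_version}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_datasource_request(
    "https://my-service.search.windows.net",
    "PLACEHOLDER-TOKEN",
    "ResourceId=/subscriptions/00000000-0000-0000-0000-00000000/resourceGroups/rg-mydemo"
    "/providers/Microsoft.Storage/storageAccounts/mydemostorage/blobServices/default",
)
print(request.get_method(), request.full_url)
# To send the request: urllib.request.urlopen(request)
```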

Send the request. The response should look like this:

HTTP/1.1 201 Created
Transfer-Encoding: chunked
Content-Type: application/json; odata.metadata=minimal; odata.streaming=true; charset=utf-8
Location: https://<YOUR-SEARCH-SERVICE-NAME>.search.windows-int.net:443/datasources('demo-multimodal-ds')?api-version=2025-11-01-preview
Server: Microsoft-IIS/10.0
Strict-Transport-Security: max-age=2592000, max-age=15724800; includeSubDomains
Preference-Applied: odata.include-annotations="*"
OData-Version: 4.0
request-id: 4eb8bcc3-27b5-44af-834e-295ed078e8ed
elapsed-time: 346
Date: Sat, 26 Apr 2026 21:25:24 GMT
Connection: close

{
  "name": "demo-multimodal-ds",
  "description": null,
  "type": "azureblob",
  "subtype": null,
  "indexerPermissionOptions": [],
  "credentials": {
    "connectionString": null
  },
  "container": {
    "name": "sustainable-ai-pdf",
    "query": null
  },
  "dataChangeDetectionPolicy": null,
  "dataDeletionDetectionPolicy": null,
  "encryptionKey": null,
  "identity": null
}

Create an index

Create Index (REST) creates an index on your search service. The index is similar across all skillsets, with the following exceptions:

The vectorizers section defines how query text is vectorized at search time. It must use the same embedding provider and model family as the skillset (Azure AI Vision multimodal or Azure OpenAI text embedding) so that query vectors and indexed vectors are compatible.
The dimensions value of the content_embedding field must exactly match the size of the vectors produced by the embedding model (for example, 1024 for Azure AI Vision multimodal or 3072 for text-embedding-3-large). A mismatch can cause indexing or query failures.
For complex types, nested field names in the index must exactly match the enrichment output names (including casing). Azure AI Search can't map nested subfields to different names. Use location_metadata, bounding_polygons, and page_number for fields that receive Text Split output, and locationMetadata, boundingPolygons, and pageNumber for fields that receive Document Layout output.
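
The dimension and naming rules above can be checked before you send an index definition. The following Python sketch is one way to do that under stated assumptions: check_index is a hypothetical helper, the expected sizes are the two named in this tutorial (3072 assumes text-embedding-3-large), and the index passed in is a trimmed-down dictionary rather than a full definition.

```python
# Vector sizes named in this tutorial; 3072 assumes text-embedding-3-large.
EXPECTED_DIMENSIONS = {"aiServicesVision": 1024, "azureOpenAI": 3072}

# Nested field names per extraction strategy, including casing.
LOCATION_FIELDS = {
    "text_split": ("location_metadata", {"page_number", "bounding_polygons"}),
    "document_layout": ("locationMetadata", {"pageNumber", "boundingPolygons"}),
}


def check_index(index: dict, extraction: str) -> list[str]:
    """Return a list of problems found in an index definition (empty if none)."""
    problems = []
    fields = {field["name"]: field for field in index["fields"]}

    # Rule 1: dimensions must match the vectorizer's embedding model.
    kind = index["vectorSearch"]["vectorizers"][0]["kind"]
    expected = EXPECTED_DIMENSIONS[kind]
    actual = fields["content_embedding"]["dimensions"]
    if actual != expected:
        problems.append(f"content_embedding has {actual} dimensions, expected {expected} for {kind}")

    # Rule 2: complex field and subfield names must match the skill output exactly.
    parent, children = LOCATION_FIELDS[extraction]
    if parent not in fields:
        problems.append(f"missing complex field {parent}")
    else:
        found = {sub["name"] for sub in fields[parent].get("fields", [])}
        if found != children:
            problems.append(f"{parent} subfields {sorted(found)} do not match {sorted(children)}")
    return problems


index = {
    "fields": [
        {"name": "content_embedding", "dimensions": 1024},
        {"name": "location_metadata", "fields": [{"name": "page_number"}, {"name": "bounding_polygons"}]},
    ],
    "vectorSearch": {"vectorizers": [{"kind": "aiServicesVision"}]},
}
print(check_index(index, "text_split"))  # prints []
```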

Here are the index definitions for each skill combination.

This pattern uses:

Document Extraction skill and Text Split skill for extraction and chunking.
Azure AI Vision multimodal embeddings skill for text and image embedding.

{
   "name":"demo-multimodal-1-index",
   "fields":[
      {
         "name":"content_id",
         "type":"Edm.String",
         "retrievable":true,
         "key":true,
         "analyzer":"keyword"
      },
      {
         "name":"text_document_id",
         "type":"Edm.String",
         "searchable":false,
         "filterable":true,
         "retrievable":true,
         "stored":true,
         "sortable":false,
         "facetable":false
      },
      {
         "name":"document_title",
         "type":"Edm.String",
         "searchable":true
      },
      {
         "name":"image_document_id",
         "type":"Edm.String",
         "filterable":true,
         "retrievable":true
      },
      {
         "name":"content_text",
         "type":"Edm.String",
         "searchable":true,
         "retrievable":true
      },
      {
         "name":"content_embedding",
         "type":"Collection(Edm.Single)",
         "dimensions":1024,
         "searchable":true,
         "retrievable":true,
         "vectorSearchProfile":"hnsw"
      },
      {
         "name":"content_path",
         "type":"Edm.String",
         "searchable":false,
         "retrievable":true
      },
      {
         "name":"location_metadata",
         "type":"Edm.ComplexType",
         "fields":[
            {
               "name":"page_number",
               "type":"Edm.Int32",
               "searchable":false,
               "retrievable":true
            },
            {
               "name":"bounding_polygons",
               "type":"Edm.String",
               "searchable":false,
               "retrievable":true,
               "filterable":false,
               "sortable":false,
               "facetable":false
            }
         ]
      }
   ],
   "vectorSearch":{
      "profiles":[
         {
            "name":"hnsw",
            "algorithm":"defaulthnsw",
            "vectorizer":"demo-vectorizer"
         }
      ],
      "algorithms":[
         {
            "name":"defaulthnsw",
            "kind":"hnsw",
            "hnswParameters":{
               "m":4,
               "efConstruction":400,
               "metric":"cosine"
            }
         }
      ],
      "vectorizers":[
         {
            "name":"demo-vectorizer",
            "kind":"aiServicesVision",
            "aiServicesVisionParameters":{
               "resourceUri":"{{foundryUrl}}",
               "authIdentity":null,
               "modelVersion":"{{azureAiVisionModelVersion}}"
            }
         }
      ]
   },
   "semantic":{
      "defaultConfiguration":"semanticconfig",
      "configurations":[
         {
            "name":"semanticconfig",
            "prioritizedFields":{
               "titleField":{
                  "fieldName":"document_title"
               },
               "prioritizedContentFields":[
                  
               ],
               "prioritizedKeywordsFields":[
                  
               ]
            }
         }
      ]
   }
}

This pattern uses:

Document Extraction skill and Text Split skill for extraction and chunking.
GenAI Prompt skill and Azure OpenAI Embedding skill for textual descriptions of images and text embedding.

{
   "name":"demo-multimodal-2-index",
   "fields":[
      {
         "name":"content_id",
         "type":"Edm.String",
         "retrievable":true,
         "key":true,
         "analyzer":"keyword"
      },
      {
         "name":"text_document_id",
         "type":"Edm.String",
         "searchable":false,
         "filterable":true,
         "retrievable":true,
         "stored":true,
         "sortable":false,
         "facetable":false
      },
      {
         "name":"document_title",
         "type":"Edm.String",
         "searchable":true
      },
      {
         "name":"image_document_id",
         "type":"Edm.String",
         "filterable":true,
         "retrievable":true
      },
      {
         "name":"content_text",
         "type":"Edm.String",
         "searchable":true,
         "retrievable":true
      },
      {
         "name":"content_embedding",
         "type":"Collection(Edm.Single)",
         "dimensions":3072,
         "searchable":true,
         "retrievable":true,
         "vectorSearchProfile":"hnsw"
      },
      {
         "name":"content_path",
         "type":"Edm.String",
         "searchable":false,
         "retrievable":true
      },
      {
         "name":"location_metadata",
         "type":"Edm.ComplexType",
         "fields":[
            {
               "name":"page_number",
               "type":"Edm.Int32",
               "searchable":false,
               "retrievable":true
            },
            {
               "name":"bounding_polygons",
               "type":"Edm.String",
               "searchable":false,
               "retrievable":true,
               "filterable":false,
               "sortable":false,
               "facetable":false
            }
         ]
      }
   ],
   "vectorSearch":{
      "profiles":[
         {
            "name":"hnsw",
            "algorithm":"defaulthnsw",
            "vectorizer":"demo-vectorizer"
         }
      ],
      "algorithms":[
         {
            "name":"defaulthnsw",
            "kind":"hnsw",
            "hnswParameters":{
               "m":4,
               "efConstruction":400,
               "metric":"cosine"
            }
         }
      ],
      "vectorizers":[
         {
            "name":"demo-vectorizer",
            "kind":"azureOpenAI",
            "azureOpenAIParameters":{
               "resourceUri": "{{textEmbeddingModelUri}}",
               "apiKey": "{{textEmbeddingModelKey}}",
               "deploymentId":"text-embedding-3-large",
               "modelName":"text-embedding-3-large"
            }
         }
      ]
   },
   "semantic":{
      "defaultConfiguration":"semanticconfig",
      "configurations":[
         {
            "name":"semanticconfig",
            "prioritizedFields":{
               "titleField":{
                  "fieldName":"document_title"
               },
               "prioritizedContentFields":[
                  
               ],
               "prioritizedKeywordsFields":[
                  
               ]
            }
         }
      ]
   }
}

This pattern uses:

Document Layout skill for extraction and chunking.
Azure AI Vision multimodal embeddings skill for text and image embedding.

{
   "name":"demo-multimodal-3-index",
   "fields":[
      {
         "name":"content_id",
         "type":"Edm.String",
         "retrievable":true,
         "key":true,
         "analyzer":"keyword"
      },
      {
         "name":"text_document_id",
         "type":"Edm.String",
         "searchable":false,
         "filterable":true,
         "retrievable":true,
         "stored":true,
         "sortable":false,
         "facetable":false
      },
      {
         "name":"document_title",
         "type":"Edm.String",
         "searchable":true
      },
      {
         "name":"image_document_id",
         "type":"Edm.String",
         "filterable":true,
         "retrievable":true
      },
      {
         "name":"content_text",
         "type":"Edm.String",
         "searchable":true,
         "retrievable":true
      },
      {
         "name":"content_embedding",
         "type":"Collection(Edm.Single)",
         "dimensions":1024,
         "searchable":true,
         "retrievable":true,
         "vectorSearchProfile":"hnsw"
      },
      {
         "name":"content_path",
         "type":"Edm.String",
         "searchable":false,
         "retrievable":true
      },
      {
         "name":"locationMetadata",
         "type":"Edm.ComplexType",
         "fields":[
            {
               "name":"pageNumber",
               "type":"Edm.Int32",
               "searchable":false,
               "retrievable":true
            },
            {
               "name":"boundingPolygons",
               "type":"Edm.String",
               "searchable":false,
               "retrievable":true,
               "filterable":false,
               "sortable":false,
               "facetable":false
            }
         ]
      }
   ],
   "vectorSearch":{
      "profiles":[
         {
            "name":"hnsw",
            "algorithm":"defaulthnsw",
            "vectorizer":"demo-vectorizer"
         }
      ],
      "algorithms":[
         {
            "name":"defaulthnsw",
            "kind":"hnsw",
            "hnswParameters":{
               "m":4,
               "efConstruction":400,
               "metric":"cosine"
            }
         }
      ],
      "vectorizers":[
         {
            "name":"demo-vectorizer",
            "kind":"aiServicesVision",
            "aiServicesVisionParameters":{
               "resourceUri":"{{foundryUrl}}",
               "authIdentity":null,
               "modelVersion":"{{azureAiVisionModelVersion}}"
            }
         }
      ]
   },
   "semantic":{
      "defaultConfiguration":"semanticconfig",
      "configurations":[
         {
            "name":"semanticconfig",
            "prioritizedFields":{
               "titleField":{
                  "fieldName":"document_title"
               },
               "prioritizedContentFields":[
                  
               ],
               "prioritizedKeywordsFields":[
                  
               ]
            }
         }
      ]
   }
}

This pattern uses:

Document Layout skill for extraction and chunking.
GenAI Prompt skill and Azure OpenAI Embedding skill for textual descriptions of images and text embedding.

{
   "name":"demo-multimodal-4-index",
   "fields":[
      {
         "name":"content_id",
         "type":"Edm.String",
         "retrievable":true,
         "key":true,
         "analyzer":"keyword"
      },
      {
         "name":"text_document_id",
         "type":"Edm.String",
         "searchable":false,
         "filterable":true,
         "retrievable":true,
         "stored":true,
         "sortable":false,
         "facetable":false
      },
      {
         "name":"document_title",
         "type":"Edm.String",
         "searchable":true
      },
      {
         "name":"image_document_id",
         "type":"Edm.String",
         "filterable":true,
         "retrievable":true
      },
      {
         "name":"content_text",
         "type":"Edm.String",
         "searchable":true,
         "retrievable":true
      },
      {
         "name":"content_embedding",
         "type":"Collection(Edm.Single)",
         "dimensions":3072,
         "searchable":true,
         "retrievable":true,
         "vectorSearchProfile":"hnsw"
      },
      {
         "name":"content_path",
         "type":"Edm.String",
         "searchable":false,
         "retrievable":true
      },
      {
         "name":"locationMetadata",
         "type":"Edm.ComplexType",
         "fields":[
            {
               "name":"pageNumber",
               "type":"Edm.Int32",
               "searchable":false,
               "retrievable":true
            },
            {
               "name":"boundingPolygons",
               "type":"Edm.String",
               "searchable":false,
               "retrievable":true,
               "filterable":false,
               "sortable":false,
               "facetable":false
            }
         ]
      }
   ],
   "vectorSearch":{
      "profiles":[
         {
            "name":"hnsw",
            "algorithm":"defaulthnsw",
            "vectorizer":"demo-vectorizer"
         }
      ],
      "algorithms":[
         {
            "name":"defaulthnsw",
            "kind":"hnsw",
            "hnswParameters":{
               "m":4,
               "efConstruction":400,
               "metric":"cosine"
            }
         }
      ],
      "vectorizers":[
         {
            "name":"demo-vectorizer",
            "kind":"azureOpenAI",
            "azureOpenAIParameters":{
               "resourceUri":"{{textEmbeddingModelUri}}",
               "deploymentId":"text-embedding-3-large",
               "apiKey":"{{textEmbeddingModelKey}}",
               "modelName":"text-embedding-3-large"
            }
         }
      ]
   },
   "semantic":{
      "defaultConfiguration":"semanticconfig",
      "configurations":[
         {
            "name":"semanticconfig",
            "prioritizedFields":{
               "titleField":{
                  "fieldName":"document_title"
               },
               "prioritizedContentFields":[
                  
               ],
               "prioritizedKeywordsFields":[
                  
               ]
            }
         }
      ]
   }
}

Key points:

content_embedding is the only vector field, and it stores vectors for both text and image content. It must be configured with the dimensions that match the embedding model, such as 3072 for text-embedding-3-large, and with a vector search profile.
content_path is the path to each image in the knowledge store.
location_metadata or locationMetadata captures the bounding polygon and page number metadata for each normalized image, which enables precise spatial search or UI overlays. The field name varies based on how the information is extracted.
For content extraction based on the Text Split skill: location metadata is supported for PDF files only. Also, for the Text Split skill, you must include a Shaper skill to capture location metadata in memory and represent it in the document tree. The Shaper skill is also responsible for adding the knowledge store container name to content_path.

Create a skillset for extraction, chunking, and vectorization

Create Skillset (REST) creates a skillset on your search service. A skillset defines the operations that extract, chunk, and vectorize content before indexing.

There are four skillset patterns. Each one demonstrates an extraction and chunking strategy, paired with a vectorization strategy. The patterns differ in two main ways: skillset composition and indexProjections. Projections vary based on the outputs of each embedding skill.
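
To make the fixed-size strategy concrete: the Text Split skill in the first pattern is configured with a maximumPageLength of 2000 and a pageOverlapLength of 200, measured in characters. The Python sketch below is a naive illustration of fixed-size chunking with overlap under those settings. It isn't the skill's actual algorithm, which may treat boundaries differently, and chunk_text is a hypothetical helper.

```python
def chunk_text(text: str, max_len: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into chunks of at most max_len characters that overlap by `overlap`."""
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be non-negative and smaller than max_len")
    step = max_len - overlap  # how far each new chunk advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_len])
        if start + max_len >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


print([len(chunk) for chunk in chunk_text("a" * 4500)])  # prints [2000, 2000, 900]
```

Each chunk repeats the last 200 characters of the previous one, so a sentence that straddles a chunk boundary still appears whole in at least one chunk when it's shorter than the overlap.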

All four patterns include the Shaper skill. The output of the Shaper skill creates the normalized path of images in the knowledge store and the location metadata (page number and bounding polygons).
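
As a rough picture of what the Shaper skill produces for each image, the Python sketch below mirrors the Shaper inputs of the first pattern: it prefixes the container name to the image path, builds a data URI, and groups the page number and bounding polygon under location_metadata. The function shape_image and the sample values are hypothetical, and the sketch omits the normalized_images input that the skill also passes through.

```python
def shape_image(image: dict, container: str) -> dict:
    """Imitate the Shaper skill output for one normalized image (first pattern)."""
    return {
        # Container name + '/' + the image's path in the knowledge store.
        "imagePath": f"{container}/{image['imagePath']}",
        # Base64 image data wrapped as a JPEG data URI.
        "dataUri": "data:image/jpeg;base64," + image["data"],
        # Location metadata grouped into one complex object.
        "location_metadata": {
            "page_number": image["page_number"],
            "bounding_polygons": image["bounding_polygon"],
        },
    }


# Hypothetical sample values for illustration only.
sample = {
    "imagePath": "doc1/normalized_images_0.jpg",
    "data": "AAAA",
    "page_number": 3,
    "bounding_polygon": "[]",
}
shaped = shape_image(sample, "sustainable-ai-pdf-images")
print(shaped["imagePath"])  # prints sustainable-ai-pdf-images/doc1/normalized_images_0.jpg
```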

This pattern uses:

Document Extraction skill and Text Split skill for extraction and chunking.
Azure AI Vision multimodal embeddings skill for text and image embedding.
Shaper skill to capture location metadata and the container name for the image file path in the knowledge store. This capability is unique to PDF content and Document Extraction.

{
   "name":"demo-multimodal-skillset",
   "description":"A test skillset",
   "skills":[
      {
         "@odata.type":"#Microsoft.Skills.Util.DocumentExtractionSkill",
         "name":"document-extraction-skill",
         "description":"Document extraction skill to extract text and images from documents",
         "parsingMode":"default",
         "dataToExtract":"contentAndMetadata",
         "configuration":{
            "imageAction":"generateNormalizedImages",
            "normalizedImageMaxWidth":2000,
            "normalizedImageMaxHeight":2000
         },
         "context":"/document",
         "inputs":[
            {
               "name":"file_data",
               "source":"/document/file_data"
            }
         ],
         "outputs":[
            {
               "name":"content",
               "targetName":"extracted_content"
            },
            {
               "name":"normalized_images",
               "targetName":"normalized_images"
            }
         ]
      },
      {
         "@odata.type":"#Microsoft.Skills.Text.SplitSkill",
         "name":"split-skill",
         "description":"Split skill to chunk documents",
         "context":"/document",
         "defaultLanguageCode":"en",
         "textSplitMode":"pages",
         "maximumPageLength":2000,
         "pageOverlapLength":200,
         "unit":"characters",
         "inputs":[
            {
               "name":"text",
               "source":"/document/extracted_content",
               "inputs":[
                  
               ]
            }
         ],
         "outputs":[
            {
               "name":"textItems",
               "targetName":"pages"
            }
         ]
      },
      {
         "@odata.type":"#Microsoft.Skills.Vision.VectorizeSkill",
         "name":"text-embedding-skill",
         "description":"Vision Vectorization skill for text",
         "context":"/document/pages/*",
         "modelVersion":"{{azureAiVisionModelVersion}}",
         "inputs":[
            {
               "name":"text",
               "source":"/document/pages/*"
            }
         ],
         "outputs":[
            {
               "name":"vector",
               "targetName":"text_vector"
            }
         ]
      },
      {
         "@odata.type":"#Microsoft.Skills.Vision.VectorizeSkill",
         "name":"image-embedding-skill",
         "description":"Vision Vectorization skill for images",
         "context":"/document/normalized_images/*",
         "modelVersion":"{{azureAiVisionModelVersion}}",
         "inputs":[
            {
               "name":"image",
               "source":"/document/normalized_images/*"
            }
         ],
         "outputs":[
            {
               "name":"vector",
               "targetName":"image_vector"
            }
         ]
      },
      {
         "@odata.type":"#Microsoft.Skills.Util.ShaperSkill",
         "name":"shaper-skill",
         "description":"Shaper skill to reshape the data to fit the index schema",
         "context":"/document/normalized_images/*",
         "inputs":[
            {
               "name":"normalized_images",
               "source":"/document/normalized_images/*",
               "inputs":[
                  
               ]
            },
            {
               "name":"imagePath",
               "source":"='{{imageProjectionContainer}}/'+$(/document/normalized_images/*/imagePath)",
               "inputs":[
                  
               ]
            },
            {
               "name":"dataUri",
               "source":"='data:image/jpeg;base64,'+$(/document/normalized_images/*/data)",
               "inputs":[
                  
               ]
            },
            {
               "name":"location_metadata",
               "sourceContext":"/document/normalized_images/*",
               "inputs":[
                  {
                     "name":"page_number",
                     "source":"/document/normalized_images/*/page_number"
                  },
                  {
                     "name":"bounding_polygons",
                     "source":"/document/normalized_images/*/bounding_polygon"
                  }
               ]
            }
         ],
         "outputs":[
            {
               "name":"output",
               "targetName":"new_normalized_images"
            }
         ]
      }
   ],
   "cognitiveServices":{
      "@odata.type":"#Microsoft.Azure.Search.AIServicesByIdentity",
      "subdomainUrl":"{{foundryUrl}}",
      "identity":null
   },
   "indexProjections":{
      "selectors":[
         {
            "targetIndexName":"demo-multimodal-index",
            "parentKeyFieldName":"text_document_id",
            "sourceContext":"/document/pages/*",
            "mappings":[
               {
                  "name":"content_embedding",
                  "source":"/document/pages/*/text_vector"
               },
               {
                  "name":"content_text",
                  "source":"/document/pages/*"
               },
               {
                  "name":"document_title",
                  "source":"/document/document_title"
               }
            ]
         },
         {
            "targetIndexName":"demo-multimodal-index",
            "parentKeyFieldName":"image_document_id",
            "sourceContext":"/document/normalized_images/*",
            "mappings":[
               {
                  "name":"content_embedding",
                  "source":"/document/normalized_images/*/image_vector"
               },
               {
                  "name":"content_path",
                  "source":"/document/normalized_images/*/new_normalized_images/imagePath"
               },
               {
                  "name":"location_metadata",
                  "source":"/document/normalized_images/*/new_normalized_images/location_metadata"
               },
               {
                  "name":"document_title",
                  "source":"/document/document_title"
               }
            ]
         }
      ],
      "parameters":{
         "projectionMode":"skipIndexingParentDocuments"
      }
   },
   "knowledgeStore":{
      "storageConnectionString":"{{storageConnection}}",
      "identity":null,
      "projections":[
         {
            "files":[
               {
                  "storageContainer":"{{imageProjectionContainer}}",
                  "source":"/document/normalized_images/*"
               }
            ]
         }
      ]
   }
}
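
To make the indexProjections section above more concrete, the following Python sketch models how one enriched document could fan out into separate search documents, one per text chunk and one per image. It is a simplified illustration under stated assumptions, not the service's implementation: the `project` helper and the sample data are hypothetical, each text chunk is modeled as a small dictionary, and key generation and parent key fields are left out.

```python
# Simplified model of index projections: one enriched document fans out
# into one search document per text chunk and one per image.
# `project` and the sample data are illustrative assumptions.

def project(enriched: dict) -> list[dict]:
    title = enriched["document_title"]
    docs = []
    # First selector: sourceContext /document/pages/*
    for page in enriched["pages"]:
        docs.append({
            "content_text": page["text"],
            "content_embedding": page["text_vector"],
            "document_title": title,
        })
    # Second selector: sourceContext /document/normalized_images/*
    for image in enriched["normalized_images"]:
        shaped = image["new_normalized_images"]
        docs.append({
            "content_embedding": image["image_vector"],
            "content_path": shaped["imagePath"],
            "location_metadata": shaped["location_metadata"],
            "document_title": title,
        })
    return docs


sample = {
    "document_title": "sample.pdf",
    "pages": [
        {"text": "first chunk", "text_vector": [0.1, 0.2]},
        {"text": "second chunk", "text_vector": [0.3, 0.4]},
    ],
    "normalized_images": [
        {
            "image_vector": [0.5, 0.6],
            "new_normalized_images": {
                "imagePath": "sustainable-ai-pdf-images/sample/image_0.jpg",
                "location_metadata": {"page_number": 1},
            },
        }
    ],
}

docs = project(sample)
print(len(docs), "search documents from one source document")
```

Because projectionMode is set to skipIndexingParentDocuments, only these chunk-level documents would reach the index, not the parent document itself.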

This pattern uses:

The Document Extraction skill and the Text Split skill for extraction and chunking.
The GenAI Prompt skill and the Azure OpenAI Embedding skill for textual descriptions of images and for text embedding.
A Shaper skill that captures location metadata and the container name for the image file path in the knowledge store. This capability is unique to PDF content and Document Extraction.

{
   "name":"demo-multimodal-skillset",
   "description":"A test skillset",
   "skills":[
      {
         "@odata.type":"#Microsoft.Skills.Util.DocumentExtractionSkill",
         "name":"document-extraction-skill",
         "description":"Document extraction skill to extract text and images from documents",
         "parsingMode":"default",
         "dataToExtract":"contentAndMetadata",
         "configuration":{
            "imageAction":"generateNormalizedImages",
            "normalizedImageMaxWidth":2000,
            "normalizedImageMaxHeight":2000
         },
         "context":"/document",
         "inputs":[
            {
               "name":"file_data",
               "source":"/document/file_data"
            }
         ],
         "outputs":[
            {
               "name":"content",
               "targetName":"extracted_content"
            },
            {
               "name":"normalized_images",
               "targetName":"normalized_images"
            }
         ]
      },
      {
         "@odata.type":"#Microsoft.Skills.Text.SplitSkill",
         "name":"split-skill",
         "description":"Split skill to chunk documents",
         "context":"/document",
         "defaultLanguageCode":"en",
         "textSplitMode":"pages",
         "maximumPageLength":2000,
         "pageOverlapLength":200,
         "unit":"characters",
         "inputs":[
            {
               "name":"text",
               "source":"/document/extracted_content",
               "inputs":[
                  
               ]
            }
         ],
         "outputs":[
            {
               "name":"textItems",
               "targetName":"pages"
            }
         ]
      },
      {
        "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
        "name": "text-embedding-skill",
        "context": "/document/pages/*",
        "resourceUri": "{{textEmbeddingModelUri}}",
        "apiKey": "{{textEmbeddingModelKey}}",
        "deploymentId":"{{textEmbeddingDeploymentId}}",
        "modelName":"{{textEmbeddingModelName}}",
        "dimensions": 3072,
        "inputs": [
          {
            "name": "text",
            "source": "/document/pages/*",
            "inputs": []
          }
        ],
        "outputs": [
          {
            "name": "embedding",
            "targetName": "text_vector"
          }
        ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Custom.ChatCompletionSkill",
      "name": "genAI-prompt-skill",
      "description": "GenAI Prompt skill for image verbalization",
      "uri": "{{chatCompletionModelUri}}",
      "apiKey": "{{chatCompletionModelKey}}",
      "context": "/document/normalized_images/*",
      "responseFormat": { "type": "text" },
      "inputs": [
          {
          "name": "systemMessage",
          "source": "='You are tasked with generating concise, accurate descriptions of images, figures, diagrams, or charts in documents. The goal is to capture the key information and meaning conveyed by the image without including extraneous details like style, colors, visual aesthetics, or size.\n\nInstructions:\nContent Focus: Describe the core content and relationships depicted in the image.\n\nFor diagrams, specify the main elements and how they are connected or interact.\nFor charts, highlight key data points, trends, comparisons, or conclusions.\nFor figures or technical illustrations, identify the components and their significance.\nClarity & Precision: Use concise language to ensure clarity and technical accuracy. Avoid subjective or interpretive statements.\n\nAvoid Visual Descriptors: Exclude details about:\n\nColors, shading, and visual styles.\nImage size, layout, or decorative elements.\nFonts, borders, and stylistic embellishments.\nContext: If relevant, relate the image to the broader content of the technical document or the topic it supports.\n\nExample Descriptions:\nDiagram: \"A flowchart showing the four stages of a machine learning pipeline: data collection, preprocessing, model training, and evaluation, with arrows indicating the sequential flow of tasks.\"\n\nChart: \"A bar chart comparing the performance of four algorithms on three datasets, showing that Algorithm A consistently outperforms the others on Dataset 1.\"\n\nFigure: \"A labeled diagram illustrating the components of a transformer model, including the encoder, decoder, self-attention mechanism, and feedforward layers.\"'"
          },
          {
          "name": "userMessage",
          "source": "='Please describe this image.'"
          },
          {
          "name": "image",
          "source": "/document/normalized_images/*/data"
          }
          ],
          "outputs": [
              {
              "name": "response",
              "targetName": "verbalizedImage"
              }
          ]
      },    
      {
        "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
        "name": "verbalized-image-embedding-skill",
        "description": "Embedding skill for verbalized images",
        "resourceUri": "{{textEmbeddingModelUri}}",
        "apiKey": "{{textEmbeddingModelKey}}",
        "deploymentId":"{{textEmbeddingDeploymentId}}",
        "modelName":"{{textEmbeddingModelName}}",
        "dimensions": 3072,
        "context": "/document/normalized_images/*",
        "inputs": [
            {
            "name": "text",
            "source": "/document/normalized_images/*/verbalizedImage",
            "inputs": []
            }
        ],
        "outputs": [
            {
            "name": "embedding",
            "targetName": "verbalizedImage_vector"
            }
        ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Util.ShaperSkill",
      "name": "shaper-skill",
      "description": "Shaper skill to reshape the data to fit the index schema",
      "context": "/document/normalized_images/*",
      "inputs": [
        {
          "name": "normalized_images",
          "source": "/document/normalized_images/*",
          "inputs": []
        },
        {
          "name": "imagePath",
          "source": "='{{imageProjectionContainer}}/'+$(/document/normalized_images/*/imagePath)",
          "inputs": []
        },
        {
          "name": "location_metadata",
          "sourceContext": "/document/normalized_images/*",
          "inputs": [
            {
              "name": "page_number",
              "source": "/document/normalized_images/*/page_number"
            },
            {
              "name": "bounding_polygons",
              "source": "/document/normalized_images/*/bounding_polygon"
            }              
          ]
        }        
      ],
      "outputs": [
        {
          "name": "output",
          "targetName": "new_normalized_images"
        }
      ]
   }
   ],
  "indexProjections": {
      "selectors": [
        {
          "targetIndexName": "demo-multimodal-index",
          "parentKeyFieldName": "text_document_id",
          "sourceContext": "/document/pages/*",
          "mappings": [              
            {
              "name": "content_embedding",
              "source": "/document/pages/*/text_vector"
            },
            {
              "name": "content_text",
              "source": "/document/pages/*"
            },             
            {
              "name": "document_title",
              "source": "/document/document_title"
            }      
          ]
        },
        {
          "targetIndexName": "demo-multimodal-index",
          "parentKeyFieldName": "image_document_id",
          "sourceContext": "/document/normalized_images/*",
          "mappings": [    
            {
            "name": "content_text",
            "source": "/document/normalized_images/*/verbalizedImage"
            },  
            {
            "name": "content_embedding",
            "source": "/document/normalized_images/*/verbalizedImage_vector"
            },                                           
            {
              "name": "content_path",
              "source": "/document/normalized_images/*/new_normalized_images/imagePath"
            },                    
            {
              "name": "document_title",
              "source": "/document/document_title"
            },
            {
              "name": "location_metadata",
              "source": "/document/normalized_images/*/new_normalized_images/location_metadata"
            }            
          ]
        }
      ],
      "parameters": {
        "projectionMode": "skipIndexingParentDocuments"
      }
  },
   "knowledgeStore":{
      "storageConnectionString":"{{storageConnection}}",
      "identity":null,
      "projections":[
         {
            "files":[
               {
                  "storageContainer":"{{imageProjectionContainer}}",
                  "source":"/document/normalized_images/*"
               }
            ]
         }
      ]
   }
}
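
The Shaper skill in the skillset above builds one object per image that combines the container-prefixed image path with location metadata. The Python sketch below mimics that reshaping so you can see the resulting structure. The `shape_image` helper and the sample values (path, page number, polygon placeholder) are hypothetical; only the field names and the sustainable-ai-pdf-images container name come from this tutorial.

```python
# Illustrative model of the Shaper skill output for one normalized image.
# `shape_image` and the sample input are assumptions for demonstration.

def shape_image(image: dict, container: str) -> dict:
    return {
        "normalized_images": image,
        # Mirrors ='{{imageProjectionContainer}}/'+$(.../imagePath)
        "imagePath": f"{container}/{image['imagePath']}",
        "location_metadata": {
            "page_number": image["page_number"],
            "bounding_polygons": image["bounding_polygon"],
        },
    }


image = {
    "imagePath": "sample/normalized_images_0.jpg",  # made-up relative path
    "page_number": 3,
    "bounding_polygon": "<polygon placeholder>",
}

shaped = shape_image(image, "sustainable-ai-pdf-images")
print(shaped["imagePath"])
```

The index projection then reads `imagePath` and `location_metadata` from this shaped object (the `new_normalized_images` node) rather than from the raw image node.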

This pattern uses:

The Document Layout skill for extraction and chunking.
The Azure AI Vision multimodal embeddings skill for text and image embedding.

{
   "name":"demo-multimodal-skillset",
   "description":"A test skillset",
   "skills":[
      {
         "@odata.type":"#Microsoft.Skills.Util.DocumentIntelligenceLayoutSkill",
         "name":"document-layout-skill",
         "description":"Document Intelligence skill for document cracking",
         "context":"/document",
         "outputMode":"oneToMany",
         "outputFormat":"text",
         "extractionOptions":[
            "images",
            "locationMetadata"
         ],
         "chunkingProperties":{
            "unit":"characters",
            "maximumLength":2000,
            "overlapLength":200
         },
         "inputs":[
            {
               "name":"file_data",
               "source":"/document/file_data"
            }
         ],
         "outputs":[
            {
               "name":"text_sections",
               "targetName":"text_sections"
            },
            {
               "name":"normalized_images",
               "targetName":"normalized_images"
            }
         ]
      },
      {
         "@odata.type":"#Microsoft.Skills.Vision.VectorizeSkill",
         "name":"text-embedding-skill",
         "description":"Vision Vectorization skill for text",
         "context":"/document/text_sections/*",
         "modelVersion":"{{azureAiVisionModelVersion}}",
         "inputs":[
            {
               "name":"text",
               "source":"/document/text_sections/*/content"
            }
         ],
         "outputs":[
            {
               "name":"vector",
               "targetName":"text_vector"
            }
         ]
      },
      {
         "@odata.type":"#Microsoft.Skills.Vision.VectorizeSkill",
         "name":"image-embedding-skill",
         "description":"Vision Vectorization skill for images",
         "context":"/document/normalized_images/*",
         "modelVersion":"{{azureAiVisionModelVersion}}",
         "inputs":[
            {
               "name":"image",
               "source":"/document/normalized_images/*"
            }
         ],
         "outputs":[
            {
               "name":"vector",
               "targetName":"image_vector"
            }
         ]
      }
   ],
   "cognitiveServices":{
      "@odata.type":"#Microsoft.Azure.Search.AIServicesByIdentity",
      "subdomainUrl":"{{foundryUrl}}",
      "identity":null
   },
   "indexProjections":{
      "selectors":[
         {
            "targetIndexName":"demo-multimodal-index",
            "parentKeyFieldName":"text_document_id",
            "sourceContext":"/document/text_sections/*",
            "mappings":[
               {
                  "name":"content_embedding",
                  "source":"/document/text_sections/*/text_vector"
               },
               {
                  "name":"content_text",
                  "source":"/document/text_sections/*/content"
               },
               {
                  "name":"locationMetadata",
                  "source":"/document/text_sections/*/locationMetadata"
               },
               {
                  "name":"document_title",
                  "source":"/document/document_title"
               }
            ]
         },
         {
            "targetIndexName":"demo-multimodal-index",
            "parentKeyFieldName":"image_document_id",
            "sourceContext":"/document/normalized_images/*",
            "mappings":[
               {
                  "name":"content_embedding",
                  "source":"/document/normalized_images/*/image_vector"
               },
               {
                  "name":"content_path",
                  "source":"/document/normalized_images/*/imagePath"
               },
               {
                  "name":"document_title",
                  "source":"/document/document_title"
               },
               {
                  "name":"locationMetadata",
                  "source":"/document/normalized_images/*/locationMetadata"
               }
            ]
         }
      ],
      "parameters":{
         "projectionMode":"skipIndexingParentDocuments"
      }
   },
   "knowledgeStore":{
      "storageConnectionString":"{{storageConnection}}",
      "projections":[
         {
            "files":[
               {
                  "storageContainer":"{{imageProjectionContainer}}",
                  "source":"/document/normalized_images/*"
               }
            ]
         }
      ]
   }
}
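
The chunkingProperties in the skillset above ask for chunks of up to 2,000 characters with a 200-character overlap. The Python sketch below shows only the window arithmetic behind those two numbers. It is not how the Document Layout skill is implemented: the `chunk_text` helper is hypothetical and cuts at fixed character offsets, whereas the real skill works from document structure.

```python
# Character-window chunking with overlap, as a rough illustration of
# "maximumLength": 2000 and "overlapLength": 200.
# `chunk_text` is a hypothetical helper, not part of any Azure SDK.

def chunk_text(text: str, maximum_length: int = 2000, overlap_length: int = 200) -> list[str]:
    if overlap_length >= maximum_length:
        raise ValueError("overlap_length must be smaller than maximum_length")
    step = maximum_length - overlap_length
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + maximum_length])
        if start + maximum_length >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks


text = "".join(chr(97 + i % 26) for i in range(4000))
chunks = chunk_text(text)
print([len(c) for c in chunks])
```

Each chunk after the first starts 1,800 characters after the previous one, so consecutive chunks share 200 characters of context.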

This pattern uses:

The Document Layout skill for extraction and chunking.
The GenAI Prompt skill and the Azure OpenAI Embedding skill for textual descriptions of images and for text embedding.

{
   "name":"demo-multimodal-skillset",
   "description":"A test skillset",
   "skills":[
      {
         "@odata.type":"#Microsoft.Skills.Util.DocumentIntelligenceLayoutSkill",
         "name":"document-cracking-skill",
         "description":"Document Layout skill for document cracking",
         "context":"/document",
         "outputMode":"oneToMany",
         "outputFormat":"text",
         "extractionOptions":[
            "images",
            "locationMetadata"
         ],
         "chunkingProperties":{
            "unit":"characters",
            "maximumLength":2000,
            "overlapLength":200
         },
         "inputs":[
            {
               "name":"file_data",
               "source":"/document/file_data"
            }
         ],
         "outputs":[
            {
               "name":"text_sections",
               "targetName":"text_sections"
            },
            {
               "name":"normalized_images",
               "targetName":"normalized_images"
            }
         ]
      },
      {
         "@odata.type":"#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
         "name":"text-embedding-skill",
         "description":"Embedding skill for text",
         "context":"/document/text_sections/*",
         "inputs":[
            {
               "name":"text",
               "source":"/document/text_sections/*/content"
            }
         ],
         "outputs":[
            {
               "name":"embedding",
               "targetName":"text_vector"
            }
         ],
         "resourceUri":"{{textEmbeddingModelUri}}",
         "deploymentId":"text-embedding-3-large",
         "apiKey":"{{textEmbeddingModelKey}}",
         "dimensions":3072,
         "modelName":"text-embedding-3-large"
      },
      {
         "@odata.type":"#Microsoft.Skills.Custom.ChatCompletionSkill",
         "name":"genAI-prompt-skill",
         "description":"GenAI Prompt skill for image verbalization",
         "uri":"{{chatCompletionModelUri}}",
         "apiKey":"{{chatCompletionModelKey}}",
         "context":"/document/normalized_images/*",
         "inputs":[
            {
               "name":"systemMessage",
               "source":"='You are tasked with generating concise, accurate descriptions of images, figures, diagrams, or charts in documents. The goal is to capture the key information and meaning conveyed by the image without including extraneous details like style, colors, visual aesthetics, or size.\n\nInstructions:\nContent Focus: Describe the core content and relationships depicted in the image.\n\nFor diagrams, specify the main elements and how they are connected or interact.\nFor charts, highlight key data points, trends, comparisons, or conclusions.\nFor figures or technical illustrations, identify the components and their significance.\nClarity & Precision: Use concise language to ensure clarity and technical accuracy. Avoid subjective or interpretive statements.\n\nAvoid Visual Descriptors: Exclude details about:\n\nColors, shading, and visual styles.\nImage size, layout, or decorative elements.\nFonts, borders, and stylistic embellishments.\nContext: If relevant, relate the image to the broader content of the technical document or the topic it supports.\n\nExample Descriptions:\nDiagram: \"A flowchart showing the four stages of a machine learning pipeline: data collection, preprocessing, model training, and evaluation, with arrows indicating the sequential flow of tasks.\"\n\nChart: \"A bar chart comparing the performance of four algorithms on three datasets, showing that Algorithm A consistently outperforms the others on Dataset 1.\"\n\nFigure: \"A labeled diagram illustrating the components of a transformer model, including the encoder, decoder, self-attention mechanism, and feedforward layers.\"'"
            },
            {
               "name":"userMessage",
               "source":"='Please describe this image.'"
            },
            {
               "name":"image",
               "source":"/document/normalized_images/*/data"
            }
         ],
         "outputs":[
            {
               "name":"response",
               "targetName":"verbalizedImage"
            }
         ]
      },
      {
         "@odata.type":"#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
         "name":"verbalized-image-embedding-skill",
         "description":"Embedding skill for verbalized images",
         "context":"/document/normalized_images/*",
         "inputs":[
            {
               "name":"text",
               "source":"/document/normalized_images/*/verbalizedImage",
               "inputs":[
                  
               ]
            }
         ],
         "outputs":[
            {
               "name":"embedding",
               "targetName":"verbalizedImage_vector"
            }
         ],
         "resourceUri":"{{textEmbeddingModelUri}}",
         "deploymentId":"text-embedding-3-large",
         "apiKey":"{{textEmbeddingModelKey}}",
         "dimensions":3072,
         "modelName":"text-embedding-3-large"
      }
   ],
   "indexProjections":{
      "selectors":[
         {
            "targetIndexName":"demo-multimodal-index",
            "parentKeyFieldName":"text_document_id",
            "sourceContext":"/document/text_sections/*",
            "mappings":[
               {
                  "name":"content_embedding",
                  "source":"/document/text_sections/*/text_vector"
               },
               {
                  "name":"content_text",
                  "source":"/document/text_sections/*/content"
               },
               {
                  "name":"locationMetadata",
                  "source":"/document/text_sections/*/locationMetadata"
               },
               {
                  "name":"document_title",
                  "source":"/document/document_title"
               }
            ]
         },
         {
            "targetIndexName":"demo-multimodal-index",
            "parentKeyFieldName":"image_document_id",
            "sourceContext":"/document/normalized_images/*",
            "mappings":[
               {
                  "name":"content_text",
                  "source":"/document/normalized_images/*/verbalizedImage"
               },
               {
                  "name":"content_embedding",
                  "source":"/document/normalized_images/*/verbalizedImage_vector"
               },
               {
                  "name":"content_path",
                  "source":"/document/normalized_images/*/imagePath"
               },
               {
                  "name":"document_title",
                  "source":"/document/document_title"
               },
               {
                  "name":"locationMetadata",
                  "source":"/document/normalized_images/*/locationMetadata"
               }
            ]
         }
      ],
      "parameters":{
         "projectionMode":"skipIndexingParentDocuments"
      }
   },
   "cognitiveServices":{
      "@odata.type":"#Microsoft.Azure.Search.AIServicesByIdentity",
      "subdomainUrl":"{{foundryUrl}}",
      "identity":null
   },
   "knowledgeStore":{
      "storageConnectionString":"{{storageConnection}}",
      "projections":[
         {
            "files":[
               {
                  "storageContainer":"{{imageProjectionContainer}}",
                  "source":"/document/normalized_images/*"
               }
            ]
         }
      ]
   }
}
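
The system message in the GenAI Prompt skill above is a single long JSON string, which is awkward to edit by hand. One option, sketched below in Python, is to keep the prompt as ordinary text and generate the skill definition from it. The `as_literal` and `build_prompt_skill` helpers are hypothetical, the short prompt is a stand-in for the full one, and the sketch deliberately rejects prompts that contain single quotes because quote escaping inside the ='...' form is not covered in this tutorial.

```python
import json

# Hypothetical helpers that generate the GenAI Prompt skill definition.

def as_literal(text: str) -> str:
    """Wrap text in the ='...' literal form used by the skillset above."""
    if "'" in text:
        # Quote escaping is not covered here, so refuse instead of guessing.
        raise ValueError("single quotes are not handled by this sketch")
    return f"='{text}'"


def build_prompt_skill(system_message: str, user_message: str) -> dict:
    return {
        "@odata.type": "#Microsoft.Skills.Custom.ChatCompletionSkill",
        "name": "genAI-prompt-skill",
        "description": "GenAI Prompt skill for image verbalization",
        "uri": "{{chatCompletionModelUri}}",
        "apiKey": "{{chatCompletionModelKey}}",
        "context": "/document/normalized_images/*",
        "inputs": [
            {"name": "systemMessage", "source": as_literal(system_message)},
            {"name": "userMessage", "source": as_literal(user_message)},
            {"name": "image", "source": "/document/normalized_images/*/data"},
        ],
        "outputs": [{"name": "response", "targetName": "verbalizedImage"}],
    }


skill = build_prompt_skill(
    "Describe the core content of the image.\n\nAvoid colors and layout details.",
    "Please describe this image.",
)
print(json.dumps(skill, indent=2))
```

Serializing with `json.dumps` takes care of escaping line breaks and double quotes, so the generated text can be pasted into the skills array of the REST request.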

Run the indexer

Create Indexer creates an indexer on your search service. An indexer connects to the data source, loads the data, runs the skillset, and indexes the enriched content.

### Create and run an indexer
POST {{searchUrl}}/indexers?api-version=2025-11-01-preview   HTTP/1.1
  Content-Type: application/json
  Authorization: Bearer {{token}}

{
  "name": "demo-multimodal-indexer",
  "dataSourceName": "demo-multimodal-ds",
  "targetIndexName": "demo-multimodal-index",
  "skillsetName": "demo-multimodal-skillset",
  "parameters": {
    "maxFailedItems": -1,
    "maxFailedItemsPerBatch": 0,
    "batchSize": 1,
    "configuration": {
      "allowSkillsetToReadFileData": true
    }
  },
  "fieldMappings": [
    {
      "sourceFieldName": "metadata_storage_name",
      "targetFieldName": "document_title"
    }
  ],
  "outputFieldMappings": []
}
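
If you prefer the Python extension over the REST client, the same request can be assembled with the standard library. This is a minimal sketch under assumptions: `build_create_indexer_request` is a hypothetical helper, the search URL is a placeholder, and you need to supply a valid Microsoft Entra ID token yourself. The request is only built here, not sent.

```python
import json
import urllib.request

API_VERSION = "2025-11-01-preview"


def build_create_indexer_request(search_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the Create Indexer request shown above."""
    body = {
        "name": "demo-multimodal-indexer",
        "dataSourceName": "demo-multimodal-ds",
        "targetIndexName": "demo-multimodal-index",
        "skillsetName": "demo-multimodal-skillset",
        "parameters": {
            "maxFailedItems": -1,
            "maxFailedItemsPerBatch": 0,
            "batchSize": 1,
            "configuration": {"allowSkillsetToReadFileData": True},
        },
        "fieldMappings": [
            {
                "sourceFieldName": "metadata_storage_name",
                "targetFieldName": "document_title",
            }
        ],
        "outputFieldMappings": [],
    }
    return urllib.request.Request(
        url=f"{search_url}/indexers?api-version={API_VERSION}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Placeholder values; replace them with your own endpoint and token.
request = build_create_indexer_request("https://example.search.windows.net", "<token>")
print(request.get_method(), request.full_url)
# To send the request: urllib.request.urlopen(request)
```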

Run queries

You can start searching as soon as the first document is loaded. This is an unspecified full-text search query that returns all fields marked as retrievable in the index, along with a document count.

Tip

The content_embedding field contains more than a thousand dimensions. Use a select statement to exclude that field from the response by explicitly selecting all other fields. Adjust the select statement to match the field names (location_metadata vs. locationMetadata) in your index. Here's an example: "select": "content_id, text_document_id, document_title, image_document_id, content_text,

### Query the index
POST {{searchUrl}}/indexes/demo-multimodal-index/docs/search?api-version=2025-11-01-preview   HTTP/1.1
  Content-Type: application/json
  Authorization: Bearer {{token}}
  
  {
    "search": "*",
    "count": true
  }

Send the request. The response should look like this:

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: application/json; odata.metadata=minimal; odata.streaming=true; charset=utf-8
Content-Encoding: gzip
Vary: Accept-Encoding
Server: Microsoft-IIS/10.0
Strict-Transport-Security: max-age=2592000, max-age=15724800; includeSubDomains
Preference-Applied: odata.include-annotations="*"
OData-Version: 4.0
request-id: 712ca003-9493-40f8-a15e-cf719734a805
elapsed-time: 198
Date: Wed, 30 Apr 2025 23:20:53 GMT
Connection: close

{
  "@odata.count": 100,
  "@search.nextPageParameters": {
    "search": "*",
    "count": true,
    "skip": 50
  },
  "value": [
  ],
  "@odata.nextLink": "https://<YOUR-SEARCH-SERVICE-NAME>.search.windows.net/indexes/demo-multimodal-index/docs/search?api-version=2025-11-01-preview"
}

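The query matches 100 documents, as reported in @odata.count. The @search.nextPageParameters object shows that the results are returned in pages.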
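
The @search.nextPageParameters object in the response above carries a skip value of 50, which suggests the matches arrive in pages of 50. The Python sketch below generates the request bodies needed to walk through all pages. The `page_bodies` helper is hypothetical, and the page size of 50 is inferred from the sample response rather than stated by this tutorial.

```python
# Build one request body per page of results, using "top" and "skip".
# `page_bodies` is a hypothetical helper; 50 is an assumed page size.

def page_bodies(total: int, page_size: int = 50) -> list[dict]:
    return [
        {"search": "*", "count": True, "top": page_size, "skip": skip}
        for skip in range(0, total, page_size)
    ]


bodies = page_bodies(100)
for body in bodies:
    print(body)
```

Each body can be posted to the same docs/search endpoint used in the query above.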

Query for image-only content

Use a filter to exclude all non-image content. The $filter parameter works only on fields that were marked as filterable when the index was created.

For filters, you can also use logical operators (and, or, not) and comparison operators (eq, ne, gt, lt, ge, le). String comparisons are case-sensitive. For more information and examples, see Examples of simple search queries.

POST {{searchUrl}}/indexes/demo-multimodal-index/docs/search?api-version=2025-11-01-preview   HTTP/1.1
  Content-Type: application/json
  Authorization: Bearer {{token}}
  
  {
    "search": "*",
    "count": true,
    "filter": "image_document_id ne null"
  }

Search results that contain image-only content have no text content, so you can exclude the text fields.

The content_embedding field contains high-dimensional vectors (typically 1,000 to 3,000 dimensions) for both page text and verbalized image descriptions. Exclude this field from the query.

The content_path field contains the relative path to the image file within the designated image projection container. This field is generated only for images extracted from PDFs when imageAction is set to generateNormalizedImages, and it can be mapped from the enriched document using the source field /document/normalized_images/*/imagePath.

For PDF content extracted using the Text Split skill, the Shaper skill adds the container name to the path, along with location metadata.
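
Putting the points above together, an image-only query can combine the filter with a select list that leaves out content_embedding and the text fields. The Python sketch below builds such a request body. The `image_query` helper is hypothetical, and the field list is an assumption based on the field names that appear in this tutorial; adjust it to match your index.

```python
import json

# Build an image-only query body that omits the embedding and text fields.
# `image_query` and its field list are assumptions for illustration.

def image_query(location_field: str = "location_metadata") -> dict:
    fields = [
        "content_id",
        "image_document_id",
        "document_title",
        "content_path",
        location_field,  # location_metadata or locationMetadata, per your index
    ]
    return {
        "search": "*",
        "count": True,
        "filter": "image_document_id ne null",
        "select": ", ".join(fields),
    }


body = image_query("locationMetadata")
print(json.dumps(body, indent=2))
```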

Query for content related to "energy"

Query for text or images with content related to energy, returning the content ID, the parent document, the text (populated only for text chunks), and the content path where the image is saved in the knowledge store (populated only for images).

This query is full-text search only, but you can also query the vector field for similarity search.

POST {{searchUrl}}/indexes/demo-multimodal-index/docs/search?api-version=2025-11-01-preview   HTTP/1.1
  Content-Type: application/json
  Authorization: Bearer {{token}}
  

  {
    "search": "energy",
    "count": true
  }
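
The request above is full-text only. For similarity search, a request body can include a vector query against content_embedding instead. The Python sketch below shows one plausible shape for such a body. Treat it as an assumption to check against the Search Documents REST reference: the `vector_query` helper is hypothetical, the tiny vector is a placeholder, and a real query vector must come from the same embedding model and have the same number of dimensions as the vectors in your index.

```python
# Sketch of a vector query body targeting the content_embedding field.
# `vector_query` is a hypothetical helper; the vector below is a placeholder.

def vector_query(query_vector: list[float], k: int = 5) -> dict:
    return {
        "count": True,
        "select": "content_id, document_title, content_text, content_path",
        "vectorQueries": [
            {
                "kind": "vector",
                "vector": query_vector,
                "fields": "content_embedding",
                "k": k,
            }
        ],
    }


vector_body = vector_query([0.01, 0.02, 0.03], k=3)
print(vector_body["vectorQueries"][0]["fields"])
```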

Reset and rerun

An indexer can be reset to clear the high-water mark, which allows a full rebuild. The following POST requests reset the indexer and then rerun it. The final GET request checks the indexer status.

### Reset the indexer
POST {{searchUrl}}/indexers/demo-multimodal-indexer/reset?api-version=2025-11-01-preview   HTTP/1.1
  Content-Type: application/json
  Authorization: Bearer {{token}}

### Run the indexer
POST {{searchUrl}}/indexers/demo-multimodal-indexer/run?api-version=2025-11-01-preview   HTTP/1.1
  Content-Type: application/json
  Authorization: Bearer {{token}}

### Check indexer status 
GET {{searchUrl}}/indexers/demo-multimodal-indexer/status?api-version=2025-11-01-preview   HTTP/1.1
  Content-Type: application/json
  Authorization: Bearer {{token}}
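
After a rerun, you typically call the status request repeatedly until the run finishes. The Python sketch below interprets a parsed status response. The `summarize_status` helper is hypothetical, and the property names it reads (lastResult, status, itemsProcessed, itemsFailed) are assumptions about the response shape that you should confirm against an actual status response from your service.

```python
# Interpret a parsed indexer status response (a dict loaded from JSON).
# `summarize_status` and the property names are assumptions to verify.

def summarize_status(status: dict) -> str:
    last = status.get("lastResult") or {}
    state = last.get("status", "unknown")
    if state == "inProgress":
        return "running"
    if state == "success":
        processed = last.get("itemsProcessed", 0)
        failed = last.get("itemsFailed", 0)
        return f"done: {processed} processed, {failed} failed"
    return f"check errors (lastResult.status={state})"


sample_status = {
    "lastResult": {"status": "success", "itemsProcessed": 1, "itemsFailed": 0}
}
print(summarize_status(sample_status))
```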

View images in the knowledge store

Recall that the skillsets in this tutorial create a knowledge store for image content extracted from the PDF. After the indexer runs, the sustainable-ai-pdf-images container should contain about 23 images.

You can't return these images in a search query. However, you can write application code that calls the Azure Storage APIs to retrieve the images if you need them for the user experience. The content_path field has the path to each image.
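
As a starting point for that application code, the Python sketch below turns a content_path value into a blob URL. It is a sketch under assumptions: `blob_url` is a hypothetical helper, the account name and sample path are made up, and the URL follows the usual public-cloud blob endpoint pattern. Requests to the URL still need to be authorized, for example with a Microsoft Entra ID token or a SAS.

```python
from typing import Optional
from urllib.parse import quote

# Turn a content_path value into a blob URL.
# `blob_url`, the account name, and the sample path are assumptions.

def blob_url(account: str, content_path: str, container: Optional[str] = None) -> str:
    if container is None:
        # Shaper-based patterns prefix the path with the container name.
        container, _, blob_name = content_path.partition("/")
    else:
        # Otherwise the path is relative to the projection container.
        blob_name = content_path
    return f"https://{account}.blob.core.windows.net/{container}/{quote(blob_name)}"


url = blob_url("mystorage", "sustainable-ai-pdf-images/sample/normalized_images_0.jpg")
print(url)
```

Pass the container explicitly when your index stores paths without the container prefix, as in the Document Layout patterns.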

To view images in Storage browser:

Sign in to the Azure portal and go to your Storage account.
In Storage browser, expand the sustainable-ai-pdf-images container.
Select an image.
In the far-right menu (...), select View/Edit.

Clean up resources

When you work in your own subscription, it's a good idea to finish a project by removing the resources you no longer need. Resources left running can cost you money.

In the Azure portal, select All resources or Resource groups from the left pane to find and manage resources. You can delete resources individually or delete the resource group to remove all of its resources at once.

Last updated on 2026-04-30