How to make an index created by Azure AI Search's import and vectorize data feature use the same field structure as the original dataset.

PA 松村優 25 Reputation points
2024-02-19T11:07:31.1866667+00:00

I want to use Azure AI Search to import and vectorize data. The dataset is a CSV file, and I want the index fields to match the CSV columns. I have already run the data import and vectorization, and an index was created. However, its field structure does not follow the CSV; instead it is [chunk_id, parent_id, chunk, title, vector]. I would like the fields to have the following structure: [product, department, category, question, answer]. Is this possible, and if so, what is the best way to achieve it? If anyone knows, please let me know. Thanks.

Azure AI Search
An Azure search service with built-in artificial intelligence capabilities that enrich information to help identify and explore relevant content at scale.

2 answers

  1. SnehaAgrawal-MSFT 21,506 Reputation points
    2024-02-20T16:45:15.4033333+00:00

    @PA 松村優
    The import and vectorize data experience in the portal does not currently support delimited text scenarios such as CSV files. You will have to construct the indexer and skillset manually, following the documentation: "Search over CSV blobs - Azure AI Search | Microsoft Learn".

    Follow the instructions in that documentation to update your indexer, then adjust the skillset definition. Specifically, change the input of the SplitSkill to "/document/columnName", where "columnName" is the CSV column you want chunked and vectorized.

    You must also update your index projections to map any other fields you want to include in each indexed document.

    Let us know.
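
As a rough illustration of the skillset changes described above, here is a minimal Python sketch that builds a skillset definition as a plain dictionary and prints it as JSON. It is not taken from the answer itself. The names `csv-skillset` and `csv-chunks-index`, the Azure OpenAI endpoint and deployment, the page-length settings, and the choice of `answer` as the column to chunk are all assumptions; the `chunk`, `vector`, and `parent_id` field names follow the index the question describes. Exact property requirements can differ between API versions, so check the result against the REST reference before using it.

```python
import json

# Columns from the question's CSV. Which one gets chunked is an assumption.
CSV_COLUMNS = ["product", "department", "category", "question", "answer"]
CHUNK_COLUMN = "answer"


def build_skillset(name, target_index, columns, chunk_column):
    """Return a skillset definition that chunks and embeds one CSV column."""
    split_skill = {
        "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
        "context": "/document",
        "textSplitMode": "pages",
        "maximumPageLength": 2000,  # assumed chunk size
        "pageOverlapLength": 500,   # assumed overlap
        # The key change: read from the CSV column, not /document/content.
        "inputs": [{"name": "text", "source": f"/document/{chunk_column}"}],
        "outputs": [{"name": "textItems", "targetName": "pages"}],
    }
    embedding_skill = {
        "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
        "context": "/document/pages/*",
        "resourceUri": "https://YOUR-RESOURCE.openai.azure.com",  # placeholder
        "deploymentId": "YOUR-EMBEDDING-DEPLOYMENT",              # placeholder
        "modelName": "text-embedding-ada-002",  # assumed; newer API versions expect it
        "inputs": [{"name": "text", "source": "/document/pages/*"}],
        "outputs": [{"name": "embedding", "targetName": "vector"}],
    }
    # Each chunk becomes one search document; copy the CSV columns onto it.
    mappings = [
        {"name": "chunk", "source": "/document/pages/*"},
        {"name": "vector", "source": "/document/pages/*/vector"},
    ]
    mappings += [{"name": col, "source": f"/document/{col}"} for col in columns]
    return {
        "name": name,
        "skills": [split_skill, embedding_skill],
        "indexProjections": {
            "selectors": [
                {
                    "targetIndexName": target_index,
                    "parentKeyFieldName": "parent_id",
                    "sourceContext": "/document/pages/*",
                    "mappings": mappings,
                }
            ],
            "parameters": {"projectionMode": "skipIndexingParentDocuments"},
        },
    }


skillset = build_skillset("csv-skillset", "csv-chunks-index", CSV_COLUMNS, CHUNK_COLUMN)
print(json.dumps(skillset, indent=2))
```

The target index would then need a field for every mapping name, so the CSV columns sit alongside chunk_id, parent_id, chunk, and vector rather than replacing them.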


  2. 2024-09-17T18:20:35.1033333+00:00

    I'm trying to do the same thing. The UI's "Import and Vectorize" doesn't really work even though you can specify a "CSV" delimitedText parsing mode.

    Has anyone got this to work properly? If so, can you post sample JSON for the indexer and skillset that parse a CSV document with this layout:

    EmployeeID, FirstName, LastName, Biography

    1, Ron, Hubert, "a long biography that can't fit on one page."
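
For the layout above, the indexer and index side might look something like the following Python sketch, which builds both definitions as dictionaries and prints them as JSON. This is an untested illustration, not a confirmed working configuration. Every resource name (`employees-csv-indexer`, `employees-blob-datasource`, `employees-chunks`, `employees-skillset`), the vector dimension of 1536, and the vector search profile names are assumptions. It also assumes the real file has no spaces after the commas, so the header names are exactly `EmployeeID`, `FirstName`, `LastName`, `Biography`, and that `Biography` is the column a SplitSkill would read from "/document/Biography".

```python
import json

COLUMNS = ["EmployeeID", "FirstName", "LastName", "Biography"]
CHUNK_COLUMN = "Biography"  # assumed to be the column that gets chunked

# Indexer: parse each CSV row into its own document, one field per column.
indexer = {
    "name": "employees-csv-indexer",                 # hypothetical name
    "dataSourceName": "employees-blob-datasource",   # hypothetical name
    "targetIndexName": "employees-chunks",           # hypothetical name
    "skillsetName": "employees-skillset",            # hypothetical name
    "parameters": {
        "configuration": {
            "parsingMode": "delimitedText",
            "firstLineContainsHeaders": True,
            "delimitedTextDelimiter": ",",
        }
    },
}


def string_field(name, **overrides):
    """Build an Edm.String field definition with a few common defaults."""
    field = {
        "name": name,
        "type": "Edm.String",
        "searchable": True,
        "filterable": False,
        "retrievable": True,
    }
    field.update(overrides)
    return field


# Index: chunk bookkeeping fields plus the CSV columns carried onto each chunk.
index = {
    "name": "employees-chunks",
    "fields": [
        string_field("chunk_id", key=True, analyzer="keyword"),
        string_field("parent_id", filterable=True),
        string_field("chunk"),
        {
            "name": "vector",
            "type": "Collection(Edm.Single)",
            "searchable": True,
            "dimensions": 1536,                      # assumed embedding size
            "vectorSearchProfile": "vector-profile",
        },
    ]
    + [string_field(c, filterable=True) for c in COLUMNS if c != CHUNK_COLUMN],
    "vectorSearch": {
        "algorithms": [{"name": "hnsw-default", "kind": "hnsw"}],
        "profiles": [{"name": "vector-profile", "algorithm": "hnsw-default"}],
    },
}

print(json.dumps({"indexer": indexer, "index": index}, indent=2))
```

The skillset would then project "/document/EmployeeID", "/document/FirstName", and "/document/LastName" onto each chunk through its index projections, alongside the chunk text and its vector.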

