CRUD on Azure Vector Index/database

Mads Helles 20 Reputation points
2024-05-21T12:19:27.1533333+00:00

Hi

I'm currently working on a PromptFlow in Azure. The flow has two vector indexes attached. All data is in the form of .txt or .pdf files, stored as blobs in two separate folders in Azure Blob Storage.

Things are going great and my flow is inching closer and closer to production state. Some of the documents need updating, some need to be split, and others need to be deleted.

There does not seem to be any information on the web that describes how to do this.

Is there a way to refresh/rebuild the index, so the changes are reflected in the search results?

Best Regards

Azure AI Search
An Azure search service with built-in artificial intelligence capabilities that enrich information to help identify and explore relevant content at scale.
798 questions
Azure Blob Storage
An Azure service that stores unstructured data in the cloud as blobs.
2,542 questions
Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.
2,437 questions

Accepted answer
  1. Sina Salam 5,471 Reputation points
    2024-05-21T15:01:52.78+00:00

    Hello Mads Helles,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    Regarding your questions, I understand that you want to do the following, and that you also need documentation or references:

    • Add, split, or update documents.
    • Delete documents.
    • Refresh or rebuild the vector index.

    To answer your questions, I am assuming you are using Python and Azure AI Search (formerly Azure Cognitive Search). If you are using a custom vector index in another database, the process may vary slightly, but you can achieve the goal by following the steps below.

    First, to update documents within your vector indexes:

    • Get the modified .txt or .pdf files from your Azure Blob Storage.
    • Use the relevant Azure SDK (like the Python SDK) or REST APIs to update the documents in your vector index.
    • Call the necessary APIs or SDK methods to re-index the updated documents.
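
The update steps above can be sketched against the Azure AI Search REST API using only the Python standard library. This is a minimal sketch, not a definitive implementation: the service name, index name, and the field names `id` and `content` are hypothetical, and the API version is an assumption you should replace with one your service supports. Note that documents pushed this way bypass any skillset, so if your index was built by an indexer with integrated vectorization, re-running the indexer is probably simpler because it also regenerates the embeddings.

```python
import json
import urllib.request

API_VERSION = "2023-11-01"  # assumption: replace with a version your service supports


def build_upsert_request(service, index, api_key, documents):
    """Build (but do not send) a request that merges or uploads documents."""
    url = (
        f"https://{service}.search.windows.net/indexes/{index}"
        f"/docs/index?api-version={API_VERSION}"
    )
    # mergeOrUpload updates a document when its key already exists
    # and inserts it otherwise.
    actions = [{"@search.action": "mergeOrUpload", **doc} for doc in documents]
    body = json.dumps({"value": actions}).encode("utf-8")
    headers = {"Content-Type": "application/json", "api-key": api_key}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Hypothetical names, for illustration only.
request = build_upsert_request(
    "my-search-service",
    "my-vector-index",
    "<admin-api-key>",
    [{"id": "doc-1", "content": "Updated text of the document."}],
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

The request is only built here, so the sketch runs without network access; sending it requires a real service name and admin key.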

    Secondly, to split and delete documents:

    • If you need to split a large document into smaller ones, extract the relevant sections and create new, smaller documents.
    • To delete documents, identify which ones you want to remove and use the appropriate APIs or SDK methods to delete them from the index.
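
As a rough illustration of the two bullets above, the sketch below splits a long text into overlapping chunks and builds the payload for a "delete" action in the Azure AI Search document-indexing REST format. The helper names, the key field name `id`, and the chunk sizes are hypothetical choices for illustration; your index's key field may be named differently.

```python
def split_text(text, chunk_size=1000, overlap=100):
    """Split text into chunks of at most chunk_size characters,
    where consecutive chunks share `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


def build_delete_batch(keys, key_field="id"):
    """Build the JSON body that deletes documents by key from an index.
    key_field is an assumption; use the key field defined in your index."""
    return {"value": [{"@search.action": "delete", key_field: k} for k in keys]}


print(split_text("abcdefghij", chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
print(build_delete_batch(["doc-1", "doc-2"]))
```

The delete batch would be POSTed to the same `/docs/index` endpoint used for uploads. When a source file is split, the chunks of the original file also need to be removed from the index, otherwise both old and new content may show up in search results.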

    Thirdly, to refresh or rebuild the index:

    • Run your indexer on demand with a "reset" option. Resetting the indexer clears the high-water mark and performs a full reindex of all documents.
    • Alternatively, schedule regular updates (hourly or daily) to keep your index current.
    • Keep in mind that the indexer will stop executing when there are no more documents to load or refresh.
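
The reset-then-run sequence described above can be sketched with the indexer REST endpoints. This is a sketch under assumptions: the service and indexer names are hypothetical, the API version should be replaced with one your service supports, and the requests are only built, not sent. Keep in mind that a reset makes the next run reprocess every document, which with integrated vectorization also means regenerating every embedding.

```python
import urllib.request

API_VERSION = "2023-11-01"  # assumption: replace with a version your service supports


def build_indexer_request(service, indexer, api_key, operation):
    """Build (but do not send) a POST request that resets or runs an indexer."""
    if operation not in ("reset", "run"):
        raise ValueError("operation must be 'reset' or 'run'")
    url = (
        f"https://{service}.search.windows.net/indexers/{indexer}"
        f"/{operation}?api-version={API_VERSION}"
    )
    headers = {"Content-Type": "application/json", "api-key": api_key}
    # Both operations take an empty request body.
    return urllib.request.Request(url, data=b"", headers=headers, method="POST")


def full_reindex_requests(service, indexer, api_key):
    """Return the two requests for a full reindex, in the order to send them:
    reset clears the change-tracking state, run then reprocesses everything."""
    return [
        build_indexer_request(service, indexer, api_key, "reset"),
        build_indexer_request(service, indexer, api_key, "run"),
    ]


# Hypothetical names, for illustration only.
for req in full_reindex_requests("my-search-service", "my-indexer", "<admin-api-key>"):
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

For a normal incremental refresh, sending only the "run" request should be enough.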

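    If you are using Azure AI Search (formerly Azure Cognitive Search), you can force a full reindex by resetting the indexer and then running it. As far as I know, the Azure CLI's "az search" command group manages the search service itself and has no indexer subcommands, so use the REST API (or an SDK) instead:

    POST https://<service-name>.search.windows.net/indexers/<indexer-name>/reset?api-version=<api-version>
    POST https://<service-name>.search.windows.net/indexers/<indexer-name>/run?api-version=<api-version>

    Both requests need an api-key header containing an admin key, or equivalent role-based authorization.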
    

    Alternatively, you can use the Azure Portal:

    • Navigate to your Azure AI Search service.
    • Go to the "Indexers" section.
    • Select your indexer and click "Run" (click "Reset" first if you want a full reindex).

    References

    For further reading and step-by-step documentation, as requested:

    Run or reset indexers - Azure AI Search

    How to create vector indexes - Azure AI Studio

    Incrementally Indexing documents with Azure AI Search

    OpenAI API - Trying to create vectors and chunked data using Azure

    Create or Update Index (Preview) - Azure AI Search.

    Vector search - Azure AI Search.

    Also:

    Check the list of documentation and training available on the right side of this page, under Additional resources.

    I hope this is helpful! Do not hesitate to let me know if you have any other questions.

    ** Please don't forget to close the thread by upvoting and accepting this as the answer if it is helpful ** so that others in the community facing similar issues can easily find the solution.

    Best Regards,

    Sina Salam

    1 person found this answer helpful.

1 additional answer

  1. Mads Helles 20 Reputation points
    2024-05-22T12:33:49.38+00:00

    So I've been testing and trying out different ways to do this, and found that the snippet from the medium.com article worked. I had to try a few times before I got it right, but I can now add/remove/change documents in my blob store and rerun the indexer.

    The link to the Medium article has changed to this.

    https://medium.com/microsoftazure/incrementally-indexing-documents-with-azureai-search-integrated-vectorization-6f7150556f62

    Here is the part of the article that helped me (I added my own screenshot for copyright reasons):

    1. Azure AI Search — integrated vectorisation (IV). The newest tool in the box is the integrated vectorisation support on Azure AI Search. Triggered by the “import and vectorize data” tool, IV creates a data source, an index, skillsets for data chunking and vectorization and finally an indexer.

    User's image

    The indexer can then be periodically rerun, to incrementally update the index or called out via a REST API call e.g. upon a new document being uploaded to a blob container which triggers an Azure Function calling the REST API…

    Once created, the indexer can recognize the delta in the defined data source (e.g. new documents in the predefined blob container folder), index only that delta, and update the index with it.
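
The delta behaviour described here can be pictured with a simplified model. The sketch below is not the service's actual implementation; it only assumes, as the Azure documentation describes for blob indexers, that changes are detected from each blob's last-modified timestamp compared against a high-water mark. All names and timestamps are hypothetical. It also shows why deleted blobs are a separate concern: a timestamp comparison never sees a blob that no longer exists, so removing documents from the index generally requires a deletion detection policy (for example blob soft delete) or explicit delete calls.

```python
from datetime import datetime, timezone


def select_delta(blobs, high_water_mark):
    """Return the names of blobs modified after the high-water mark.
    blobs maps blob name -> last-modified datetime.
    A high_water_mark of None models a reset indexer: everything is selected."""
    return sorted(
        name
        for name, modified in blobs.items()
        if high_water_mark is None or modified > high_water_mark
    )


def orphaned_documents(indexed_names, blobs):
    """Return names that are still in the index but no longer in the container.
    The timestamp check above never finds these."""
    return sorted(set(indexed_names) - set(blobs))


def day(d):
    return datetime(2024, 5, d, tzinfo=timezone.utc)


# Hypothetical container contents and index state.
blobs = {"a.txt": day(1), "b.pdf": day(10), "c.txt": day(20)}
print(select_delta(blobs, day(5)))    # ['b.pdf', 'c.txt']
print(select_delta(blobs, None))      # ['a.txt', 'b.pdf', 'c.txt']
print(orphaned_documents({"a.txt", "old.pdf"}, blobs))  # ['old.pdf']
```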

    Azure AI search skillset that chunks documents and generates embeddings. You can modify the chunking parameters such as the chunk size and chunk overlap.
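
For reference, the chunk size and overlap mentioned above correspond, as far as I can tell, to the `maximumPageLength` and `pageOverlapLength` parameters of the Text Split skill in the skillset definition. The sketch below builds such a skill definition as a Python dictionary. The values, the input source path `/document/content`, and the output name `pages` are assumptions for illustration, and `pageOverlapLength` may require a newer or preview API version, so check the skill reference for your service.

```python
import json


def build_split_skill(chunk_size=2000, overlap=500):
    """Build a Text Split skill definition with the given chunking parameters.
    Service-side limits on these values are not checked here."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    return {
        "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
        "textSplitMode": "pages",
        "maximumPageLength": chunk_size,   # chunk size in characters
        "pageOverlapLength": overlap,      # characters shared by adjacent chunks
        "inputs": [{"name": "text", "source": "/document/content"}],
        "outputs": [{"name": "textItems", "targetName": "pages"}],
    }


print(json.dumps(build_split_skill(), indent=2))
```

After changing these parameters in the skillset, existing documents are only re-chunked when they are reprocessed, so a reset and rerun of the indexer is probably needed for the change to apply to the whole index.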
