Does Azure AI Search Integrated Vectorization use a Vector Database in its process?

Will 5 Reputation points
2024-02-05T03:09:54.77+00:00

I used Azure's integrated vectorization tool to vectorize my PDFs. I noticed it stored them in a storage account as JSON objects that you can open to see each vectorized chunk and its associated vector. My question is: is there a vector database that stores each vector, with a UI where I can view the data instead of manually going into each folder to look at each JSON object? Does this integrated vectorization method not use a vector database like Pinecone, and if not, is there any benefit to storing the vectors in one?

Sorry if I missed anything; I am relatively new to this. I also have an additional question: I am running into a timeout error when running this service on PDF files that are 50+ pages long. I've been told that I would need to use LangChain or Semantic Kernel to break the files down into smaller files and then chunk them. Is this because the chunking phase of integrated vectorization limits the number of chunks it can create from one PDF? I read that there's a limit on chunk size, but if I'm not hitting that maximum for each chunk, why would it time out?

Azure AI Search
An Azure search service with built-in artificial intelligence capabilities that enrich information to help identify and explore relevant content at scale.

1 answer

  1. ajkuma 20,556 Reputation points Microsoft Employee
    2024-02-07T05:23:09.6466667+00:00

    To better assist you, could you clarify: have you enabled the knowledge store, and are you looking at the intermediate state it produces? Or are you using Azure OpenAI On Your Data, which stores intermediate JSON files in a storage account when chunking the data, rather than our integrated vectorization?

    Azure AI Search is a vector database. Based on my understanding of your question, if you do not want to use the API or Search Explorer, you may use a UI like the following samples:

    For integrated vectorization, the actual vectors and content are stored in a search index for fast searching. The storage account serves as a data source, not as something to search over. The JSON files you might be looking at are external state.
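
Since the chunks and vectors live in the search index, one way to look at them without opening JSON files is to query the index directly. Below is a minimal sketch, not an official sample, that builds a Search Documents REST request using only the Python standard library. The service name, index name, key, and the field names `chunk_id` and `chunk` are placeholders for illustration; substitute the names from your own index. A vector field is only returned if it is marked retrievable in the index schema.

```python
import json
import urllib.request

# REST API version assumed here; use whichever version your service supports.
API_VERSION = "2023-11-01"


def build_search_request(service, index, api_key, select, top=5):
    """Build (but do not send) a Search Documents request for one index.

    `service`, `index`, and the field names in `select` are placeholders
    that must match your own search service and index schema.
    """
    url = (
        f"https://{service}.search.windows.net"
        f"/indexes/{index}/docs/search?api-version={API_VERSION}"
    )
    # "*" matches every document; `select` limits which fields come back.
    body = {"search": "*", "select": ",".join(select), "top": top}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


def fetch_chunks(request):
    """Send the request and return the list of matching documents."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["value"]
```

For example, `fetch_chunks(build_search_request("my-service", "my-index", "<query-key>", ["chunk_id", "chunk"]))` would return the first five indexed chunks as dictionaries, assuming those hypothetical names exist on your service.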

    Kindly check out this doc for more info:
