Hello @Amulya Prasad
Thanks for reaching out. It sounds like you're encountering a data isolation issue: queries for a single customer are returning data from multiple customers instead of only the intended one. This can be hard to track down when Blob storage, an AI search service, and GPT-4 are combined to produce customer-specific summaries.
I suggest checking each part separately to find out which one causes the issue. First, run tests to validate query isolation:
- Manual Testing: Use the AI search service's query interface or tools to manually test queries for different customer IDs. Verify that the results returned correspond only to the documents associated with the queried customer.
- Automated Testing: Implement automated tests in your application or development environment that specifically target query isolation. These tests should simulate various scenarios where queries are made for different customer IDs and validate the correctness of the returned data.
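As a rough illustration of the automated test described above, here is a minimal sketch in Python. The `search` callable, the `customer_id` field name, and the `fake_search` stand-in are all assumptions for the example; replace them with whatever your application actually uses to query the search service.

```python
from typing import Callable, Dict, Iterable, List, Mapping


def find_isolation_leaks(
    search: Callable[[str], Iterable[Mapping[str, str]]],
    customer_ids: Iterable[str],
    id_field: str = "customer_id",  # assumed field name
) -> Dict[str, List[str]]:
    """For each queried customer, list the foreign customer IDs found in its results."""
    leaks: Dict[str, List[str]] = {}
    for customer_id in customer_ids:
        seen = {doc.get(id_field, "<missing>") for doc in search(customer_id)}
        foreign = sorted(seen - {customer_id})
        if foreign:
            leaks[customer_id] = foreign
    return leaks


# Hypothetical stand-in for the real search call, with a deliberate bug for "B".
def fake_search(customer_id: str) -> List[Dict[str, str]]:
    index = [
        {"id": "1", "customer_id": "A"},
        {"id": "2", "customer_id": "B"},
        {"id": "3", "customer_id": "A"},
    ]
    if customer_id == "B":
        return index  # bug: no customer filter applied
    return [d for d in index if d["customer_id"] == customer_id]


if __name__ == "__main__":
    print(find_isolation_leaks(fake_search, ["A", "B"]))
```

An empty result means no query returned another customer's documents; any entry tells you which customer's query leaked and whose data appeared in it.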
If you still cannot find the root cause, verify that data segregation is properly implemented across your Blob storage, AI search indexes, and any other data sources you're using:
- Unique IDs: Double-check that each document or record in your Blob storage and AI search index carries an identifier for the customer it belongs to. Ideally this is a customer ID or another key that unambiguously identifies the customer.
- Query Implementation: Review how you're querying the data. When querying for a specific customer's summary, ensure that your query filters explicitly by the customer's unique ID. This should prevent any cross-customer data retrieval.
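To make the explicit filter concrete, here is a small sketch that builds an OData-style filter string for one customer. The field name `customer_id` is an assumption; use the name of the field in your own index.

```python
def customer_filter(customer_id: str, field: str = "customer_id") -> str:
    """Build an OData filter expression that restricts results to one customer."""
    if not customer_id:
        # An empty ID would silently produce a useless or unscoped query.
        raise ValueError("customer_id must be non-empty")
    escaped = customer_id.replace("'", "''")  # OData escapes a single quote by doubling it
    return f"{field} eq '{escaped}'"


if __name__ == "__main__":
    print(customer_filter("cust-42"))
```

If you are using the azure-search-documents Python SDK, a string like this can be passed as the `filter` argument of `SearchClient.search`. Building the filter in one helper, rather than in each call site, makes it harder for a query to go out without it.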
Next, check the indexing and search configuration of your AI search service:
- Index Definition: Verify that the index used by your AI search service includes the customer ID (or unique key) as a field and that this field can be used in filters. In Azure AI Search, for example, the field must be marked as filterable (and facetable, if you facet on it). Without this, searches cannot be scoped to a specific customer's data.
- Search Queries: Inspect the queries you're sending to the AI search service. Ensure that they include the customer's unique ID as a filter criterion and that there are no unintended wildcard or broad queries that could retrieve data from multiple customers.
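The index definition check can be sketched as a small script run against the index definition exported as JSON. This assumes the export lists field attributes explicitly, in the shape `{"fields": [{"name": ..., "filterable": ...}]}`; the field name `customer_id` and the example definition are hypothetical. A missing `filterable` attribute is treated as a problem here, which is deliberately conservative.

```python
from typing import Any, List, Mapping


def customer_field_problems(
    index_definition: Mapping[str, Any],
    field_name: str = "customer_id",  # assumed field name
) -> List[str]:
    """Return a list of problems with the customer ID field; empty means it looks fine."""
    fields = {f["name"]: f for f in index_definition.get("fields", [])}
    field = fields.get(field_name)
    if field is None:
        return [f"index has no field named '{field_name}'"]
    problems: List[str] = []
    if not field.get("filterable", False):
        problems.append(f"field '{field_name}' is not marked filterable")
    return problems


if __name__ == "__main__":
    example_index = {  # hypothetical exported definition
        "name": "customer-docs",
        "fields": [
            {"name": "id", "type": "Edm.String", "key": True},
            {"name": "customer_id", "type": "Edm.String", "filterable": False},
            {"name": "content", "type": "Edm.String", "searchable": True},
        ],
    }
    print(customer_field_problems(example_index))
```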
Examine the process of ingesting data into Blob storage and indexing it into your AI search service:
- Data Sources: Ensure that data ingestion processes correctly tag or annotate each document with the appropriate customer ID or key during ingestion.
- Data Pipelines: Review the entire data pipeline from ingestion to indexing to querying. Look for any steps where customer IDs might be misinterpreted or overlooked.
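For the tagging step, here is a minimal sketch that derives the customer ID from the blob path at ingestion time and refuses to ingest anything it cannot attribute to a customer. The path layout `customers/<customer_id>/<file>` and the document field names are assumptions for illustration only; adapt them to your own container structure.

```python
from typing import Dict


def customer_id_from_blob_path(blob_path: str, prefix: str = "customers") -> str:
    """Extract the customer ID from a path shaped like 'customers/<customer_id>/<file>'."""
    parts = blob_path.strip("/").split("/")
    if len(parts) < 3 or parts[0] != prefix or not parts[1]:
        # Failing loudly is safer than indexing a document with no owner.
        raise ValueError(f"cannot derive a customer ID from blob path: {blob_path!r}")
    return parts[1]


def tag_document(blob_path: str, content: str) -> Dict[str, str]:
    """Build the document to index, always carrying the owning customer's ID."""
    return {
        "customer_id": customer_id_from_blob_path(blob_path),
        "source_path": blob_path,
        "content": content,
    }


if __name__ == "__main__":
    print(tag_document("customers/cust-42/notes.txt", "Quarterly review notes"))
```

Rejecting untagged documents at ingestion means a missing customer ID shows up as a pipeline error rather than as another customer's data in a summary.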
Lastly, review how GPT-4 integrates with the AI search service and Blob storage:
- Input to GPT-4: Confirm that the data fed into GPT-4 for generating summaries comes only from the filtered, isolated results retrieved from the AI search service. Ensure that there is no mixing or conflating of data from different customers at this stage.
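One way to guard this last step is to re-check the customer ID on every retrieved document just before the prompt is built. The sketch below assumes each document is a dictionary with `customer_id` and `content` fields, and the prompt wording is only a placeholder.

```python
from typing import Dict, List


def build_summary_prompt(
    customer_id: str,
    documents: List[Dict[str, str]],
    id_field: str = "customer_id",  # assumed field names
    text_field: str = "content",
) -> str:
    """Build the summary prompt, refusing any document that belongs to another customer."""
    foreign = [d for d in documents if d.get(id_field) != customer_id]
    if foreign:
        raise ValueError(
            f"{len(foreign)} document(s) do not belong to customer {customer_id!r}"
        )
    context = "\n\n".join(d[text_field] for d in documents)
    return f"Summarize the following documents for customer {customer_id}:\n\n{context}"


if __name__ == "__main__":
    docs = [
        {"customer_id": "cust-42", "content": "Invoice paid on time."},
        {"customer_id": "cust-42", "content": "Requested a contract renewal."},
    ]
    print(build_summary_prompt("cust-42", docs))
```

If this check ever raises, the leak is upstream (in the query or the index), and the error tells you so before the mixed data reaches the model.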
If you have gone through all of these steps and still cannot identify the root cause, please let us know.
I hope this helps.
Regards,
Yutong
Please accept the answer if you found it helpful, as this supports the community. Thanks a lot.