How do I delete all records where my date_ymd column in Azure Search Index is equal to a specific date?

Aravind Vijay 20 Reputation points
2025-04-01T10:39:39.27+00:00

Hi, I have a scheduled script that collects a lot of data and stores it in an Azure Search Index that I use as a vector DB. I then use RAG to retrieve data from it based on a prompt sent by a user in a chatbot, and the AI's response is based on the top N documents from the index.

My first question is whether there's a way to work around using only the top N documents from the Azure Search Index. I get the top 50 documents for a user's prompt and feed them into my chatbot's system prompt. Is there any way to link my Streamlit chatbot to the Azure Search Index directly, without feeding it only N documents?

Secondly, and more importantly, I need code to delete all documents with specific date_ymd values. Keep in mind that all my columns and keys are string type, not date type. Can you help me create a script to delete documents that have a certain string date?

This is my code for uploading documents:

def chunk_data(data, chunk_size):
    for i in range(0, len(data), chunk_size):
        yield data[i:i + chunk_size]


def upload_documents_to_search_client(df, embeddings_dict, chunk_size=32000):
    """Uploads documents with embeddings to the search client in chunks."""
    data = [
        {
            "@search.action": "mergeOrUpload",
            "hardware_id": str(row["hardware_id"]),
            "text_feedback": str(row["text_feedback"]) if "text_feedback" in row else "",
            "uninstall_text_feedback": str(row["uninstall_text_feedback"]) if "uninstall_text_feedback" in row else "",
            "os": str(row["os"]) if "os" in row else "",
            "date_ymd": str(row["date_ymd"]) if "date_ymd" in row else "",
            "Feature_Category": str(row["Feature_Category"]) if "Feature_Category" in row else "",
            "Sentiment": str(row["Sentiment"]) if "Sentiment" in row else "",
            "country": str(row["country"]) if "country" in row else "",
            "aiid": str(map_aiid_to_label(row["aiid"])) if "aiid" in row else "",
            "version_app": str(row["version_app"]) if "version_app" in row else "",
            "os_version": str(row["version"]) if "version" in row else "",
            "architecture": str(row["architecture"]) if "architecture" in row else "",
            "score": str(row["score"]) if "score" in row else "",
            "region": str(row["region"]) if "region" in row else "",
            "city": str(row["city"]) if "city" in row else "",
            "vector_text_feedback": next(
                (item["embeddings"].get("vector_text_feedback", []) for item in embeddings_dict if item["hardware_id"] == str(row["hardware_id"])),
                []
            ),
            "vector_uninstall_feedback": next(
                (item["embeddings"].get("vector_uninstall_feedback", []) for item in embeddings_dict if item["hardware_id"] == str(row["hardware_id"])),
                []
            )
        }
        for _, row in df.iterrows()
    ]
    for chunk in chunk_data(data, chunk_size):
        try:
            result = search_client.upload_documents(documents=chunk)
            print(f"Uploaded {len(chunk)} documents successfully.")
        except Exception as e:
            print(f"An error occurred during document upload: {e}")
            return None
Azure AI Search

Accepted answer
  1. Bhargavi Naragani 5,270 Reputation points Microsoft External Staff Moderator
    2025-04-03T07:38:39.5733333+00:00

    Hi @Aravind Vijay,

    Currently, Azure Cognitive Search retrieves a specified number of top documents based on relevance. Directly integrating your Streamlit chatbot with the Azure Search Index to access more than the top N documents isn't natively supported. However, you can implement pagination to retrieve additional documents beyond the initial set. This involves making successive search queries with appropriate skip and top parameters to navigate through the result set (top is capped at 1,000 per request and skip at 100,000). By aggregating these results, you can provide your chatbot with a broader context. https://learn.microsoft.com/en-us/azure/search/search-pagination-page-layout
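
A minimal sketch of the skip/top paging described above. The function name `fetch_pages` and its `page_size` and `max_docs` parameters are illustrative, not part of the SDK; the only SDK call assumed is `SearchClient.search` with its `search_text`, `top`, and `skip` keyword arguments. The client is passed in as an argument so the loop can be tried against a stub.

```python
from typing import Any, Dict, List


def fetch_pages(search_client: Any, query: str,
                page_size: int = 50, max_docs: int = 200) -> List[Dict[str, Any]]:
    """Collect up to max_docs results by issuing successive skip/top queries.

    search_client is assumed to behave like azure.search.documents.SearchClient.
    """
    collected: List[Dict[str, Any]] = []
    skip = 0
    while len(collected) < max_docs:
        # Never request more than is still needed to reach max_docs.
        top = min(page_size, max_docs - len(collected))
        page = list(search_client.search(search_text=query, top=top, skip=skip))
        if not page:
            break  # nothing left in the result set
        collected.extend(page)
        skip += len(page)
        if len(page) < top:
            break  # a short page means the result set is exhausted
    return collected
```

A call such as `fetch_pages(search_client, user_prompt, page_size=50, max_docs=200)` would return up to 200 documents instead of the first 50. Results can shift between requests if the index is being updated while paging.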

    To remove documents where the date_ymd field matches a specific date, you'll need to perform a two-step process. Since Azure Cognitive Search requires the document's key field (e.g., hardware_id) for deletion, you must first query the index to obtain the keys of documents matching your date_ymd criteria. Once you have the list of keys, you can issue delete operations for those specific documents.

    Here's how you can implement this in Python using the Azure Search SDK:

    from azure.core.credentials import AzureKeyCredential
    from azure.search.documents import SearchClient

    # Initialize the SearchClient
    service_endpoint = "https://<your-service-name>.search.windows.net"
    index_name = "<your-index-name>"
    api_key = "<your-api-key>"
    search_client = SearchClient(service_endpoint, index_name, AzureKeyCredential(api_key))


    def delete_documents_by_date(date_ymd, batch_size=1000):
        """Delete every document whose date_ymd string equals the given value."""
        # Step 1: Query for matching documents, returning only the key field.
        # date_ymd must be marked filterable in the index.
        # Single quotes inside the value are doubled, as OData string literals require.
        escaped_date = date_ymd.replace("'", "''")
        filter_expression = f"date_ymd eq '{escaped_date}'"
        results = search_client.search(search_text="*", filter=filter_expression, select=["hardware_id"])

        # Step 2: Collect the keys of the documents to be deleted
        documents_to_delete = [{"hardware_id": doc["hardware_id"]} for doc in results]
        if not documents_to_delete:
            print("No documents found with the specified date.")
            return 0

        # Step 3: Delete the documents in batches (adjust batch_size as needed)
        for i in range(0, len(documents_to_delete), batch_size):
            batch = documents_to_delete[i:i + batch_size]
            # delete_documents applies the delete action itself and needs only the key field
            search_client.delete_documents(documents=batch)
            print(f"Deleted batch of {len(batch)} documents.")
        return len(documents_to_delete)
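
As a sketch, the filter expression can also be built by a small helper that validates the date string before any query runs, and that can match several dates in one query. The helper names are hypothetical, and the `YYYY-MM-DD` pattern is an assumption about how date_ymd is stored in your index; adjust it if your strings look different. The `search.in` function is the Azure AI Search OData function for matching a field against a delimited list of values.

```python
import re
from typing import Iterable

# Assumption: date_ymd values are stored as YYYY-MM-DD strings.
_DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def _validate(date_ymd: str) -> str:
    """Return the value unchanged if it matches the assumed format, else raise."""
    if not _DATE_PATTERN.fullmatch(date_ymd):
        raise ValueError(f"Expected YYYY-MM-DD, got {date_ymd!r}")
    return date_ymd


def build_date_filter(date_ymd: str, field: str = "date_ymd") -> str:
    """OData filter matching a single string date."""
    return f"{field} eq '{_validate(date_ymd)}'"


def build_multi_date_filter(dates: Iterable[str], field: str = "date_ymd") -> str:
    """OData filter matching any of several string dates via search.in."""
    values = ",".join(_validate(d) for d in dates)
    if not values:
        raise ValueError("At least one date is required")
    return f"search.in({field}, '{values}', ',')"
```

Either string can be passed as the `filter` argument of `search_client.search`. Because validation rejects anything that is not digits and hyphens, no quote escaping is needed here.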
    

    Since your date_ymd field is of string type, make sure that the format of the date in your query is identical to the format stored in your index (e.g., 'YYYY-MM-DD'), and that the field is marked filterable. Azure Search imposes batch size limits: an indexing request can carry at most 1,000 documents (or about 16 MB), so execute deletions in batches that stay within these limits. There may be a slight delay before the deletions are reflected in query results.
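
One way to handle that delay, sketched under assumptions: re-run the same filter with `include_total_count=True` and poll until the count reaches zero. The function name `wait_until_deleted` and its `attempts` and `delay_seconds` parameters are illustrative; the SDK pieces assumed are `SearchClient.search` and the `get_count()` method on the object it returns.

```python
import time
from typing import Any


def wait_until_deleted(search_client: Any, filter_expression: str,
                       attempts: int = 5, delay_seconds: float = 2.0) -> bool:
    """Poll the index until no documents match filter_expression.

    Returns True once the match count is zero, False if documents still
    match after the given number of attempts.
    """
    for _ in range(attempts):
        results = search_client.search(
            search_text="*",
            filter=filter_expression,
            include_total_count=True,  # ask the service for the match count
            top=0,                     # only the count is needed, not the documents
        )
        if results.get_count() == 0:
            return True
        time.sleep(delay_seconds)
    return False
```

For example, `wait_until_deleted(search_client, "date_ymd eq '2025-04-01'")` could be called right after the delete function to confirm that nothing for that date remains.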

    Refer to the Azure AI Search documentation on adding, updating, or deleting documents for better understanding.

    I hope the information above helps you resolve the issue. If you have any further concerns or queries, please feel free to reach out to us.

    1 person found this answer helpful.
