In this tutorial, you build a retrieval-augmented generation (RAG) pipeline over documents stored in Azure Files. The pipeline uses Haystack for orchestration and Pinecone as the vector database. Haystack models the pipeline as an explicit directed acyclic graph (DAG) of typed components — embedder, retriever, prompt builder, generator — so you can inspect and extend each stage independently.
Prerequisites
- Complete the project setup and Azure Files authentication.
- An Azure OpenAI resource with the following deployments:
  - A text embedding model (for example, text-embedding-3-small)
  - A chat completion model (for example, gpt-4o)
- A Pinecone account (the free tier is sufficient). You need an API key and an index name from the Pinecone console.
Important
Store your Pinecone API key securely. Don't commit API keys to source control.
Set the following environment variables in your .env file:
PINECONE_API_KEY=<your-pinecone-api-key>
PINECONE_INDEX_NAME=<your-pinecone-index-name>
| Variable | Description |
|---|---|
| PINECONE_API_KEY | Your Pinecone API key from the Pinecone console |
| PINECONE_INDEX_NAME | The name of your Pinecone index |
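Failing fast on a missing variable is usually easier to debug than an authentication error raised later by the Pinecone client. The following is a minimal sketch of such a check. It assumes the values from your .env file have already been loaded into the process environment (for example, by the setup article's code), and the helper name `load_pinecone_settings` is hypothetical, not part of the tutorial's script.

```python
import os

# The two variables this tutorial expects (names taken from the table above).
REQUIRED_VARS = ("PINECONE_API_KEY", "PINECONE_INDEX_NAME")


def load_pinecone_settings(environ=os.environ):
    """Return the Pinecone settings, raising if any variable is missing or empty.

    Hypothetical helper for illustration; `environ` is injectable for testing.
    """
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

Calling `load_pinecone_settings()` once at startup surfaces a misconfigured .env file before any documents are downloaded or embedded.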
Install dependencies
Install the required packages for this tutorial:
pip install haystack-ai pinecone-haystack pypdf python-docx pinecone
Step 1: Parse and chunk documents
After downloading files from Azure Files (covered in the setup article), split the documents into overlapping chunks. Haystack's DocumentSplitter splits by word count rather than character count, which produces more semantically consistent chunk sizes.
from haystack.components.preprocessors import DocumentSplitter

def chunk_documents(documents):
    splitter = DocumentSplitter(
        split_by="word",
        split_length=CHUNK_SIZE,
        split_overlap=CHUNK_OVERLAP,
    )
    result = splitter.run(documents=documents)
    return result["documents"]
DocumentSplitter splits each document into chunks of CHUNK_SIZE words with CHUNK_OVERLAP words of overlap between adjacent chunks. All original metadata (such as file path) is automatically copied to each child chunk.
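To make the overlap behavior concrete, here is a small standard-library sketch of word-based chunking with a sliding window. It illustrates the idea only and is not DocumentSplitter's implementation, which may treat whitespace and trailing chunks differently. The function name `split_words` is hypothetical.

```python
def split_words(text, chunk_size, overlap):
    """Split text into chunks of `chunk_size` words, sharing `overlap` words
    between adjacent chunks. Illustrative only; not Haystack's code."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the window has reached the end of the text, so no
        # trailing chunk is emitted that contains only overlap words.
        if start + chunk_size >= len(words):
            break
    return chunks
```

With a chunk size of 4 and an overlap of 1, the window advances three words at a time, so the last word of each chunk reappears as the first word of the next.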
Step 2: Create embeddings and index into Pinecone
Build a Haystack indexing pipeline that embeds document chunks with Azure OpenAI and writes the vectors into a Pinecone index.
from haystack import Pipeline
from haystack.components.embedders import AzureOpenAIDocumentEmbedder
from haystack.components.writers import DocumentWriter
from haystack.document_stores.types import DuplicatePolicy
from haystack_integrations.document_stores.pinecone import PineconeDocumentStore

def embed_and_index(chunks):
    store = PineconeDocumentStore(
        index=PINECONE_INDEX_NAME,
        namespace="default",
        dimension=EMBEDDING_DIMENSIONS,
        metric="cosine",
        spec={"serverless": {"region": "eastus2", "cloud": "azure"}},
    )
    embedder = AzureOpenAIDocumentEmbedder(
        azure_endpoint=OPENAI_ENDPOINT,
        azure_deployment=OPENAI_EMBEDDING_DEPLOYMENT,
        azure_ad_token_provider=TOKEN_PROVIDER,
        api_key=None,
        dimensions=EMBEDDING_DIMENSIONS,
    )
    writer = DocumentWriter(
        document_store=store,
        policy=DuplicatePolicy.OVERWRITE,
    )

    indexing_pipeline = Pipeline()
    indexing_pipeline.add_component("embedder", embedder)
    indexing_pipeline.add_component("writer", writer)
    indexing_pipeline.connect("embedder.documents", "writer.documents")
    indexing_pipeline.run({"embedder": {"documents": chunks}})
    return store
This function:
- Creates the document store: PineconeDocumentStore creates the Pinecone index automatically if it doesn't exist, using the dimension, metric, and spec parameters. If the index already exists, it reuses it.
- Creates the embedding model: AzureOpenAIDocumentEmbedder authenticates to Azure OpenAI using Entra ID tokens (via azure_ad_token_provider), not API keys. The explicit api_key=None prevents Haystack from reading a default key from the environment.
- Writes to Pinecone: DocumentWriter upserts the embedded documents into the store. DuplicatePolicy.OVERWRITE replaces existing documents with the same ID, making the pipeline idempotent across repeated runs.
- Connects the DAG: connect("embedder.documents", "writer.documents") wires the embedder's output to the writer's input. Each component declares typed sockets; connect() binds them, and run() pushes data through the graph.
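The idempotency described above depends on each chunk receiving the same ID on every run. The sketch below shows why overwrite-by-ID makes repeated indexing safe, using a plain dictionary as a stand-in for the Pinecone index. The helpers `stable_id` and `upsert` are hypothetical, and deriving the ID from a hash of the source path and content is an assumption made for illustration, not a description of how Haystack computes document IDs.

```python
import hashlib


def stable_id(content, source):
    """Derive a deterministic ID from a chunk's source path and text.
    Assumed scheme for illustration only."""
    return hashlib.sha256(f"{source}\n{content}".encode("utf-8")).hexdigest()


def upsert(store, chunks):
    """Write chunks into `store` (a dict standing in for the vector index),
    replacing any existing entry with the same ID. Returns the store size."""
    for chunk in chunks:
        store[stable_id(chunk["content"], chunk["source"])] = chunk
    return len(store)
```

Running `upsert` twice with the same chunks leaves the store the same size, which mirrors the effect of an overwrite policy: re-running the indexing step updates entries in place instead of accumulating duplicates.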
Step 3: Build the retrieval pipeline
Build a Haystack query pipeline that embeds the user's question, retrieves matching chunks from Pinecone, and generates an answer using Azure OpenAI.
from haystack.components.builders import PromptBuilder
from haystack.components.embedders import AzureOpenAITextEmbedder
from haystack.components.generators import AzureOpenAIGenerator
from haystack_integrations.components.retrievers.pinecone import PineconeEmbeddingRetriever

def build_query_pipeline(document_store):
    text_embedder = AzureOpenAITextEmbedder(
        azure_endpoint=OPENAI_ENDPOINT,
        azure_deployment=OPENAI_EMBEDDING_DEPLOYMENT,
        azure_ad_token_provider=TOKEN_PROVIDER,
        api_key=None,
        dimensions=EMBEDDING_DIMENSIONS,
    )
    retriever = PineconeEmbeddingRetriever(
        document_store=document_store,
        top_k=5,
    )
    prompt_builder = PromptBuilder(template=_PROMPT_TEMPLATE)
    generator = AzureOpenAIGenerator(
        azure_endpoint=OPENAI_ENDPOINT,
        azure_deployment=OPENAI_CHAT_DEPLOYMENT,
        azure_ad_token_provider=TOKEN_PROVIDER,
        api_key=None,
    )

    query_pipeline = Pipeline()
    query_pipeline.add_component("text_embedder", text_embedder)
    query_pipeline.add_component("retriever", retriever)
    query_pipeline.add_component("prompt_builder", prompt_builder)
    query_pipeline.add_component("generator", generator)
    query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
    query_pipeline.connect("retriever.documents", "prompt_builder.documents")
    query_pipeline.connect("prompt_builder.prompt", "generator.prompt")
    return query_pipeline
The query pipeline has four stages:
- Embed: AzureOpenAITextEmbedder converts the user's question into an embedding vector. This is a different class from the document embedder used during indexing; Haystack uses separate components because they accept different input types (a single string versus a list of documents).
- Retrieve: PineconeEmbeddingRetriever queries Pinecone with the embedding vector and returns the top k matching chunks using cosine similarity.
- Prompt: PromptBuilder uses a Jinja2 template that iterates over the retrieved documents, injects the user's question, and instructs the LLM to answer based only on the provided context.
- Generate: AzureOpenAIGenerator sends the rendered prompt to Azure OpenAI and returns the response.
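The code above references _PROMPT_TEMPLATE without showing it, and the pipeline still needs input at two entry points: the text embedder and the prompt builder. The sketch below shows one plausible template and a small helper that runs a query. The template text, the `question` variable name, and the `ask` helper are assumptions made for illustration; the component names match those registered in build_query_pipeline.

```python
# Illustrative Jinja2 template; the tutorial's actual _PROMPT_TEMPLATE is not shown.
EXAMPLE_PROMPT_TEMPLATE = """\
Answer the question using only the context below.
If the context does not contain the answer, say that you don't know.

Context:
{% for doc in documents %}
- {{ doc.content }}
{% endfor %}

Question: {{ question }}
Answer:
"""


def ask(query_pipeline, question):
    """Run one question through the query pipeline and return the first reply.

    Assumes the template variable is named `question` (hypothetical).
    """
    result = query_pipeline.run(
        {
            "text_embedder": {"text": question},       # embedded for retrieval
            "prompt_builder": {"question": question},  # injected into the prompt
        }
    )
    # Haystack generators return their output under the "replies" key.
    return result["generator"]["replies"][0]
```

The question is passed twice because the embedder and the prompt builder are separate components with their own inputs; the retrieved documents flow between them through the connections declared in the pipeline.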
Step 4: Run the pipeline
Run the pipeline script:
python haystack-pinecone.py
The script scans the Azure file share, downloads and parses documents, chunks them, indexes them into Pinecone, and starts an interactive query session. Type a question to query your documents. Type quit to exit.
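The script's interactive loop isn't shown in this article. As a rough sketch of what such a session might look like, the function below reads questions until the user types quit. The name `interactive_session` and its parameters are hypothetical; `answer_fn` stands for any callable that takes a question and returns an answer string.

```python
def interactive_session(answer_fn, read_line=input, write_line=print):
    """Prompt for questions until the user types 'quit' or input ends.

    Returns the number of questions answered. `read_line` and `write_line`
    are injectable so the loop can be exercised without a terminal.
    """
    answered = 0
    while True:
        try:
            question = read_line("Question: ").strip()
        except EOFError:
            break  # input stream closed, for example Ctrl+D
        if not question:
            continue  # ignore empty lines
        if question.lower() == "quit":
            break
        write_line(answer_fn(question))
        answered += 1
    return answered
```

Keeping the loop separate from the pipeline call makes it possible to test the session logic with a stub before connecting it to Azure OpenAI and Pinecone.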
Clean up resources
To delete the Azure resources created for this tutorial:
az group delete --name rg-rag-demo --yes --no-wait
Note
Your Azure file share may be shared infrastructure — confirm with your administrator before deleting. To remove your Pinecone index, delete it from the Pinecone console.