Edit

Semantic reranking with the rank() function for Azure HorizonDB (Preview)

Vector search retrieves results that are semantically close to a query, but "close" in embedding space doesn't always mean "relevant" to the user's intent. Synonyms, intent shifts, long-tail phrasing, and domain-specific nuance can cause the most relevant document to rank third or tenth instead of first. Users judge results by how well they match their intent, not by raw distance scores.

Semantic reranking addresses this. After an initial retrieval stage (vector search, full-text search, or hybrid search) returns a broad set of candidates, a cross-encoder reranker model rescores each candidate against the original query. Unlike embedding models that encode the query and document separately, a cross-encoder processes the query and document together, capturing fine-grained interactions between them and producing more accurate relevance scores.

The azure_ai.rank() function in the azure_ai extension brings this capability directly into SQL, so you can add a reranking stage without leaving the database.

Why reranking matters

The gap between distance and relevance

Vector embeddings are optimized for fast, approximate similarity at scale. They encode meaning into fixed-size vectors independently, one for the query, one for each document. This bi-encoder approach is efficient (you can index billions of vectors with DiskANN), but it compresses away fine-grained interactions between query and document.

A cross-encoder, by contrast, processes the query and document as a single input pair. It attends to every word in both, capturing token-level interactions that bi-encoders miss. The result is more accurate relevance scores, but at higher computational cost, because every candidate requires a separate inference call.

Why not use a cross-encoder for everything?

Cross-encoders are accurate but expensive. Scoring 1 million documents with a cross-encoder at query time is impractical as it would take seconds to minutes per query. That's why production retrieval uses a two-stage pipeline:

  1. Stage 1: Retrieve a broad set of candidates cheaply (vector search, BM25, or hybrid search). This narrows millions of documents down to the top 50-100.
  2. Stage 2: Rerank only those 50-100 candidates with a cross-encoder for precision on the results that matter most.

This pattern gives you the speed of embedding-based retrieval with the accuracy of cross-encoder scoring, at a fraction of the cost of running the cross-encoder over the entire corpus.

When to use semantic reranking

Use reranking when Skip reranking when
Search quality directly affects user experience (product search, support search, knowledge base) Simple exact-match lookups (product code, ID search)
Queries are natural language with nuance, synonyms, or intent variation The corpus is small and homogeneous enough that vector search alone achieves high precision
You're building RAG or Implement durable AI pipelines in Azure HorizonDB (Preview) and need the best possible context for large language models (LLM) generation Latency budget can't accommodate the additional model call
Hybrid search returns a fused list and you want a final accuracy bump Results are already filtered to a small set (fewer than 5)

Prerequisites

An Azure HorizonDB instance with either:

The rank() function

The azure_ai.rank() function reranks a set of documents based on their relevance to a query using a cross-encoder model.

Syntax

azure_ai.rank(
    query text,
    document_contents text[],
    document_ids text[] DEFAULT NULL,
    model text DEFAULT NULL
)

Arguments

Argument Type Description
query text The search query to evaluate relevance against.
document_contents text[] Array of document texts to rerank.
document_ids (optional) text[] Array of identifiers corresponding to each document.
model (optional) text Model alias registered in the model registry. When omitted, uses the default-reranker Managed Model (Cohere-rerank-v4.0-fast).

Return type

Returns a table with columns: document_id, rank, and relevance_score.

Note

BYOM users: Pass your registered reranker model alias as the model argument. For example: azure_ai.rank('query', documents, ids, 'my-reranker'). See AI functions in the azure_ai extension for Azure HorizonDB (Preview) for details on registering models.

Basic reranking example

Rerank a set of product descriptions against a search query:

SELECT * FROM azure_ai.rank(
    'wireless noise cancelling headphones',
    ARRAY[
        'Over-ear wireless headphones with active noise cancellation and 30-hour battery life.',
        'Wired earbuds with inline microphone, no noise cancellation.',
        'Bluetooth speaker with 360-degree sound and waterproof design.',
        'Compact noise cancelling earbuds with transparency mode and wireless charging case.'
    ]
);

Note

BYOM users: Add your model alias as the last argument: azure_ai.rank('wireless noise cancelling headphones', ARRAY[...], NULL, 'my-reranker').

Two-stage retrieval: vector search + reranking

The most common pattern is to retrieve candidates with vector search, then rerank the top results for precision:

WITH candidates AS (
    SELECT id, title, description
    FROM products
    ORDER BY embedding <=> azure_openai.create_embeddings(
        input => 'wireless noise cancelling headphones'
    )::vector
    LIMIT 20
),
reranked AS (
    SELECT *
    FROM azure_ai.rank(
        'wireless noise cancelling headphones',
        ARRAY(SELECT description FROM candidates),
        ARRAY(SELECT id::text FROM candidates)
    )
)
SELECT c.id, c.title, r.rank, r.relevance_score
FROM candidates c
JOIN reranked r ON r.document_id = c.id::text
ORDER BY r.rank ASC
LIMIT 10;

Note

BYOM users: Replace azure_openai.create_embeddings(input => 'wireless noise cancelling headphones') with azure_openai.create_embeddings('my-embedding', 'wireless noise cancelling headphones') and add 'my-reranker' as the last argument to azure_ai.rank().

Hybrid search + reranking

For the best retrieval quality, combine BM25 full-text search and vector search with Reciprocal Rank Fusion, then rerank the fused results. If you want to run this pattern as a durable, fault-tolerant workflow with automatic retries and checkpointing, see Implement durable AI pipelines in Azure HorizonDB (Preview) - which supports ai.rank() as a built-in pipeline step.

WITH query AS (
    SELECT
        'wireless noise cancelling headphones' AS q_text,
        azure_openai.create_embeddings(
            input => 'wireless noise cancelling headphones'
        )::vector AS q_vec
),
bm25 AS (
    SELECT id, ROW_NUMBER() OVER () AS bm25_rank
    FROM products, query
    WHERE pgfts.fts_query(query.q_text, 'idx_products_fts')
    LIMIT 50
),
vec AS (
    SELECT p.id, ROW_NUMBER() OVER (ORDER BY p.embedding <=> query.q_vec) AS vec_rank
    FROM products p, query
    ORDER BY p.embedding <=> query.q_vec
    LIMIT 50
),
fused AS (
    SELECT p.id, p.description,
           (1.0 / (60 + COALESCE(b.bm25_rank, 1000))) +
           (1.0 / (60 + COALESCE(v.vec_rank, 1000))) AS rrf_score
    FROM products p
    LEFT JOIN bm25 b ON b.id = p.id
    LEFT JOIN vec v ON v.id = p.id
    WHERE b.id IS NOT NULL OR v.id IS NOT NULL
    ORDER BY rrf_score DESC
    LIMIT 20
),
reranked AS (
    SELECT *
    FROM azure_ai.rank(
        'wireless noise cancelling headphones',
        ARRAY(SELECT description FROM fused),
        ARRAY(SELECT id::text FROM fused)
    )
)
SELECT f.id, f.description, r.rank, r.relevance_score
FROM fused f
JOIN reranked r ON r.document_id = f.id::text
ORDER BY r.rank ASC
LIMIT 10;

Note

BYOM users: Pass your model aliases to create_embeddings() and rank() as shown in previous examples.

Performance considerations

Factor Recommendation
Candidate pool size Rerank 20-50 candidates. More candidates improve recall but increase latency and cost linearly.
Latency Cross-encoder scoring adds tens to low hundreds of milliseconds depending on the pool size and document length.
Document length Shorter documents rerank faster. If documents are long, consider reranking over summaries or the most relevant chunk rather than the full text.
When to skip If your retrieval stage already returns fewer than five results, reranking adds cost with diminishing accuracy gains.