Choose the best response for each of the following questions.
Which pgvector distance operator should you use when your embeddings are normalized to unit length and you want to measure semantic similarity?
<=> (cosine distance)
<-> (L2 distance)
<#> (negative inner product)
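As a quick illustration, a cosine-distance search in pgvector looks like the following. This is a minimal sketch: the `docs` table, its 3-dimensional `vector` column, and the query vector `'[1,0,0]'` are toy assumptions (real embeddings are much higher-dimensional).

```sql
-- Toy table with a 3-dimensional embedding column (illustrative only).
CREATE TABLE docs (id bigserial PRIMARY KEY, embedding vector(3));

-- Order by cosine distance; a smaller <=> value means more similar.
SELECT id, embedding <=> '[1,0,0]' AS cosine_distance
FROM docs
ORDER BY embedding <=> '[1,0,0]'
LIMIT 5;
```

Note that when all vectors are normalized to unit length, `<#>` (negative inner product) produces the same ranking and `<->` (L2) is monotonically related to cosine distance, but `<=>` states the semantic-similarity intent explicitly.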
You're building a RAG pipeline that needs to retrieve relevant document chunks quickly from a collection of 5 million embeddings. The collection receives occasional batch updates but no real-time inserts. Which index type should you choose?
IVFFlat with an appropriate number of lists
HNSW with high ef_construction value
No index, relying on exact sequential scan
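A batch-updated collection of this size suits IVFFlat, since the index can be (re)built after each batch load. A sketch, assuming a `docs` table with a cosine-distance workload; the `lists` value follows the common heuristic of roughly the square root of the row count for large tables (√5,000,000 ≈ 2,236):

```sql
-- IVFFlat partitions vectors into lists; build it after bulk loading
-- so the cluster centroids reflect the actual data distribution.
CREATE INDEX docs_embedding_ivfflat
ON docs USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 2000);

-- At query time, probing more lists trades speed for recall.
SET ivfflat.probes = 20;
```

After a batch update that substantially changes the data, reindexing keeps the centroids representative.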
When creating an HNSW index, what does the m parameter control?
The maximum number of connections per node in the graph
The number of candidate neighbors considered during index construction
The number of lists to partition vectors into
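For reference, both HNSW parameters are set at index creation. The values below are pgvector's documented defaults; the table name is an assumption:

```sql
-- m: maximum connections per node in the graph (default 16).
-- ef_construction: size of the candidate-neighbor list used while
-- building the index (default 64); higher values improve recall
-- at the cost of slower builds.
CREATE INDEX docs_embedding_hnsw
ON docs USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- Query-time counterpart: size of the search candidate list (default 40).
SET hnsw.ef_search = 40;
```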
You need to update embeddings for 50,000 product descriptions after switching to a new embedding model. What approach minimizes the impact on concurrent searches?
Batch the updates into transactions of 1,000-5,000 rows each
Update all 50,000 rows in a single transaction
Drop the existing vector index before updating
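One way to batch such an update is to drive it from a staging table in id ranges, committing each range separately so no single transaction holds locks or bloat for long. This is a sketch; `new_embeddings` is a hypothetical staging table loaded with the new model's output:

```sql
-- Update one id range per transaction (repeat, advancing the range,
-- until all 50,000 rows are covered). Short transactions let
-- concurrent searches proceed and keep WAL/vacuum pressure low.
BEGIN;
UPDATE docs d
SET embedding = n.embedding
FROM new_embeddings n
WHERE d.id = n.id
  AND d.id BETWEEN 1 AND 5000;
COMMIT;
```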
In a hybrid search combining vector similarity with full-text search, what technique helps balance the relevance scores from both search methods?
Using Reciprocal Rank Fusion (RRF) to combine rankings
Multiplying the vector distance by the text relevance score
Always returning vector search results first
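RRF combines rankings rather than raw scores: each result contributes 1/(k + rank) per method, with k = 60 the conventional constant. A sketch, assuming a `docs` table with an `embedding` column and a `tsv` tsvector column (both names illustrative):

```sql
WITH vec AS (
  -- Top 20 by vector similarity, with their ranks.
  SELECT id, ROW_NUMBER() OVER (ORDER BY embedding <=> '[1,0,0]') AS rank
  FROM docs
  ORDER BY embedding <=> '[1,0,0]'
  LIMIT 20
),
txt AS (
  -- Top 20 by full-text relevance, with their ranks.
  SELECT id, ROW_NUMBER() OVER (
    ORDER BY ts_rank_cd(tsv, plainto_tsquery('english', 'search terms')) DESC
  ) AS rank
  FROM docs
  WHERE tsv @@ plainto_tsquery('english', 'search terms')
  LIMIT 20
)
-- Fuse: a document found by both methods accumulates both contributions.
SELECT COALESCE(vec.id, txt.id) AS id,
       COALESCE(1.0 / (60 + vec.rank), 0)
     + COALESCE(1.0 / (60 + txt.rank), 0) AS rrf_score
FROM vec
FULL OUTER JOIN txt ON vec.id = txt.id
ORDER BY rrf_score DESC
LIMIT 10;
```

Because RRF works on ranks, it sidesteps the problem that vector distances and text-relevance scores live on incomparable scales.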
You must answer all questions before checking your work.