Optimize Redis data structures for vector storage

Choosing between Redis Hash and JSON data structures affects memory usage, performance, and how you work with your vector data. Understanding the tradeoffs helps you optimize for your application's needs.

Imagine you're building a product catalog with vector embeddings for similarity search. Each product has a name, price, category, and a 1536-dimensional embedding. If your data is simple and flat, Redis Hash stores it compactly with minimal overhead—just the fields you need. But if products have nested attributes (like multiple images, variant SKUs, or hierarchical categories), JSON lets you store complex structures naturally while still supporting vector search. Choosing the right structure ensures your system is both performant and maintainable.

Using Redis Hash for vectors

Redis Hash is a data structure that stores information as field-value pairs within a single key, similar to a dictionary or map in programming languages. When you use Hash for vector storage, your embedding becomes one field (stored as binary bytes) alongside other fields like product name, price, or category, all under one Redis key like product:12345. Every value is a flat string, and the vector is a packed binary blob of 4 bytes per FLOAT32 dimension, so Hash adds very little formatting overhead. For vector search applications with straightforward metadata, Hash typically gives the smallest memory footprint and the fastest reads. Use Hash when you have flat data models with no nested objects, need maximum query performance, or prioritize memory efficiency.

Storing vectors in Hash

Store your vector as bytes in a hash field alongside other metadata:

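# Code fragment - focus on Hash storage
import numpy as np
import redis

# Keep decode_responses at its default (False) so the binary
# embedding can be read back as bytes later
redis_client = redis.Redis(host="localhost", port=6379)

# Placeholder: in practice this 1536-dim vector comes from your embedding model
embedding = np.random.rand(1536).astype(np.float32)

redis_client.hset(
    "product:12345",
    mapping={
        "name": "Wireless Mouse",
        "price": "29.99",  # Hash values are strings; a NUMERIC index field parses them
        "category": "electronics",
        "embedding": embedding.tobytes()  # Store vector as raw FLOAT32 bytes
    }
)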

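The vector is stored as a binary blob in the embedding field: 1536 dimensions × 4 bytes = 6,144 bytes, with no per-element overhead. The array's dtype must match the TYPE declared in the index (FLOAT32 below). A float64 array would produce a 12,288-byte blob that does not match the expected size, and the document would not be indexed.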
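The blob carries no dtype or shape information, so your code has to pack and unpack it consistently. The following minimal sketch assumes FLOAT32 vectors with 1536 dimensions, as in the index below. The helper names to_blob and from_blob are hypothetical and are not part of redis-py.

```python
import numpy as np

DIM = 1536  # must match the DIM declared in the index

def to_blob(vector) -> bytes:
    """Pack a vector into the raw FLOAT32 bytes a Hash vector field expects."""
    arr = np.asarray(vector, dtype=np.float32)  # casts float64 input down to float32
    if arr.shape != (DIM,):
        raise ValueError(f"expected shape ({DIM},), got {arr.shape}")
    return arr.tobytes()

def from_blob(blob: bytes) -> np.ndarray:
    """Unpack bytes read back with HGET/HGETALL into a float32 array."""
    return np.frombuffer(blob, dtype=np.float32)

vec64 = np.random.default_rng(0).random(DIM)  # float64 by default
blob = to_blob(vec64)
print(len(blob))  # 6144, not 12288, because the vector was cast to float32
print(np.allclose(from_blob(blob), vec64, atol=1e-6))  # True
```

Casting inside to_blob means a float64 array from an embedding library cannot accidentally produce a 12,288-byte blob.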

Creating an index for Hash-based vectors

Define your index to search Hash keys with a vector field:

# Code fragment - focus on Hash index
from redis.commands.search.field import TextField, NumericField, VectorField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType

schema = (
    TextField("name"),
    NumericField("price"),
    TextField("category"),
    VectorField("embedding", "HNSW", {
        "TYPE": "FLOAT32",
        "DIM": 1536,
        "DISTANCE_METRIC": "COSINE"
    })
)

redis_client.ft("idx:products").create_index(
    fields=schema,
    definition=IndexDefinition(
        prefix=["product:"],
        index_type=IndexType.HASH
    )
)
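After the index exists, a K-nearest-neighbors query passes the query vector as the same kind of FLOAT32 blob through the PARAMS argument. This sketch only builds the FT.SEARCH arguments, so it runs without a server. Its assumptions are:

- The function name knn_search_args is hypothetical.
- It uses query dialect 2.
- It targets the idx:products schema shown above.

You could send the arguments with redis_client.execute_command(*args). Because the index uses cosine distance, a lower score means a closer match.

```python
import numpy as np

def knn_search_args(query_vector, k=5, filter_expr="*", index="idx:products"):
    """Build FT.SEARCH arguments for a KNN query on the 'embedding' field.

    filter_expr is an optional pre-filter such as "@price:[0 50]"; "*" means no filter.
    """
    blob = np.asarray(query_vector, dtype=np.float32).tobytes()
    knn = f"=>[KNN {k} @embedding $vec AS score]"
    query = "*" + knn if filter_expr == "*" else f"({filter_expr})" + knn
    return [
        "FT.SEARCH", index, query,
        "PARAMS", "2", "vec", blob,        # one name/value pair: $vec -> blob
        "SORTBY", "score",                 # ascending distance: closest first
        "RETURN", "3", "name", "price", "score",
        "DIALECT", "2",                    # KNN syntax with PARAMS needs dialect 2
    ]

args = knn_search_args(np.random.default_rng(1).random(1536), k=3,
                       filter_expr="@price:[0 50]")
print(args[2])  # (@price:[0 50])=>[KNN 3 @embedding $vec AS score]
```

The same query string should also work against the JSON index later in this article, because that index aliases $.embedding as embedding.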

Using Redis JSON for vectors

Redis JSON stores data as JSON documents, allowing you to organize information hierarchically with nested objects and arrays. Unlike Hash's flat field-value pairs, JSON lets you structure complex data naturally. For example, a product can hold an array of images, nested specifications, or variant details, all in one document. When you use JSON for vector storage, your embedding is stored as a numeric array [0.1, 0.2, 0.3, ...] alongside other fields that can be as complex as your application needs. This flexibility has a cost. The array is kept in RedisJSON's document format rather than as one packed binary blob, so JSON typically uses more memory than Hash and can be somewhat slower to query.

Storing vectors in JSON

When storing vectors in JSON, you convert your NumPy array to a Python list using tolist(), which creates a JSON-compatible numeric array. This array format allows Redis to parse and index the vector while keeping your document flexible enough to include nested objects and complex metadata structures:

# Code fragment - focus on JSON storage
# Reuses redis_client and the float32 embedding from the Hash example above

document = {
    "name": "Wireless Mouse",
    "price": 29.99,
    "category": "electronics",
    "embedding": embedding.tolist(),  # Convert to list for JSON
    "specs": {
        "color": "black",
        "wireless": True,
        "battery": "AA"
    }
}

redis_client.json().set("product:12345", "$", document)

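The vector is stored as a JSON array [0.1, 0.2, 0.3, ...], and nested objects like specs fit naturally in the same document. The Hash and JSON examples are alternatives:

- A Redis key holds only one data type, so JSON.SET on a key that already holds a Hash fails with a WRONGTYPE error.
- FT.CREATE rejects an index name that already exists.

To try both side by side, use different key prefixes and index names.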
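The JSON index declares TYPE FLOAT32 and DIM 1536, so the array in each document must contain exactly 1536 numbers; otherwise the document is not added to the vector index. Converting a float32 array to a list and back is lossless, so the index sees the values you started with.

This sketch checks both properties, using the standard json module to mimic serialization. The helper name make_product_document is hypothetical.

```python
import json
import numpy as np

DIM = 1536  # must match the DIM in the JSON index schema

def make_product_document(name, price, category, embedding, specs=None):
    """Build a JSON document whose embedding is a plain list of DIM numbers."""
    vector = np.asarray(embedding, dtype=np.float32)
    if vector.shape != (DIM,):
        raise ValueError(f"embedding must have {DIM} values, got {vector.shape}")
    return {
        "name": name,
        "price": float(price),
        "category": category,
        "embedding": vector.tolist(),  # Python floats, which are JSON-serializable
        "specs": specs or {},
    }

embedding = np.random.default_rng(2).random(DIM).astype(np.float32)
doc = make_product_document("Wireless Mouse", 29.99, "electronics", embedding,
                            specs={"color": "black", "wireless": True})

# Simulate the trip through JSON text and back to FLOAT32
restored = np.array(json.loads(json.dumps(doc))["embedding"], dtype=np.float32)
print(np.array_equal(restored, embedding))  # True: float32 -> list -> float32 is exact
```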

Creating an index for JSON-based vectors

Define your index to search JSON documents with a vector field. Use JSON when your data has nested structures, you need multiple vectors per document, your application already uses JSON format, or you need JSON query capabilities:

# Code fragment - focus on JSON index
from redis.commands.search.field import TextField, NumericField, VectorField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType

schema = (
    TextField("$.name", as_name="name"),
    NumericField("$.price", as_name="price"),
    TextField("$.category", as_name="category"),
    VectorField("$.embedding", "HNSW", {
        "TYPE": "FLOAT32",
        "DIM": 1536,
        "DISTANCE_METRIC": "COSINE"
    }, as_name="embedding")
)

redis_client.ft("idx:products").create_index(
    fields=schema,
    definition=IndexDefinition(
        prefix=["product:"],
        index_type=IndexType.JSON
    )
)

Note the $. JSONPath syntax for field names and the as_name parameter for query references.

Comparing Hash and JSON

Choose based on your data complexity and performance requirements:

Factor	Hash	JSON
Memory usage	Lower (binary vector storage)	Higher (JSON array format)
Query performance	Faster	Slightly slower
Data complexity	Flat fields only	Nested objects supported
Vector storage	Binary bytes	Numeric array
Best for	Simple records, max performance	Complex documents, flexibility

Performance considerations

Hash storage is faster and uses less memory because vectors are stored as compact binary blobs. JSON storage adds overhead for the array structure but offers more flexibility:

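# Code fragment - focus on storage comparison
# Uses the float32 embedding from the Hash example above
import json

# Hash: 1536 floats × 4 bytes = 6,144 bytes for vector
hash_vector = embedding.tobytes()  # 6,144 bytes

# JSON: each element becomes a separate JSON number. tolist() returns Python
# floats such as 0.10000000149011612, so the JSON text sent to Redis is
# typically several times larger than 6,144 bytes
json_vector = embedding.tolist()
json_text_size = len(json.dumps(json_vector))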

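For 1 million products with 1536-dimensional vectors, the Hash vector blobs alone add up to 1,000,000 × 6,144 bytes ≈ 6.1 GB. The same vectors stored as JSON arrays take more, by an amount that depends on your RedisJSON version. At this scale, Hash can save significant memory and may reduce query latency. Check the actual difference with MEMORY USAGE on a few sample keys before committing to a structure.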
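A back-of-the-envelope calculation makes the scale concrete. The function below computes the raw Hash vector size and the length of the JSON text for one sample vector, then scales both to a catalog.

It covers only the vector payload. It leaves out Redis key overhead, metadata fields, and the vector index. The JSON figure is the serialized text, not RedisJSON's in-memory format, so treat the result as a rough illustration. The function name vector_payload_sizes is hypothetical.

```python
import json
import numpy as np

def vector_payload_sizes(dim=1536, n_items=1_000_000, seed=0):
    """Estimate vector payload bytes for Hash (binary) vs JSON (text) storage."""
    sample = np.random.default_rng(seed).uniform(-1, 1, dim).astype(np.float32)
    hash_bytes = len(sample.tobytes())              # dim * 4 bytes
    json_bytes = len(json.dumps(sample.tolist()))   # JSON text for one vector
    return {
        "hash_per_item": hash_bytes,
        "json_text_per_item": json_bytes,
        "hash_total_gb": hash_bytes * n_items / 1e9,
        "json_text_total_gb": json_bytes * n_items / 1e9,
    }

sizes = vector_payload_sizes()
print(sizes["hash_per_item"])            # 6144
print(round(sizes["hash_total_gb"], 3))  # 6.144
print(sizes["json_text_per_item"] > sizes["hash_per_item"])  # True
```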

Switching between Hash and JSON

You can change your data structure by re-ingesting data with a new index:

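# Code fragment - focus on migration approach
import numpy as np

# Assumes redis_client was created with decode_responses=False (the default),
# so field names and values come back as bytes
key = "product:12345"

# Read from Hash
hash_data = redis_client.hgetall(key)

# Convert to JSON
document = {
    "name": hash_data[b"name"].decode(),
    "price": float(hash_data[b"price"]),
    "category": hash_data[b"category"].decode(),
    "embedding": np.frombuffer(hash_data[b"embedding"], dtype=np.float32).tolist()
}

# A key holds only one type, so remove the Hash before writing JSON to the same key
redis_client.delete(key)

# Store as JSON
redis_client.json().set(key, "$", document)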

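Delete the old index without deleting its documents, then create a new one with the appropriate IndexType for your new structure. Use FT.DROPINDEX without the DD option, or in redis-py, ft("idx:products").dropindex(delete_documents=False).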
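To migrate a whole catalog rather than one key, you can apply the same conversion in a loop. This minimal sketch assumes a redis-py client with decode_responses=False and a server with JSON support. The names hash_to_document and migrate_prefix are hypothetical.

The conversion is a pure function, so you can test it without a server. The loop deletes each Hash before writing its JSON replacement, so a crash between the two calls loses that key. Back up the data or write to a new prefix if that matters.

```python
import numpy as np

def hash_to_document(hash_data: dict) -> dict:
    """Convert an HGETALL reply (bytes keys and values) into a JSON document."""
    return {
        "name": hash_data[b"name"].decode(),
        "price": float(hash_data[b"price"]),
        "category": hash_data[b"category"].decode(),
        "embedding": np.frombuffer(hash_data[b"embedding"], dtype=np.float32).tolist(),
    }

def migrate_prefix(redis_client, prefix="product:"):
    """Rewrite every Hash under prefix as a JSON document with the same key."""
    migrated = 0
    # _type="HASH" skips keys that were already converted to JSON
    for key in redis_client.scan_iter(match=f"{prefix}*", _type="HASH"):
        document = hash_to_document(redis_client.hgetall(key))
        redis_client.delete(key)  # a key can hold only one type
        redis_client.json().set(key, "$", document)
        migrated += 1
    return migrated

# Offline check of the conversion with a fake HGETALL reply
fake = {
    b"name": b"Wireless Mouse",
    b"price": b"29.99",
    b"category": b"electronics",
    b"embedding": np.arange(4, dtype=np.float32).tobytes(),
}
print(hash_to_document(fake)["embedding"])  # [0.0, 1.0, 2.0, 3.0]
```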

Quick decision guide

Use this guide to choose your data structure:

Choose Hash when:

Each item has simple, flat fields
You prioritize memory efficiency
You need maximum query speed
Your data model won't need nested objects

Choose JSON when:

Your data has nested structures
You need to store multiple vectors per item
Your application already uses JSON
Flexibility is more important than raw performance

For most vector search applications with simple metadata, Hash provides the best balance of performance and simplicity. Use JSON when your data complexity requires it.
