Identify and analyze data relationships

6 minutes

Before you model data in Azure DocumentDB, you need to understand the relationships between entities in your application. This analysis drives every modeling decision you make, from whether to embed data inside a document to whether to store it in a separate collection with a reference.

Identify entities and map relationships

Start by listing the core entities in your application. For example, for an e-commerce platform, the main entities are customers, products, orders, reviews, and categories. Each entity has attributes (fields) and connects to other entities through relationships.

Three types of relationships appear in document databases:

One-to-one (1:1): One entity relates to exactly one other entity. A customer has one account (sign in credentials, authorization settings). An order has one shipping address.
One-to-many (1:N): One entity relates to multiple instances of another. A customer has many orders. A product has many reviews.
Many-to-many (M:N): Multiple instances of one entity relate to multiple instances of another. Products belong to many categories, and categories contain many products.

After you identify entities, map the relationships between them:

Example relationship map

Customer → Orders: one-to-many
Customer → Addresses: one-to-many (bounded: typically 1-2)
Customer → Reviews: one-to-many
Order → Items: one-to-many (bounded: typically 1-5)
Order → ShippingAddress: one-to-one (snapshot at order time)
Product → Reviews: one-to-many (potentially unbounded)
Product ↔ Categories: many-to-many

Relationships determine how you structure your documents and collections.

Analyze cardinality

The relationship type alone doesn't tell you enough. You also need to determine cardinality, which is the expected number of items on each side of the relationship. A one-to-many relationship where "many" means three addresses is different from one where "many" means 100,000 reviews.

Estimate the minimum, maximum, and typical counts for each relationship:

Relationship	Min	Typical	Max	Growth
Customer → Addresses	0	2	5	Bounded
Customer → Orders	0	16	50+	Unbounded
Order → Items	1	3	5	Bounded
Product → Reviews	0	25	100,000+	Unbounded

Cardinality breaks down into three practical categories:

One-to-few (less than 100): Addresses per customer, items per order
One-to-many (100 to 10,000): Reviews per product, orders per customer
One-to-millions (10,000+): Event logs, sensor readings

Cardinality directly influences your document structure: one-to-few relationships often embed well, while one-to-many or one-to-millions relationships usually need referencing or a hybrid approach.

Analyze access patterns

Cardinality tells you how much data exists. Access patterns tell you how your application uses that data. For each relationship, answer these questions:

How often is the data accessed together? When your application loads a product page, does it always need the reviews? When it loads an order, does it always need the customer's full profile?

Access pattern analysis

Display Product Page (high frequency):
- Needs: product details, category info, top three reviews, review stats
- Strategy: embed category + top reviews with product
Display Customer Dashboard (high frequency):
- Needs: customer info, recent five orders
- Strategy: embed recent orders subset with customer, or query separately
Process Order (high frequency):
- Needs: order details, all items, shipping address
- Strategy: embed items and address with order

What's the read-to-write ratio? Data that's read frequently but written rarely is a good candidate for embedding, because duplicated data stays consistent. Data that changes often is better stored separately.

Data	Reads/day	Writes/day	Ratio	Implication
Product reviews	10,000	10	1,000:1	Read-heavy; embed a subset
Product inventory	5,000	500	10:1	Frequent writes; reference separately
Customer address	1,000	5	200:1	Rarely changes; embed with customer

A high read-to-write ratio (100:1 or more) favors embedding because stale copies are rare. A low ratio (under 10:1) favors referencing to avoid frequent multi-document updates.

Evaluate ownership and lifecycle

The relationship between parent and child entities affects your modeling strategy:

Strong ownership: The child doesn't exist without the parent. Order items belong to a specific order. If you delete the order, the items have no meaning. Embed. For example, embed:
- Order → Items: items are part of the order
- Customer → Addresses: addresses belong to the customer
Weak ownership: Entities exist independently. A customer exists whether or not they have orders. A product exists whether or not it has reviews. Reference. For example, reference:
- Order → Customer: customer exists independently
- Review → Product: product exists without reviews

If the child can't exist without the parent, embedding keeps the data together and simplifies deletes. If both entities have independent lifecycles, referencing avoids tightly coupling them.

Put it all together

Combine your analysis into a relationship summary that guides your modeling decisions. For each relationship, record the type, cardinality, access frequency, read/write ratio, and ownership:

E-commerce relationship analysis summary

Customer
- → Profile: 1:1, always together, rarely changes → embed
- → Addresses: 1:few (1-2), usually together → embed
- → Orders: 1:many (unbounded), recent needed → subset or reference
Order
- → Items: 1:few (1-5), always together, strong ownership → embed
- → Address: 1:1, snapshot at order time → embed
Product
- → Reviews: 1:many (unbounded), top N needed → subset + reference
- ↔ Categories: M:N, display needed → denormalize + reference

This analysis forms the foundation for the decision framework you apply in the next unit. It's not about one single factor, it's about considering cardinality, access patterns, read/write ratios, and ownership together to make informed modeling decisions.

Feedback

Was this page helpful?