Share via


Data preparation cheat sheet

High-end BI performance depends on how effectively data is prepared and delivered from the Lakehouse. By adopting architectural patterns, applying a semantic structure, and using targeted optimizations, you can reduce query complexity, improve dashboard responsiveness, and lower compute costs.

The following table summarizes recommended practices, their expected impact, related documentation, and associated action items. This content is intended for data engineers, BI developers, and dashboard authors who design, optimize, and maintain analytics workloads in the Lakehouse.

Data prep

Best practice Impact Docs Action items
Adopt a medallion architecture Speeds up turning raw data into ready-to-use, reliable data products for easy consumption. Review and implement medallion layers
Use liquid clustering Improves query performance with file and data skipping. Apply to large tables with filter patterns
Use managed tables Azure Databricks auto-governs and optimizes the storage layer and query performance. Create managed tables for your data
Use predictive optimization or optimize tables manually Enables better query performance by optimizing file sizes and layout, deleting old files, and updating statistics. Enable for production tables or schedule regular optimization and analyze tables after data changes
Model data in a star schema pattern Makes the data easy to query and consume. Design fact and dimension tables
Avoid wide data types and high-cardinality columns Optimizes data model size and memory consumption, and improves query efficiency. Review data types and cardinality
Declare Primary and Foreign Keys (with RELY) Optimizes queries by eliminating unnecessary joins and aggregations. Define keys on fact and dimension tables
Use auto-generated columns Reduces the need to calculate values at query time. Identify frequently calculated fields
Use materialized views and persisted tables Improves performance by pre-aggregating data for the most common and resource-intensive queries. Create aggregated views for common queries