Applies to: ✅ SQL analytics endpoint in Microsoft Fabric
The SQL analytics endpoint enables you to query data in the lakehouse using the T-SQL language and the TDS protocol. Every lakehouse has one SQL analytics endpoint, and the number of SQL analytics endpoints in a workspace matches the number of lakehouses and mirrored databases provisioned in that workspace.
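Because the endpoint speaks TDS, any SQL Server client driver can connect to it. The following is a minimal sketch using pyodbc; the server, database, and table names are placeholders for your own values, and the actual SQL connection string can be copied from the endpoint's settings in the Fabric portal:

import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<your endpoint>.datawarehouse.fabric.microsoft.com;"  # placeholder
    "Database=<your lakehouse name>;"                             # placeholder
    "Authentication=ActiveDirectoryInteractive;"                  # Microsoft Entra sign-in
)
cursor = conn.cursor()
cursor.execute("SELECT TOP 10 * FROM dbo.<tablename>;")  # placeholder table name
for row in cursor.fetchall():
    print(row)
conn.close()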
A background process is responsible for scanning the lakehouse for changes and keeping the SQL analytics endpoint up to date with all the changes committed to lakehouses in a workspace. The sync process is transparently managed by the Microsoft Fabric platform. When a change is detected in a lakehouse, a background process updates the metadata, and the SQL analytics endpoint reflects the changes committed to lakehouse tables. Under normal operating conditions, the lag between a lakehouse and its SQL analytics endpoint is less than one minute. The actual length of time can vary from a few seconds to minutes depending on a number of factors that are discussed in this article.
The SQL analytics endpoint manages the automatically generated tables, so workspace users can't modify them. Users can enrich the database model by adding their own SQL schemas, views, procedures, and other database objects.
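For example, users can layer a view over an autogenerated table with ordinary T-SQL. The sketch below reuses the pyodbc connection from the earlier example; the schema, view, table, and column names are illustrative placeholders:

cursor = conn.cursor()

# CREATE SCHEMA and CREATE VIEW must each run in their own T-SQL batch.
cursor.execute("CREATE SCHEMA reporting;")
cursor.execute(
    "CREATE VIEW reporting.daily_totals AS "
    "SELECT CAST(order_date AS date) AS order_day, SUM(amount) AS total "
    "FROM dbo.orders "
    "GROUP BY CAST(order_date AS date);"
)
conn.commit()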
For every Delta table in your lakehouse, the SQL analytics endpoint automatically generates a table in the appropriate schema. For the autogenerated schema data types used by the SQL analytics endpoint, see Data types in Microsoft Fabric.
Tables in the SQL analytics endpoint are created with a minor delay. Once you create or update a Delta Lake table in the lake, the SQL analytics endpoint table that references it is created or refreshed automatically.
The amount of time it takes to refresh the table is related to how optimized the Delta tables are. For key scenarios and an in-depth guide on efficiently maintaining Delta tables for maximum performance, review Delta Lake table optimization and V-Order.
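As one illustration of that maintenance, the following sketch compacts a delta table's small files with the Delta Lake Python API in a Fabric Spark notebook, where a spark session is predefined; the table name is a placeholder:

from delta.tables import DeltaTable

# The table name is a placeholder for a delta table in your lakehouse.
table = DeltaTable.forName(spark, "<lakehouse name>.<tablename>")

# Bin-compact small parquet files into larger ones; whether V-Order is applied
# to the rewritten files depends on the workspace's V-Order setting.
table.optimize().executeCompaction()

# Optionally remove files no longer referenced by the table and older than
# the default 7-day (168-hour) retention window.
table.vacuum(168)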
You can manually force a refresh of the automatic metadata scanning in the Fabric portal. On the page for the SQL analytics endpoint, select the Refresh button in the Explorer toolbar to refresh the schema. For more information, see Query your SQL analytics endpoint.
The choice of partition column for a delta table in a lakehouse also affects the time it takes to sync changes to the SQL analytics endpoint. The number and size of the partitions of the partition column are important for performance:
A large volume of small parquet files increases the time it takes to sync changes between a lakehouse and its associated SQL analytics endpoint. You might end up with a large number of parquet files in a delta table for a variety of reasons.
Use the following notebook to print a report detailing the size and file count of the partitions underpinning a delta table.
In the first cell, assign the delta_table_path variable. You can get the ABFSS path of a delta table by right-clicking on the table name and selecting COPY PATH from the list of options. The complete script can be copied from the following code block:
# Purpose: Print out details of partitions, files per partition, and size per partition in GB.
from notebookutils import mssparkutils

# Define the ABFSS path for your delta table. You can get the ABFSS path of a delta table
# by right-clicking on the table name and selecting COPY PATH from the list of options.
delta_table_path = "abfss://<workspace id>@<onelake>.dfs.fabric.microsoft.com/<lakehouse id>/Tables/<tablename>"

# List all partitions for the given delta table
partitions = mssparkutils.fs.ls(delta_table_path)

# Initialize a dictionary to store partition details
partition_details = {}

# Iterate through each partition
for partition in partitions:
    if partition.isDir:
        partition_name = partition.name
        partition_path = partition.path
        files = mssparkutils.fs.ls(partition_path)

        # Calculate the total size of the partition in GB
        total_size = sum(file.size for file in files if not file.isDir) / (1024 ** 3)

        # Count the number of files
        file_count = sum(1 for file in files if not file.isDir)

        # Record the partition details
        partition_details[partition_name] = {
            "size_gb": total_size,
            "file_count": file_count
        }

# Print the partition details
for partition_name, details in partition_details.items():
    print(f"{partition_name}, Size: {details['size_gb']:.2f} GB, Number of files: {details['file_count']}")
Training
Module
Work with Delta Lake tables in Microsoft Fabric - Training
Tables in a Microsoft Fabric lakehouse are based on the Delta Lake technology commonly used in Apache Spark. By using the enhanced capabilities of delta tables, you can create advanced analytics solutions.
Certification
Microsoft Certified: Fabric Data Engineer Associate - Certifications
As a Fabric data engineer, you should have subject matter expertise with data loading patterns, data architectures, and orchestration processes.
Documentation
Better together - the lakehouse and warehouse - Microsoft Fabric
Learn more about scenarios for the lakehouse and data warehousing workloads in Microsoft Fabric.
Source control with Warehouse (preview) - Microsoft Fabric
Learn how to use source control with Microsoft Fabric Warehouse.
Warehouse performance guidelines - Microsoft Fabric
This article contains a list of performance guidelines for warehouse.