Microsoft Fabric terminology

Learn the definitions of terms used in Microsoft Fabric, including terms specific to Synapse Data Warehouse, Synapse Data Engineering, Synapse Data Science, Real-Time Intelligence, Data Factory, and Power BI.

General terms

  • Capacity: A capacity is a dedicated set of resources that is available for use at a given time. Capacity defines the ability of a resource to perform an activity or to produce output. Different items consume different amounts of capacity at a given time. Fabric offers capacity through the Fabric SKU and Trials. For more information, see What is capacity?

  • Experience: A collection of capabilities targeted to a specific functionality. The Fabric experiences include Synapse Data Warehouse, Synapse Data Engineering, Synapse Data Science, Real-Time Intelligence, Data Factory, and Power BI.

  • Item: An item is a set of capabilities within an experience. Users can create, edit, and delete items. Each item type provides different capabilities. For example, the Data Engineering experience includes the lakehouse, notebook, and Spark job definition items.

  • Tenant: A tenant is a single instance of Fabric for an organization and is aligned with a Microsoft Entra ID.

  • Workspace: A workspace is a collection of items that brings together different functionality in a single environment designed for collaboration. It acts as a container that uses capacity for the work that is executed, and provides controls for who can access the items in it. For example, in a workspace, users create reports, notebooks, semantic models, and other items. For more information, see the Workspaces article.
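
The way these terms nest (a tenant holds workspaces, a workspace uses capacity and holds items) can be sketched as a small data model. This is only an illustration: the class names, fields, and sample values below are hypothetical and do not correspond to any Fabric API.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    name: str
    item_type: str  # for example "lakehouse", "notebook", or "report"


@dataclass
class Workspace:
    name: str
    capacity: str  # hypothetical label for the capacity this workspace uses
    items: list[Item] = field(default_factory=list)


@dataclass
class Tenant:
    organization: str
    workspaces: list[Workspace] = field(default_factory=list)

    def items_of_type(self, item_type: str) -> list[str]:
        """Return the names of all items of one type across every workspace."""
        return [
            item.name
            for workspace in self.workspaces
            for item in workspace.items
            if item.item_type == item_type
        ]


# Made-up sample data: one tenant with a single workspace and three items.
tenant = Tenant(
    organization="Contoso",
    workspaces=[
        Workspace(
            name="Sales analytics",
            capacity="trial-capacity",
            items=[
                Item("sales_lakehouse", "lakehouse"),
                Item("clean_sales", "notebook"),
                Item("Quarterly revenue", "report"),
            ],
        )
    ],
)

print(tenant.items_of_type("notebook"))
```

The model keeps access control and capacity consumption out of scope; it only shows which term contains which.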

Synapse Data Engineering

  • Lakehouse: A lakehouse is a collection of files, folders, and tables that represent a database over a data lake used by the Apache Spark engine and SQL engine for big data processing. A lakehouse includes enhanced capabilities for ACID transactions when using the open-source Delta-formatted tables. The lakehouse item is hosted within a unique workspace folder in Microsoft OneLake. It contains files in various formats (structured and unstructured) organized in folders and subfolders. For more information, see What is a lakehouse?

  • Notebook: A Fabric notebook is a multi-language interactive programming tool with rich functions, which include authoring code and markdown, running and monitoring a Spark job, viewing and visualizing results, and collaborating with the team. It helps data engineers and data scientists explore and process data, and build machine learning experiments with both code and low-code experiences. A notebook can be easily transformed into a pipeline activity for orchestration.

  • Spark application: An Apache Spark application is a program written by a user using one of Spark's API languages (Scala, Python, Spark SQL, or Java) or Microsoft-added languages (.NET with C# or F#). When an application runs, it's divided into one or more Spark jobs that run in parallel to process the data faster. For more information, see Spark application monitoring.

  • Apache Spark job: A Spark job is part of a Spark application that is run in parallel with other jobs in the application. A job consists of multiple tasks. For more information, see Spark job monitoring.

  • Apache Spark job definition: A Spark job definition is a set of parameters, set by the user, indicating how a Spark application should be run. It allows you to submit batch or streaming jobs to the Spark cluster. For more information, see What is an Apache Spark job definition?

  • V-order: A write optimization to the Parquet file format that enables fast reads and provides cost efficiency and better performance. All the Fabric engines write V-ordered Parquet files by default.
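
The relationship between a Spark application, its jobs, and their tasks, as defined in the entries above, can be modeled roughly as follows. This is a conceptual sketch in plain Python, not Spark code: `SparkApplication`, `SparkJob`, and the sample values are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SparkJob:
    job_id: int
    task_count: int  # a job consists of multiple tasks


@dataclass
class SparkApplication:
    name: str
    language: str  # for example "Scala", "Python", "Spark SQL", or "Java"
    jobs: list[SparkJob] = field(default_factory=list)

    def total_tasks(self) -> int:
        """Sum the tasks of every job the application was divided into."""
        return sum(job.task_count for job in self.jobs)


# Made-up example: one application divided into two jobs.
app = SparkApplication(
    name="nightly_ingest",
    language="Python",
    jobs=[SparkJob(job_id=0, task_count=8), SparkJob(job_id=1, task_count=4)],
)

print(len(app.jobs), app.total_tasks())
```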

Data Factory

  • Connector: Data Factory offers a rich set of connectors that allow you to connect to different types of data stores. Once connected, you can transform the data. For more information, see connectors.

  • Data pipeline: In Data Factory, a data pipeline is used for orchestrating data movement and transformation. These pipelines are different from the deployment pipelines in Fabric. For more information, see Pipelines in the Data Factory overview.

  • Dataflow Gen2: Dataflows provide a low-code interface for ingesting data from hundreds of data sources and transforming your data. Dataflows in Fabric are referred to as Dataflow Gen2. Dataflow Gen1 exists in Power BI. Dataflow Gen2 offers extra capabilities compared to Dataflows in Azure Data Factory or Power BI. You can't upgrade from Gen1 to Gen2. For more information, see Dataflows in the Data Factory overview.

  • Trigger: An automation capability in Data Factory that initiates pipelines based on specific conditions, such as schedules or data availability.

Synapse Data Science

  • Data Wrangler: Data Wrangler is a notebook-based tool that provides users with an immersive experience to conduct exploratory data analysis. The feature combines a grid-like data display with dynamic summary statistics and a set of common data-cleansing operations, all available with a few selected icons. Each operation generates code that can be saved back to the notebook as a reusable script.

  • Experiment: A machine learning experiment is the primary unit of organization and control for all related machine learning runs. For more information, see Machine learning experiments in Microsoft Fabric.

  • Model: A machine learning model is a file trained to recognize certain types of patterns. You train a model over a set of data, and you provide it with an algorithm that it uses to reason over and learn from that data set. For more information, see Machine learning model.

  • Run: A run corresponds to a single execution of model code. In MLflow, tracking is based on experiments and runs.
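
To make the experiment and run terms concrete, the sketch below models an experiment as a container of runs, where each run records the parameters and metrics of one execution of model code. It deliberately avoids the MLflow API: the `Experiment` and `Run` classes, the field names, and all values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    run_id: str
    params: dict[str, float]   # inputs used for this execution
    metrics: dict[str, float]  # results recorded for this execution


@dataclass
class Experiment:
    name: str
    runs: list[Run] = field(default_factory=list)

    def best_run(self, metric: str) -> Run:
        """Return the run with the highest value for the given metric."""
        return max(self.runs, key=lambda run: run.metrics[metric])


# Made-up values: two executions of the same model code with different settings.
experiment = Experiment("churn-prediction")
experiment.runs.append(Run("run-1", {"learning_rate": 0.1}, {"accuracy": 0.81}))
experiment.runs.append(Run("run-2", {"learning_rate": 0.01}, {"accuracy": 0.86}))

print(experiment.best_run("accuracy").run_id)
```

Grouping runs under one experiment is what makes comparisons like `best_run` possible; a tracking system such as MLflow provides the same grouping with persistent storage.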

Synapse Data Warehouse

  • SQL analytics endpoint: Each lakehouse has a SQL analytics endpoint that allows a user to query Delta table data with T-SQL over TDS (Tabular Data Stream). For more information, see SQL analytics endpoint.

  • Synapse Data Warehouse: The Synapse Data Warehouse functions as a traditional data warehouse and supports the full transactional T-SQL capabilities you would expect from an enterprise data warehouse. For more information, see Synapse Data Warehouse.

Real-Time Intelligence

  • KQL database: The KQL database holds data in a format that you can execute KQL queries against. For more information, see Query a KQL database.

  • KQL Queryset: The KQL Queryset is the item used to run queries, view results, and manipulate query results on data from your Data Explorer database. The queryset includes the databases and tables, the queries, and the results. The KQL Queryset allows you to save queries for future use, or export and share queries with others. For more information, see Query data in the KQL Queryset.

  • Event stream: The Microsoft Fabric event streams feature provides a centralized place in the Fabric platform to capture, transform, and route real-time events to destinations with a no-code experience. An event stream consists of various streaming data sources, ingestion destinations, and, when transformation is needed, an event processor. For more information, see Microsoft Fabric event streams.

OneLake

  • Shortcut: Shortcuts are embedded references within OneLake that point to other file store locations. They provide a way to connect to existing data without having to directly copy it. For more information, see OneLake shortcuts.
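
The idea of a shortcut as a reference rather than a copy can be illustrated with a toy lookup. Everything here is an assumption made for the example: the `external://` location scheme, the paths, and the `read` helper are hypothetical and are not OneLake APIs.

```python
# Hypothetical external storage: location -> file contents.
external_store = {"external://archive/sales.csv": "id,amount\n1,100\n"}

# Files physically stored in the lake (hypothetical paths).
local_files = {"Files/notes.txt": "hello"}

# Shortcuts map a path in the lake to a location elsewhere; no data is copied.
shortcuts = {"Files/sales.csv": "external://archive/sales.csv"}


def read(path: str) -> str:
    """Read a path, following a shortcut to its target when one exists."""
    if path in shortcuts:
        return external_store[shortcuts[path]]
    return local_files[path]


print(read("Files/sales.csv") == external_store["external://archive/sales.csv"])
print("Files/sales.csv" in local_files)
```

Reading through the shortcut returns the data at the target location, while the lake itself holds only the reference.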