Explore data in your mirrored database with notebooks

You can explore the data replicated from your mirrored database with Spark queries in notebooks.

Notebooks are a powerful code item for developing Apache Spark jobs and machine learning experiments on your data. You can use notebooks in the Fabric Lakehouse to explore your mirrored tables.

Prerequisites

Create a shortcut

You first need to create a shortcut from your mirrored tables into the Lakehouse, and then build notebooks with Spark queries in your Lakehouse.

  1. In the Fabric portal, open Data Engineering.

  2. If you don't already have a Lakehouse, select Lakehouse and create a new one by giving it a name.

  3. Select Get Data -> New shortcut.

  4. Select Microsoft OneLake.

  5. You can see all your mirrored databases in the Fabric workspace.

  6. Select the mirrored database that you want to add to your Lakehouse as a shortcut.

  7. Select the desired tables from the mirrored database.

  8. Select Next, then Create.

  9. In the Explorer, you can now see the selected table data in your Lakehouse.

    Screenshot of the Fabric portal showing the Lakehouse Explorer with the mirrored database tables and data.

    Tip

    You can add other data to the Lakehouse directly, or bring in data through shortcuts to sources such as Amazon S3 and ADLS Gen2. You can then navigate to the SQL analytics endpoint of the Lakehouse and seamlessly join the data from all these sources with your mirrored data. A Spark SQL version of such a join is sketched after these steps.

  10. To explore this data in Spark, select the ... dots next to any table, then select New notebook or Existing notebook to begin analysis.

    Screenshot of the Fabric portal showing the context menu used to open a mirrored database table in a notebook.

  11. The notebook opens automatically and loads a DataFrame with a SELECT ... LIMIT 1000 Spark SQL query, similar to the first sketch after these steps.

    • New notebooks can take up to two minutes to load completely. You can avoid this delay by using an existing notebook with an active session.

    Screenshot of the Fabric portal showing data from a mirrored database table in a new notebook with a Spark SQL query.
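The following is a minimal sketch of the kind of cell the portal generates in the new notebook. The Lakehouse name (MyLakehouse) and table name (dbo_Orders) are placeholders; the cell generated for your table may differ slightly.

```python
# Minimal sketch of an auto-generated exploration cell (names are placeholders).
# In a Fabric notebook session, "spark" and "display" are already defined.
df = spark.sql("SELECT * FROM MyLakehouse.dbo_Orders LIMIT 1000")
display(df)
```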
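The cross-source join mentioned in the earlier tip can also be expressed from a notebook with Spark SQL instead of the SQL analytics endpoint. This is a sketch under assumed names: dbo_Orders stands in for a mirrored table and Customers for a table brought into the same Lakehouse through a shortcut; adjust the names and columns to your own schema.

```python
# Sketch of joining a mirrored table with another Lakehouse table in Spark SQL.
# All table and column names below are hypothetical placeholders.
joined_df = spark.sql("""
    SELECT o.OrderID, o.OrderDate, c.CustomerName
    FROM MyLakehouse.dbo_Orders AS o
    JOIN MyLakehouse.Customers AS c
        ON o.CustomerID = c.CustomerID
""")
display(joined_df)
```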