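Load data into your lakehouse with a notebook

Article
10/15/2024

In this tutorial, learn how to read data from and write data to a Fabric lakehouse with a notebook. Fabric supports both the Spark API and the Pandas API for this purpose.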

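Load data with an Apache Spark API

In a notebook code cell, use the following code example to read data from the source and load it into the Files section, the Tables section, or both sections of your lakehouse.

To specify the location to read from, you can use a relative path if the data is in the default lakehouse of the current notebook. If the data is in a different lakehouse, use the absolute Azure Blob File System (ABFS) path. Copy this path from the context menu of the data.

Copy ABFS path: This option returns the absolute path of the file.

Copy relative path for Spark: This option returns the relative path of the file in your default lakehouse.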
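As a rough illustration of the two path styles, the sketch below assembles an ABFS path string by hand. The helper name `build_abfs_path` and the workspace and lakehouse names are hypothetical, and the default host `onelake.dfs.fabric.microsoft.com` is an assumption (your tenant may use a different endpoint). In practice, copying the path from the context menu is the safer route.

```python
def build_abfs_path(workspace: str, lakehouse: str, relative_path: str,
                    host: str = "onelake.dfs.fabric.microsoft.com") -> str:
    """Join the parts of an ABFS path for a file in a lakehouse.

    Hypothetical helper for illustration; `host` is an assumed default.
    """
    # A Spark relative path such as "Files/sample.parquet" has no leading slash.
    relative_path = relative_path.lstrip("/")
    return f"abfss://{workspace}@{host}/{lakehouse}.Lakehouse/{relative_path}"


# Relative path: resolves against the notebook's default lakehouse.
relative = "Files/sample.parquet"

# Absolute path: can point at a lakehouse other than the default one.
absolute = build_abfs_path("MyWorkspace", "Marketing_LH", relative)
print(absolute)
```

Either string can then be passed to `spark.read.parquet`, depending on whether the file lives in the default lakehouse or in another one.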

df = spark.read.parquet("location to read from") 

# Keep it if you want to save dataframe as CSV files to Files section of the default lakehouse

df.write.mode("overwrite").format("csv").save("Files/" + csv_table_name)

# Keep it if you want to save dataframe as Parquet files to Files section of the default lakehouse

df.write.mode("overwrite").format("parquet").save("Files/" + parquet_table_name)

# Keep it if you want to save dataframe as a delta lake, parquet table to Tables section of the default lakehouse

df.write.mode("overwrite").format("delta").saveAsTable(delta_table_name)

# Keep it if you want to save the dataframe as a delta lake, appending the data to an existing table

df.write.mode("append").format("delta").saveAsTable(delta_table_name)
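The snippets above can be combined into one small function. This is a minimal sketch, not the only way to structure the load: the function name `load_parquet_to_table` and its parameters are hypothetical, and it assumes that the `spark` session provided by the Fabric notebook is passed in. It relies only on the `spark.read.parquet` and `df.write.mode(...).format("delta").saveAsTable(...)` calls already shown.

```python
def load_parquet_to_table(spark, source_path, table_name, append=False):
    """Read Parquet files and save them as a Delta table in the default lakehouse.

    Hypothetical helper: `spark` is the notebook's Spark session, `source_path`
    is a relative or ABFS path, and `table_name` is the target table name.
    """
    df = spark.read.parquet(source_path)
    # "append" adds rows to an existing table; "overwrite" replaces its contents.
    mode = "append" if append else "overwrite"
    df.write.mode(mode).format("delta").saveAsTable(table_name)
    return df
```

In a notebook cell, a call might look like `load_parquet_to_table(spark, "Files/sample.parquet", "sample_table")`, where the table name is a placeholder.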

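Load data with the Pandas API

To support the Pandas API, the default lakehouse is automatically mounted to the notebook. The mount point is "/lakehouse/default/". You can use this mount point to read data from and write data to the default lakehouse. The Copy File API path option in the context menu returns the File API path from that mount point. The path returned by the Copy ABFS path option also works with the Pandas API.

Copy File API path: This option returns the path under the mount point of the default lakehouse.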

# Keep it if you want to read parquet file with Pandas from the default lakehouse mount point 

import pandas as pd
df = pd.read_parquet("/lakehouse/default/Files/sample.parquet")

# Keep it if you want to read parquet file with Pandas from the absolute abfss path 

import pandas as pd
df = pd.read_parquet("abfss://DevExpBuildDemo@msit-onelake.dfs.fabric.microsoft.com/Marketing_LH.Lakehouse/Files/sample.parquet")
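To make the relationship between a Spark relative path and a File API path concrete, here is a small sketch. The helper `to_file_api_path` is hypothetical. It only joins a relative path onto the `/lakehouse/default/` mount point described above and does not check that the file exists.

```python
import posixpath

# Mount point of the default lakehouse, as described in this section.
DEFAULT_MOUNT = "/lakehouse/default"


def to_file_api_path(relative_path: str, mount: str = DEFAULT_MOUNT) -> str:
    """Turn a relative path like "Files/sample.parquet" into a File API path.

    Hypothetical helper for illustration only.
    """
    return posixpath.join(mount, relative_path.lstrip("/"))


path = to_file_api_path("Files/sample.parquet")
print(path)  # /lakehouse/default/Files/sample.parquet
```

The resulting string can be passed to `pd.read_parquet`, as in the first Pandas sample above.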

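Tip

For the Spark API, use the Copy ABFS path or Copy relative path for Spark option to get the path of the file. For the Pandas API, use the Copy ABFS path or Copy File API path option to get the path of the file.

The quickest way to get code that works with the Spark API or the Pandas API is to use the Load data option and select the API you want to use. The code is automatically generated in a new code cell of the notebook.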
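The pairing of path types and APIs from the tip can be expressed as a quick check. This is an illustrative sketch only: the functions `path_kind` and `supported_apis` and the labels they return are hypothetical, and the rules are simple prefix tests based on the three path formats described in this article.

```python
def path_kind(path: str) -> str:
    """Classify a lakehouse path by its prefix (hypothetical helper)."""
    if path.startswith("abfss://"):
        return "abfs"        # works with both the Spark API and the Pandas API
    if path.startswith("/lakehouse/default/"):
        return "file_api"    # Pandas API, through the default lakehouse mount
    return "spark_relative"  # Spark API, relative to the default lakehouse


def supported_apis(path: str) -> list[str]:
    """Return the APIs that this article pairs with each path type."""
    return {
        "abfs": ["spark", "pandas"],
        "file_api": ["pandas"],
        "spark_relative": ["spark"],
    }[path_kind(path)]


print(supported_apis("Files/sample.parquet"))                     # ['spark']
print(supported_apis("/lakehouse/default/Files/sample.parquet"))  # ['pandas']
```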

Related content

Explore the data in your lakehouse with a notebook
