@SUMIT B Thanks for the question and using MS Q&A platform.
You may follow the below steps to read excel data from ADLS gen2 to Synapse Dedicated SQL Pool.
Step1: To read an ecel from the ADLS gen2 - you need to have com.crealytics.spark.excel
package from Maven respository.
Based on the Spark Runtime which you are using you can pick the version - As I using the latest version Apache Spark 3.3 I'm downloading the latest jar file from here: https://mvnrepository.com/artifact/com.crealytics/spark-excel
Step2: Upload the download jar file to workspace packages in Synapse Studio
Manage => Workspace packages => Upload => Upload the downloaded jar file
Step3: Install the package on the Apache Spark pool which you are using to run the notebook.
Select Spark pool from workspace => Packages => Force new setting Immediately => Select from workspace packages => Select excel package => Save
Step4: Wait for the package to install - you can monitor the installion here:
Monitor => Apache Spark Application => SystemReservedJob-LibraryManagement
Step5: Use the sample code to read an excel file from a notebook.
filePath = "abfss://<container
sparkDF = spark.read.format("com.crealytics.spark.excel") \
.option("header", "true") \
.option("inferSchema", "true") \
.load(filePath)
Step6: Use the sample code to write using AAD Authentication (Internal Table) using python connector:
Note: If you want to use any other authentication method - you can use the scripts metioned in the article: Azure Synapse Dedicated SQL Pool Connector for Apache Spark
# Write using AAD Auth to internal table
# Add required imports
import com.microsoft.spark.sqlanalytics
from com.microsoft.spark.sqlanalytics.Constants import Constants
# Configure and submit the request to write to Synapse Dedicated SQL Pool
# Sample below is using AAD-based authentication approach; See further examples to leverage SQL Basic auth.
(df.write
# If `Constants.SERVER` is not provided, the `<database_name>` from the three-part table name argument
# to `synapsesql` method is used to infer the Synapse Dedicated SQL End Point.
.option(Constants.SERVER, "<sql-server-name>.sql.azuresynapse.net")
# Like-wise, if `Constants.TEMP_FOLDER` is not provided, the connector will use the runtime staging directory config (see section on Configuration Options for details).
.option(Constants.TEMP_FOLDER, "abfss://<container_name>@<storage_account_name>.dfs.core.windows.net/<some_base_path_for_temporary_staging_folders>")
# Choose a save mode that is apt for your use case.
# Options for save modes are "error" or "errorifexists" (default), "overwrite", "append", "ignore".
# refer to https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html#save-modes
.mode("overwrite")
# Required parameter - Three-part table name to which data will be written
.synapsesql("sqlpool01.dbo.SampleTable"))
As per the repro - I'm able to successfully read and excel file from ADLS gen2 and write to Synapse dedicated SQL pool.
Hope this helps. Do let us know if you any further queries.
If this answers your query, do click Accept Answer
and Yes
for was this answer helpful. And, if you have any further query do let us know.