Have you tried the openpyxl library?
You can install it from a notebook cell:
%pip install openpyxl
Then mount the Azure Blob Storage container in Azure Databricks:
dbutils.fs.mount(
    source = "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net",
    mount_point = "/mnt/<mount-point-name>",
    # The config key must include the full storage account host name
    extra_configs = {"fs.azure.account.key.<storage-account-name>.blob.core.windows.net": "<storage-account-key>"}
)
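Calling dbutils.fs.mount on a mount point that already exists raises an error, so it can help to check first. Below is a minimal sketch of that check. The helper name mount_if_needed is hypothetical (it is not part of dbutils), and dbutils is passed in as an argument only so the function can be exercised outside a notebook.

```python
def mount_if_needed(dbutils, source, mount_point, extra_configs):
    """Mount `source` at `mount_point` unless something is already mounted there.

    Returns True if a new mount was created, False if the mount point already existed.
    """
    # dbutils.fs.mounts() lists the current mounts; each entry exposes .mountPoint
    already_mounted = any(m.mountPoint == mount_point for m in dbutils.fs.mounts())
    if already_mounted:
        return False
    dbutils.fs.mount(source=source, mount_point=mount_point, extra_configs=extra_configs)
    return True
```

In a notebook you would call it as mount_if_needed(dbutils, source, mount_point, extra_configs). Rather than pasting the storage account key into the notebook, you can probably read it from a secret scope with dbutils.secrets.get(scope="<scope-name>", key="<key-name>"), assuming such a scope has been set up.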
Then read the Excel file using openpyxl:
import openpyxl
# Path to the Excel file in Azure Blob Storage.
# openpyxl uses the local file API, so the mount is addressed through /dbfs
excel_file_path = "/dbfs/mnt/<mount-point-name>/<excel-file-name>.xlsx"
# Open the Excel file using openpyxl
wb = openpyxl.load_workbook(excel_file_path)
# Get the worksheet you want to read (here, the first one)
ws = wb.worksheets[0]
# Iterate over the cells in the worksheet and read the data you need
for row in ws.rows:
    # Each row is a tuple of cells, so iterate over it directly
    for cell in row:
        # Do something with the cell value
        cell_value = cell.value
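If you want the whole sheet as tabular data instead of walking it cell by cell, the rows can be collected into a pandas DataFrame. This is only a sketch: rows_to_dataframe is a hypothetical helper name, and it assumes the first row of the sheet holds the column headers.

```python
import pandas as pd

def rows_to_dataframe(rows):
    """Build a DataFrame from an iterable of row tuples.

    Assumption: the first row contains the column names.
    """
    rows = iter(rows)
    header = next(rows, None)
    if header is None:
        # Empty sheet: return an empty DataFrame
        return pd.DataFrame()
    return pd.DataFrame(list(rows), columns=list(header))
```

In the notebook this would be df = rows_to_dataframe(ws.values), since ws.values yields one tuple of cell values per row. If you do not need openpyxl's cell-level access at all, pd.read_excel(excel_file_path, engine="openpyxl") reads the sheet in one call.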
Finally, build a pandas DataFrame from the data you read, apply your transformations, and write the transformed DataFrame to a new Excel file in Azure Blob Storage:
# Path to the new Excel file in Azure Blob Storage, again addressed through /dbfs
new_excel_file_path = "/dbfs/mnt/<mount-point-name>/<new-excel-file-name>.xlsx"
# df is the transformed pandas DataFrame
df.to_excel(new_excel_file_path, index=False)
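One caveat: as far as I know, the /dbfs local file API does not support random writes, and writing an .xlsx file directly to the mount can fail with an "Operation not supported" error. If you run into that, a common workaround is to write to local disk first and then copy the finished file to the mount. Here is a minimal sketch. The helper name write_via_local_temp is hypothetical, and it takes a callback so it works with any writer, not only to_excel.

```python
import os
import shutil
import tempfile

def write_via_local_temp(write_fn, dest_path, suffix=".xlsx"):
    """Call write_fn(local_path) on a local temp file, then copy the result to dest_path."""
    # Create the temp file on the driver's local disk and close the handle right away
    fd, local_path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)
    try:
        write_fn(local_path)
        # A plain sequential copy onto the mounted path
        shutil.copyfile(local_path, dest_path)
    finally:
        # Always clean up the local temp file
        os.remove(local_path)
```

In the notebook the call would look like write_via_local_temp(lambda p: df.to_excel(p, index=False), new_excel_file_path).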