An Azure service for ingesting, preparing, and transforming data at scale.
Hi YERNAIDU SIRAPARAPU You're absolutely right Azure Data Factory (ADF) doesn't natively support reading password-protected Excel files. Your approach using an Azure Function to preprocess the file is the correct and scalable solution.
Below is a complete step-by-step guide to help you achieve this:
Step 1: Azure Function – Decrypt and Convert Excel File
1 Create the Azure Function
- Use either a Timer or HTTP Trigger (HTTP is ideal if you want ADF to trigger it).
- Languages like Python or C# work well. Here's a Python-based approach.
2 Connect to Azure Blob Storage:
Install SDKs:
pip install azure-storage-blob pandas msoffcrypto-tool openpyxl
Python code snippet:
from azure.storage.blob import BlobServiceClient
import msoffcrypto
import pandas as pd
# Connect to Blob
blob_service_client = BlobServiceClient.from_connection_string("Your_Connection_String")
blob_client = blob_service_client.get_blob_client(container="your-container", blob="protected.xlsx")
# Download protected file
with open("/tmp/protected.xlsx", "wb") as file:
file.write(blob_client.download_blob().readall())
# Decrypt Excel
with open("/tmp/protected.xlsx", "rb") as f:
office_file = msoffcrypto.OfficeFile(f)
office_file.load_key(password="YourPassword") # Ideally fetch from Azure Key Vault
with open("/tmp/decrypted.xlsx", "wb") as decrypted:
office_file.decrypt(decrypted)
# Convert to CSV
df = pd.read_excel("/tmp/decrypted.xlsx")
df.to_csv("/tmp/output.csv", index=False)
# Upload CSV back to Blob
output_blob = blob_service_client.get_blob_client(container="your-container", blob="output.csv")
with open("/tmp/output.csv", "rb") as data:
output_blob.upload_blob(data, overwrite=True)
Best Practice: Store your Excel password in Azure Key Vault and access it securely using managed identity.
Step 2: ADF Pipeline – Orchestrate the Process
1 Trigger Azure Function
- Use a Web Activity to call your HTTP-triggered Azure Function.
- Pass parameters like blob name or password if needed (you can also use secure system variables or key vault references).
2 Copy Data from Blob to Azure SQL
- Use Copy Data Activity: Source: Blob Storage (CSV file) Sink: Azure SQL Database
- Define or auto-map columns in dataset schema mapping.
Recommended Best Practices
- Use Managed Identity for secure, keyless authentication to Azure resources.
- Clean up temp files (/tmp/ folder in Azure Functions has limited space).
- Enable Application Insights for logging and error monitoring.
- Make the Azure Function idempotent to handle retries gracefully.
Hope this helps. If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.