Hi AC (BI),
Thank you for posting in the Microsoft Community Forums.
Read Data:
Your read data section looks correct; just make sure that the storage account, path, and format (Parquet) are all correct.
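As a minimal sketch (the container, storage account and folder below are placeholders, not values from your script; spark is the session the notebook provides):

    # Read the Parquet source into a DataFrame; replace the placeholders with your values
    df = spark.read.format("parquet").load(
        "abfss://<container>@<storageaccount>.dfs.core.windows.net/<folder>/"
    )
    df.printSchema()  # quick check that the expected columns (e.g. uprn) are present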
Creating temporary views:
Creating the temporary view entities is also correct; it lets you reference the DataFrame from subsequent SQL queries.
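Continuing the sketch above (the view name entities is assumed to match the one used in your SQL cells):

    # Register the DataFrame as a temporary view for use in SQL
    df.createOrReplaceTempView("entities")

    # The view can now be queried from SQL cells or from PySpark
    spark.sql("SELECT COUNT(*) FROM entities").show()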
Create the Delta table using CTAS:
Creating a Delta table in a SQL cell using CTAS (CREATE TABLE ... AS SELECT) syntax is correct. However, there are a few things to keep in mind (a sketch of the statement follows the list):
Make sure that C365 is a database that actually exists in your Delta Lake / Spark catalog, or that you are authorized to create tables in the default database if you omit the database name.
Ensure that the specified path is valid and that the Spark cluster has permission to access it.
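Here is a minimal sketch of such a statement, run through spark.sql() (it can equally go in a SQL cell). The database name C365, the table name FactEntities and the storage path are assumptions based on your script, so adjust them to your environment:

    # Create an external Delta table at the given path from the temporary view
    spark.sql("""
        CREATE TABLE IF NOT EXISTS C365.FactEntities
        USING DELTA
        LOCATION 'abfss://<container>@<storageaccount>.dfs.core.windows.net/delta/FactEntities'
        AS SELECT * FROM entities
    """)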
Delta table merge operations:
There are several potential issues here:
Path to the DeltaTable object: Make sure that the path you pass to DeltaTable.forPath for FactEntities matches the path you defined in the CTAS statement.
Alias usage: You are using the aliases entities and New correctly in the merge method. However, if uprn is the unique identifier you are merging on, make sure that this field exists in both datasets and has the same data type in each.
Performing the merge: Your merge logic (whenMatchedUpdateAll().whenNotMatchedInsertAll()) looks correct: records that match on uprn are updated, and records that don't match are inserted. A sketch of the full merge follows this list.
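Here is a minimal sketch of the merge, assuming the table path used above, a source DataFrame new_df holding the incoming records, and uprn as the join key:

    from delta.tables import DeltaTable

    # Load the existing Delta table created by the CTAS step
    fact_entities = DeltaTable.forPath(
        spark,
        "abfss://<container>@<storageaccount>.dfs.core.windows.net/delta/FactEntities"
    )

    # Upsert: update rows whose uprn already exists, insert the rest
    (fact_entities.alias("entities")
        .merge(new_df.alias("New"), "entities.uprn = New.uprn")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())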
Script structure:
Your script is a mix of PySpark code and SQL cells. Make sure you execute these cells in the right order in a notebook environment (for example, Jupyter or Databricks) that supports SQL cells, since they rely on environment-specific support; SQL statements can also be run from PySpark via spark.sql(), as in the sketch above.
Error Handling:
In real-world deployments, it is good practice to add error-handling logic (for example, try/except blocks) to catch and handle exceptions that may occur, such as file-access issues or network problems.
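As a minimal sketch, wrapping the merge from above (AnalysisException covers missing paths, tables or columns; the generic branch catches everything else):

    from pyspark.sql.utils import AnalysisException

    try:
        (fact_entities.alias("entities")
            .merge(new_df.alias("New"), "entities.uprn = New.uprn")
            .whenMatchedUpdateAll()
            .whenNotMatchedInsertAll()
            .execute())
    except AnalysisException as e:
        # Typically a missing path, table or column
        print(f"Table or path problem: {e}")
        raise
    except Exception as e:
        print(f"Merge failed: {e}")
        raise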
If you still can't resolve the problems you run into when executing the script, you may want to seek advice from a coding professional.
Best regards
Neuvi