Files vanished from storage account

Aleksandra Stan 0 Reputation points
2025-01-13T06:37:02.28+00:00

Morning!

I was wondering if anyone knows what might be the cause. We have a list of JSON files that we store in a blob storage container. We need those files in Azure Synapse, so I created a notebook to ingest them. After I ran the ingestion a couple of times, the files vanished from the source folder. There are no deletion logs whatsoever, and I have no clue why this happened. The ingestion code:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder \
    .appName("Ingest JSON from Synapse") \
    .getOrCreate()

# base path
base_path = "abfss://******@storageaccountX.dfs.core.windows.net/folder1"
folder_name = "affectedfolder"
target_path = f"{base_path}/{folder_name}"

# load all JSON files
df = spark.read.format("json") \
    .option("multiLine", True) \
    .load(f"{target_path}/*.json") \
    .withColumn("file_path", F.input_file_name())

# file_date from the JSON file name
df = df.withColumn("file_date", F.regexp_extract(F.col("file_path"), r"filename(\d{4}-\d{2}-\d{2})\.json", 1))

# silver path (no trailing slash, so the join below doesn't produce a "//")
silver_base_path = "abfss://******@storageaccountX.dfs.core.windows.net"
table_name = folder_name
silver_table_path = f"{silver_base_path}/{table_name}"

# save the DF
df.write.format("delta") \
    .mode("overwrite") \
    .option("delta.columnMapping.mode", "name") \
    .option("overwriteSchema", "true") \
    .save(silver_table_path)

print(f"Data ingested successfully into '{silver_table_path}'.")

2 answers

  1. Abiola Akinbade 30,450 Reputation points Volunteer Moderator
    2025-01-13T07:17:36.58+00:00

    Hello Aleksandra Stan,

    Thanks for your question.

    To check for delete operations, go to Azure Storage Account > Monitoring > Activity log and look for any delete operations related to the container. If logging isn't enabled, you may need to enable Azure Storage diagnostic settings to capture future events.

    That's the best way to capture Azure operations.
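    If you prefer to set this up programmatically, here is a minimal sketch using the azure-mgmt-monitor SDK; the subscription, resource group, and workspace values below are hypothetical placeholders you would replace with your own:

    # Sketch: enable "StorageDelete" diagnostic logs on the blob service so
    # future delete operations are captured in a Log Analytics workspace.
    # All IDs below are hypothetical placeholders.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.monitor import MonitorManagementClient

    credential = DefaultAzureCredential()
    client = MonitorManagementClient(credential, "<subscription-id>")

    # Diagnostic settings attach to the blob service resource, not the account
    blob_service_uri = (
        "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
        "/providers/Microsoft.Storage/storageAccounts/storageaccountX"
        "/blobServices/default"
    )

    client.diagnostic_settings.create_or_update(
        resource_uri=blob_service_uri,
        name="capture-deletes",
        parameters={
            "workspace_id": (
                "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
                "/providers/Microsoft.OperationalInsights/workspaces/<workspace-name>"
            ),
            "logs": [{"category": "StorageDelete", "enabled": True}],
        },
    )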

    As for your code, you could temporarily remove the .mode("overwrite") and write to a new path instead of silver_table_path, to confirm that the write doesn't affect the source folder; a minimal sketch follows.
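    Something like this, assuming a hypothetical test_ingest prefix (the default errorifexists mode refuses to write to an existing location, so nothing can be overwritten or deleted):

    # Sketch: write the same DataFrame to a brand-new path using the default
    # "errorifexists" mode, so the write cannot delete or replace anything.
    # The test_ingest prefix is a hypothetical placeholder.
    test_path = "abfss://******@storageaccountX.dfs.core.windows.net/test_ingest/affectedfolder"

    df.write.format("delta") \
        .mode("errorifexists") \
        .save(test_path)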

    You can mark it 'Accept Answer' and 'Upvote' if this helped you.

    Regards,

    Abiola


  2. Ganesh Gurram 7,235 Reputation points Moderator
    2025-01-15T13:16:39.93+00:00

    Hi @Aleksandra Stan
    Thanks for the question and for using the MS Q&A platform.
    You're right, it's concerning that the logs don't show any deletion operations. Since soft delete wasn't enabled at the time, recovering the files is unfortunately unlikely. However, you can try the steps below:

    1. Run a Controlled Test - Upload a batch of test files to the source folder, then run your notebook and check whether the files disappear or get affected during the process. This helps isolate whether the issue comes from the ingestion itself or from something external (see the sketch after this list).
    2. Review Automation or External Tools - Check if any automation, such as Azure Data Factory, Azure Logic Apps, or other scheduled tasks, might be interacting with the source folder and moving or deleting files post-ingestion.
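    As a sketch of the controlled test in step 1, assuming the azure-storage-blob package and placeholder connection/container values:

    # Sketch: upload disposable marker files, snapshot the source folder's
    # listing, run the notebook, then snapshot again to see what disappeared.
    # The connection string and container name are hypothetical placeholders.
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string("<connection-string>")
    container = service.get_container_client("<container-name>")

    # Upload a handful of throwaway test files into the source folder
    for i in range(3):
        container.upload_blob(
            name=f"folder1/affectedfolder/test-marker-{i}.json",
            data=b'{"test": true}',
            overwrite=True,
        )

    def snapshot():
        """Return the set of blob names currently under the source folder."""
        return {b.name for b in
                container.list_blobs(name_starts_with="folder1/affectedfolder/")}

    before = snapshot()
    input("Run the ingestion notebook now, then press Enter...")
    after = snapshot()
    print("Missing after ingestion:", sorted(before - after))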

    Hope this helps. Let me know if you need further guidance!

