Set cloudFiles.maxFileAge and cloudFiles.backfillInterval values in Autoloader

Hiran Amarathunga 95 Reputation points
2024-07-02T02:12:40.21+00:00

I'm using the following Auto Loader options:

.option("cloudFiles.maxFileAge", "90 days")\
.option("cloudFiles.backfillInterval", "1 day")\

Our data retention policy is 7 years. Should I set maxFileAge to 7 years, or is that not a good value?

I also want to use backfillInterval to load any missing data from all time, not just the last 90 days.

I'd appreciate expert advice on choosing the best values for these parameters.

Thank you

Azure Databricks

Accepted answer
  1. PRADEEPCHEEKATLA-MSFT 90,221 Reputation points Microsoft Employee
    2024-07-02T03:00:47.9366667+00:00

    @Hiran Amarathunga - Thanks for the question and for using the MS Q&A platform.

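    It's great to see that you are using Azure Databricks Auto Loader for your data ingestion needs. Regarding your question, cloudFiles.maxFileAge does not need to match your 7-year data retention policy, because it does not control how long data is kept. It controls how long Auto Loader keeps track of a discovered file in its checkpoint state for deduplication. Its default is None (entries are not expired), and Databricks does not recommend tuning it unless you are ingesting on the order of millions of files an hour, where expiring old entries reduces checkpoint storage cost and stream startup time. If you do set it, the documentation recommends a conservative value such as 90 days, because tuning it too aggressively can cause duplicate ingestion or missing files. So there is no need to set it to 7 years; in most cases you can simply leave it unset.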
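    A minimal sketch of how this might look in PySpark, assuming a JSON source. The helper name `autoloader_options` is hypothetical. The point is that `cloudFiles.maxFileAge` is only added when you explicitly pass a value, so otherwise Auto Loader's own default (None, no expiry) applies.

```python
from typing import Dict, Optional


def autoloader_options(
    file_format: str,
    backfill_interval: Optional[str] = "1 day",
    max_file_age: Optional[str] = None,
) -> Dict[str, str]:
    """Build an option map for an Auto Loader (cloudFiles) stream.

    Hypothetical helper: cloudFiles.maxFileAge is only included when a value
    is passed, so by default Auto Loader's own default (no expiry) applies.
    """
    opts = {"cloudFiles.format": file_format}
    if backfill_interval is not None:
        opts["cloudFiles.backfillInterval"] = backfill_interval
    if max_file_age is not None:
        opts["cloudFiles.maxFileAge"] = max_file_age
    return opts


opts = autoloader_options("json")
print(opts)

# On Databricks the map can be passed to the stream reader, e.g.
# (input_path is a placeholder for your landing location):
# df = spark.readStream.format("cloudFiles").options(**opts).load(input_path)
```

    If you later find that checkpoint state is growing too large for a very high-volume stream, you could pass `max_file_age="90 days"` instead of leaving it out.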

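    As for cloudFiles.backfillInterval, it does not define a time window to look back over. It sets how often Auto Loader triggers an asynchronous backfill, for example "1 day" to backfill once a day or "1 week" to backfill once a week. Because file event notification systems do not guarantee delivery of every file event, these periodic backfills make sure that files which were missed eventually get processed. Setting a very large value such as 100 years would therefore not extend coverage to all time; it would only schedule a backfill once every 100 years. To catch missed files regardless of their age, keep a regular interval such as your current "1 day" and leave cloudFiles.maxFileAge at its default so that file tracking is not expired. More frequent backfills pick up missed files sooner but list the input directory more often, which adds cost.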
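    The following toy model (plain Python, not Auto Loader's actual implementation) illustrates the idea: a notification can be missed, and a periodic backfill lists the directory and ingests only the files not already recorded in the tracked state. The function name `run_backfill` and the file names are hypothetical.

```python
from typing import Iterable, List, Set


def run_backfill(listed_files: Iterable[str], seen: Set[str]) -> List[str]:
    """Toy backfill: return files that appear in a full listing but are
    missing from the tracked state, and record them so that a later
    backfill does not ingest them again."""
    new_files = sorted(set(listed_files) - seen)
    seen.update(new_files)
    return new_files


# Files that actually landed in the input directory.
directory = ["2024/07/01/a.json", "2024/07/01/b.json", "2024/07/02/c.json"]

# Suppose the notification for b.json was never delivered,
# so only a.json and c.json were tracked as ingested.
seen = {"2024/07/01/a.json", "2024/07/02/c.json"}

first = run_backfill(directory, seen)   # picks up only the missed file
second = run_backfill(directory, seen)  # nothing left to pick up
print(first, second)
```

    In this model, backfillInterval only decides how often `run_backfill` is called: with "1 day", a missed file waits roughly a day at most before being picked up.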

    It is recommended to test different values for these parameters and monitor the performance of your Autoloader job to determine the best values for your specific use case.
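    One way to monitor a running stream is to read the source metrics from the query's latest progress. The sketch below assumes the progress is a dict, as returned by `StreamingQuery.lastProgress` in PySpark, and that Auto Loader reports `numFilesOutstanding` and `numBytesOutstanding` under each source's `metrics`, as described in the Auto Loader monitoring documentation. The helper name `outstanding_backlog` and the sample values are hypothetical.

```python
from typing import Any, Dict, Tuple


def outstanding_backlog(progress: Dict[str, Any]) -> Tuple[int, int]:
    """Sum the files and bytes still waiting to be processed across all
    sources in a streaming progress dict. Missing metrics count as 0."""
    files = 0
    size = 0
    for source in progress.get("sources", []):
        metrics = source.get("metrics") or {}
        # Values may be reported as strings, so convert before summing.
        files += int(metrics.get("numFilesOutstanding", 0))
        size += int(metrics.get("numBytesOutstanding", 0))
    return files, size


# Sample shaped like a progress report; the path and numbers are made up.
sample = {
    "sources": [
        {
            "description": "CloudFilesSource[/mnt/landing]",
            "metrics": {"numFilesOutstanding": "12", "numBytesOutstanding": "4096"},
        }
    ]
}

print(outstanding_backlog(sample))

# On Databricks, with a running query handle:
# progress = query.lastProgress  # None until the first batch completes
# if progress:
#     print(outstanding_backlog(progress))
```

    A backlog that keeps growing from batch to batch suggests the stream is not keeping up, which is a useful signal when comparing different backfillInterval values.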

    For more details, refer to Azure Databricks - Auto Loader options

    Hope this helps. Do let us know if you have any further queries.



