What is the best way to run a ~1-hour Python program that is triggered approximately once a week with variable input?

Adam Mepham 1 Reputation point
2022-09-20T00:21:51.027+00:00

I have an existing Python program that takes a PDF and performs a number of complicated lookups, ultimately producing an Excel file. It takes about 1 hour to run. I want to be able to trigger it with the name of a blob file, and have it download this file, run, and upload the resulting Excel file to another blob. I have tried to convert it to an Azure Function, but I see that functions are set to time out too quickly. I have looked into WebJobs and Batch jobs, although it's not clear to me whether either is suitable.

Azure Functions
An Azure service that provides an event-driven serverless compute platform.
Azure Batch
An Azure service that provides cloud-scale job scheduling and compute management.
Azure App Service
Azure App Service is a service used to create and deploy scalable, mission-critical web apps.

4 answers

  1. Andriy Bilous 11,421 Reputation points MVP
    2022-09-20T05:25:31.997+00:00

    Hello @Adam Mepham

    Since your Python code runs for about 1 hour, it is a long-running operation.
    There are multiple options:

    • Azure Functions on a Premium or Dedicated plan is a good option for you (see service limits). When the timeout is set to an unbounded duration, your function app is guaranteed to run for at least 60 minutes.
    • A dedicated Azure VM is a good option when you expect to run one large batch with long-running workflows and custom integrations.
    • Azure Batch is also an option for long-running operations, but it can be costly.

    If you think your question has been answered, click "Accept Answer"; if it just helped, click "Vote as helpful". This can be beneficial to other community members reading this forum thread.


  2. MughundhanRaveendran-MSFT 12,481 Reputation points
    2022-09-20T09:08:14.98+00:00

    Hi @Adam Mepham ,

    Thanks for reaching out to Q&A.

    A blob-triggered function running on a Premium or Dedicated (App Service) plan would suit your requirement, because its timeout can be unbounded. The timeout is specified in the host.json file (the functionTimeout setting). You can also explore Durable Functions, which are suited to long-running workloads.
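
    As a minimal sketch of the host.json setting mentioned above, the Python snippet below generates a host.json with an explicit functionTimeout. The helper name build_host_json and the 2-hour default are assumptions chosen for illustration (to leave headroom over a ~1 hour run); a value of "-1" requests unbounded execution on Premium and Dedicated plans.

```python
import json


def build_host_json(timeout: str = "02:00:00") -> str:
    """Return host.json content with an explicit functionTimeout.

    `timeout` is an hh:mm:ss string. The 2-hour default is an assumption,
    not a recommendation from the thread. Pass "-1" for unbounded execution
    (Premium and Dedicated plans only).
    """
    host = {
        "version": "2.0",            # host.json schema version for Functions v2+
        "functionTimeout": timeout,  # maximum duration of a single execution
    }
    return json.dumps(host, indent=2)


if __name__ == "__main__":
    # Print the content that would be saved as host.json in the app root.
    print(build_host_json())
```

    Keeping a fixed upper bound rather than "-1" means a hung run is eventually stopped instead of occupying the instance indefinitely.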

    Hope this helps! Feel free to reach out to me if you have any queries or concerns.

  3. Subashri Vasudevan 11,206 Reputation points
    2022-09-28T04:10:58.723+00:00

    Hi @Adam Mepham

    As you pointed out in the comment, you can explore Azure Batch.

    Before running the Python code you can spin up a cluster; that is, you can create a pool using the REST API by specifying the configuration details as given here

    After the pool with the required nodes has done its work, you can free it up by either resizing or deleting the pool.

    Here is a thread that discusses pricing with respect to pool state. It might be of use to you.

    Please let us know if this answers your question.

  4. Xin Xin 26 Reputation points Microsoft Employee
    2022-10-23T09:10:13.537+00:00

    The best way is to use a Batch job schedule to schedule a task on an autoscale pool. The pool's initial number of VMs is 0; it scales out when a task is submitted and scales in when the task completes.
    There is no upfront cost, and you only pay as you go. You can also use low-priority or Spot VM instances to lower your cost.

    An example autoscale formula:

    maxNumberofVMs = 25;
    lastSample = val($PendingTasks.GetSample(1), 0);
    $TargetDedicatedNodes = min(maxNumberofVMs, lastSample);
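
    To make the scaling behavior concrete, here is a plain-Python emulation of what the formula is meant to do: follow the most recent pending-task count, capped at maxNumberofVMs. It is only an illustration and does not call the Batch service; the function name target_dedicated_nodes and the list-of-samples input are assumptions, as is applying the cap to the result.

```python
def target_dedicated_nodes(pending_task_samples, max_vms=25):
    """Emulate the autoscale formula for a list of pending-task samples.

    `pending_task_samples` stands in for $PendingTasks samples, oldest first.
    With no samples the target is 0, so an idle pool has no nodes.
    """
    # Use the most recent sample, like val($PendingTasks.GetSample(1), 0).
    last_sample = pending_task_samples[-1] if pending_task_samples else 0
    # Assumption: the node count is capped at max_vms.
    return min(max_vms, int(last_sample))


if __name__ == "__main__":
    print(target_dedicated_nodes([0]))   # idle week: no nodes
    print(target_dedicated_nodes([1]))   # one weekly task submitted: one node
    print(target_dedicated_nodes([40]))  # burst: capped at max_vms
```

    For a job that runs about once a week, this means the pool sits at zero nodes most of the time and grows to a single node only while the task is pending or running.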
    

    How to create job schedule:
    https://learn.microsoft.com/zh-cn/azure/batch/batch-job-schedule

    How to create auto-scale pool:
    https://learn.microsoft.com/zh-cn/azure/batch/batch-automatic-scaling#example-autoscale-formulas
