What is the best way to execute long-running and high memory usage tasks on Azure?

arainbadami201franc 11 Reputation points
2021-06-23T18:49:05.1+00:00

I need to find the best way to preform long running tasks on Azure. Scenario:

User picks the dataset and filters on the web app (Azure App Service)
Based on the requirements we create SQL query
Query is executed against one or more Azure SQL databases
Data is being exported in .csv format and uploaded to Azure Blob Storage
The main issues are that execution of some SQL queries can last for 2+ hours and resultset can have 100M+ rows.

I believe that Azure Functions (and subsequently Durable Functions) are not a option because of the timeout and memory usage.

Azure SQL Database
0 comments No comments
{count} votes

2 answers

Sort by: Most helpful
  1. KalyanChanumolu-MSFT 8,351 Reputation points
    2021-06-24T04:23:31.497+00:00

    @arainbadami201franc @Ronen Ariely already covered it from a database perspective.

    For background processing, you can use a Worker Service. These were introduced with .Net Core 3 and let you run your background processing tasks with ease.

    A tutorial with code snippets is here
    This video tutorial covers them in detail

    ----------

    If an answer is helpful, please "Accept answer" or "Up-Vote" which might help other community members reading this thread.
    And if you have further questions, please let us know.

    1 person found this answer helpful.
    0 comments No comments

  2. Ronen Ariely 15,206 Reputation points
    2021-06-23T22:11:44.147+00:00

    Good day @arainbadami201franc

    I need to find the best way to preform long running tasks on Azure.

    From the database side, best way depend on your specific query and your specific database structure and even on the existing data in the table.

    SQL Server does not execute the query as you send to it. Instead it parse the query and build multiple execution plan which in next step it select one of these options as the execution plan which it will use. The same query executed on two different databases might result with totally different execution plan. Even on the same database the server might build different execution plan next time that he will need to run the query if the previous one was removed for example.

    Therefore, with the information that you provided we cannot discuss on what is best but we can point you to the right direction with some relevant points and options.

    Query is executed against one or more Azure SQL databases

    Azure SQL Database is a database level service which mean that each database can be managed by totally different server on different machine (what we use and name Azure SQL Server is a virtual server or an application which manage a database to give us the looks and feel like the master database in on-premises server).

    As such, we cannot execute one simple query on multiple databases like in on-premises databases. There are options to use tools to have a cross databases queries like using elastic query or using External Data for example.

    Data is being exported in .csv format and uploaded to Azure Blob Storage

    This might be the source of your main issue which we can focus on.

    In first glance your best option for this step is to use BUILK EXPORT using bcp and export the data, but bcp does not support directly export to the BLOB.

    You can check the following documentation about Methods for bulk importing and exporting data (you should probably use bcp):

    https://learn.microsoft.com/en-us/sql/relational-databases/import-export/bulk-import-and-export-of-data-sql-server?view=sql-server-ver15#MethodsForBuliIE

    NOTE! Passing data of 100M rows from the Azure to on-premises can be slow depending on your network speed. It might be better option to use virtual machine which is on the same region as the database.

    we create SQL query

    If you need to export the data of entire table then there are fast an simple option to export the data directly to the Azure BLOB storage by creating BACPAK file that include the data. In your case you need to export the result set of a query which does not have this option (bcp does not support export to the azure BLOB)

    ****** My tricks which might fit for some cases best :-) *******

    Option one: You can create azure file and map it as share network drive -> next you can simply use bcp to export the result set directly to the azure blob using the network path.

    Option two: if the data do not changed a lot then maybe You can use query to create new table which include the result set of the data using simple SQL SELECT INTO query -> next since this is now a simple table then you can create bacpac that include the data from this table using PowerShell command and directly export the data to the BLOB.

    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.