Loading "large" files from Azure blob and computing with Azure functions

Jose Luis de Juan Lopez 0 Reputation points
2024-09-22T20:34:50.1733333+00:00

Hi,

I am facing quite a few issues with Azure and getting pretty desperate. I am using Azure Blob Storage to store the input files from a retail business and Azure Functions (in Python) to create article recommendations. The files range in size from 30 MB up to 3 GB.

These files are originally saved in a "raw input" container, and a series of Azure Functions should pick them up as inputs, do some data cleaning and preparation, and save the output files to other containers.

From these second containers, another set of Functions (mostly the recommender functions) gets the datasets (as well as some other inputs such as customer or article IDs) and outputs article recommendations.

Well, the second set of functions basically works, but I am struggling on a variety of fronts with the first ones. For example:

The lightest input file (30 MB) is basically a dataset of articles from which a cosine similarity has to be calculated and a top 5 of similar articles has to be extracted. Since the dataset has 40k articles, the cosine similarity matrix is around 8-15 GB, and of course I have been unable to process it with an Azure Function (I can do the operation in a Google Colab notebook, though).
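To make the top-5 step concrete, here is a minimal sketch of one way it could be done without ever holding the full 40k x 40k matrix: compute the similarities one block of rows at a time and keep only the top-k indices per row. This is an illustration under assumptions, not the code I currently run: the function name `top_k_similar`, the `chunk_size` parameter, and the idea that the articles are already encoded as a numeric feature matrix are all hypothetical.

```python
import numpy as np


def top_k_similar(features, k=5, chunk_size=1000):
    """Return, for each row, the indices of its k most cosine-similar rows.

    Hypothetical helper: assumes `features` is a numeric (n_articles, n_features)
    array with more than k rows. Only one (chunk_size, n_articles) block of the
    similarity matrix exists in memory at any time.
    """
    features = np.asarray(features, dtype=np.float32)

    # Normalise rows so that a dot product equals the cosine similarity.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    unit = features / norms

    n = unit.shape[0]
    top_idx = np.empty((n, k), dtype=np.int64)

    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        sims = unit[start:stop] @ unit.T  # block of shape (stop - start, n)

        rows = np.arange(stop - start)
        # Exclude each article's similarity with itself.
        sims[rows, np.arange(start, stop)] = -np.inf

        # Pick the k largest per row (unordered), then sort just those k.
        part = np.argpartition(-sims, k - 1, axis=1)[:, :k]
        order = np.argsort(-sims[rows[:, None], part], axis=1)
        top_idx[start:stop] = part[rows[:, None], order]

    return top_idx
```

With 40k articles, `chunk_size=1000`, and float32 values, each block is 1000 x 40000 x 4 bytes = 160 MB, and the result is a 40000 x 5 index array. Whether that fits in a given Functions plan still depends on the size of the feature matrix itself.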

The biggest file (3 GB), on the other hand, needs very little computation. It is a sales dataset and I only need to do some basic cleaning and ranking, but of course I can't even load the data into a DataFrame with the Azure Function.
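For the sales file, a sketch of the kind of streaming approach that might avoid loading everything into one DataFrame: read the file in chunks with pandas and keep only a running aggregate. Everything specific here is an assumption for illustration: that the file is a CSV, the column names `article_id` and `quantity`, the function name `rank_articles_by_sales`, and that "ranking" means ordering articles by total quantity sold. `source` can be a path or any file-like object, for example a blob downloaded to temporary local storage first.

```python
import pandas as pd


def rank_articles_by_sales(source, chunk_size=100_000):
    """Rank articles by total quantity sold, reading the CSV in chunks.

    Hypothetical helper: the column names are assumptions. Only `chunk_size`
    rows plus the running per-article totals are in memory at any time.
    """
    totals = None
    reader = pd.read_csv(
        source,
        usecols=["article_id", "quantity"],  # skip columns that are not needed
        chunksize=chunk_size,
    )
    for chunk in reader:
        # Basic cleaning: drop rows with a missing key or value.
        chunk = chunk.dropna(subset=["article_id", "quantity"])
        partial = chunk.groupby("article_id")["quantity"].sum()
        # Merge this chunk's sums into the running totals.
        totals = partial if totals is None else totals.add(partial, fill_value=0)

    if totals is None:
        return pd.Series(dtype="float64")
    return totals.sort_values(ascending=False)
```

A quick local check with an in-memory file:

```python
import io

csv_text = "article_id,quantity,store\nA,1,x\nB,5,x\nA,2,y\nC,,y\nB,1,z\n"
ranking = rank_articles_by_sales(io.StringIO(csv_text), chunk_size=2)
print(ranking)
```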

Since I am running on Consumption plans, I have thought of moving to higher tiers, but after checking the service limits, specifically memory, which seems to be my main issue, I am unsure whether this would actually address my problems.

It feels like Azure Functions might not be the tool I need (at least for dealing with big files or big computations).

Am I failing to troubleshoot these issues correctly and adapt the code accordingly, or should I really be using some other tools?

Any help/tips/recommendations are more than welcome!
