Assistance Needed for Cleaning Blob Storage

Narek Payaslyan 20 Reputation points
2023-07-03T13:16:21.3866667+00:00

Hello

We require assistance with cleaning our blob storage. Currently, our container holds over 600 million images, and we lack the necessary parameters to efficiently identify and manage them according to our needs.

The only linkage between our database and blob container is the names of the images. We have around 250 million images that we require, and we can only provide their names. Unfortunately, it is not feasible to pass such a large number of strings to delete all the mismatched images from the container.

We attempted to implement a mechanism using the Azure JavaScript (Node) SDK to address this issue. However, our estimates indicated that it would take more than 500 days to complete, which is far too long to be practical for us.

Here's the scenario we followed:

  1. Fetch 2,000 documents from our database per call/page.
  2. Collect all image names from those 2,000 documents, resulting in, for example, over 16,000 image names.
  3. Using the Azure JavaScript (Node) SDK, mark the metadata of these images as "need: true" using async/await:

     3.1 const blobClient = containerClient.getBlobClient(blobName);

     3.2 await blobClient.setMetadata(metadata);

  4. Once marked, delete all the images that do not have the metadata "need: true".
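For illustration: if the 500-day estimate covers roughly 250 million setMetadata calls, that works out to about 6 calls per second, which is what awaiting each call one at a time would look like. The steps above could be sketched with a fixed number of calls in flight instead. This is only a minimal sketch under assumptions: `runWithConcurrency` and `markNeeded` are hypothetical helper names, `containerClient` is assumed to be an existing ContainerClient from the SDK, and the concurrency value of 64 is an arbitrary starting point rather than a tuned figure.

```javascript
// Run an async task over many items with at most `limit` calls in flight.
// Failures are recorded per item so one bad blob does not stop the page.
async function runWithConcurrency(items, limit, task) {
  const results = new Array(items.length);
  let next = 0; // shared cursor; safe because JavaScript runs one callback at a time

  async function worker() {
    while (next < items.length) {
      const index = next++;
      try {
        results[index] = { ok: true, value: await task(items[index]) };
      } catch (error) {
        results[index] = { ok: false, error };
      }
    }
  }

  const workerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Hypothetical helper: mark one page of image names (step 3 above).
// Metadata values are strings, and setMetadata replaces any metadata
// already stored on the blob.
async function markNeeded(containerClient, blobNames, concurrency = 64) {
  return runWithConcurrency(blobNames, concurrency, (blobName) =>
    containerClient.getBlobClient(blobName).setMetadata({ need: "true" })
  );
}
```

Each entry in the returned array is either `{ ok: true, value }` or `{ ok: false, error }`, so failed names can be collected and retried on a later pass.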

However, based on our estimates, the third step alone would take more than 500 days. Therefore, we kindly request your recommendations or guidance on an alternative method to streamline this process.

Thank you for your assistance.

Best Regards,

Narek

Azure Blob Storage
An Azure service that stores unstructured data in the cloud as blobs.
Microsoft 365 and Office | Development | Office JavaScript API

Accepted answer
  1. Sumarigo-MSFT 47,471 Reputation points Microsoft Employee Moderator
    2023-07-13T07:31:10.7833333+00:00

    @Narek Payaslyan Welcome to the Microsoft Q&A Forum, and thank you for posting your query here!

    Managing such a large number of images in Blob storage can indeed be challenging. To clean your Blob storage container efficiently based on your list of image names, consider using Azure Batch, which lets you parallelize and distribute the workload across multiple compute nodes and can significantly reduce the processing time. You could also use Azure Functions to process the images in batches, Azure Data Factory to copy the required images to a new container, Azure Blob Storage lifecycle management to automatically delete the unwanted images, or Azure Cognitive Services to identify unwanted images.
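As one possible illustration of processing the images in batches, the sketch below works on one name prefix at a time: it loads the required names for that prefix, lists the blobs under the same prefix, and deletes the ones that are not required, without a metadata-marking pass. This is a different technique from the services named above, not an official recommendation. It assumes that image names share prefixes that split the container into slices small enough to hold in memory; `cleanPrefix` and `loadRequiredNames` are hypothetical names, `containerClient` is assumed to be an existing ContainerClient, and the chunk size of 100 is arbitrary.

```javascript
// Sketch only: find (and optionally delete) blobs under one name prefix
// that are not in the required set.
// `loadRequiredNames(prefix)` is a hypothetical function that returns the
// required image names starting with `prefix` from the database.
async function cleanPrefix(containerClient, prefix, loadRequiredNames, options = {}) {
  // dryRun defaults to true so nothing is deleted until the counts look right.
  const { dryRun = true, chunkSize = 100 } = options;
  const required = new Set(await loadRequiredNames(prefix));
  const stats = { kept: 0, unneeded: 0 };
  let pending = [];

  // Delete the buffered names concurrently, one chunk at a time.
  async function flush() {
    const chunk = pending;
    pending = [];
    if (!dryRun) {
      await Promise.all(chunk.map((name) => containerClient.deleteBlob(name)));
    }
    stats.unneeded += chunk.length;
  }

  // listBlobsFlat returns an async iterator over the blobs in the container.
  for await (const blob of containerClient.listBlobsFlat({ prefix })) {
    if (required.has(blob.name)) {
      stats.kept++;
    } else {
      pending.push(blob.name);
      if (pending.length >= chunkSize) await flush();
    }
  }
  if (pending.length > 0) await flush();
  return stats;
}
```

Running it first with the default `dryRun: true` gives kept and unneeded counts per prefix to compare against the database before anything is removed. Separate prefixes are independent of each other, so they could be spread across several workers or compute nodes.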

    I would recommend contacting support. If you have a support plan, please file a support ticket; otherwise, let us know and we will try to help you get one-time free technical support. In that case, please send an email to AzCommunity[at]Microsoft[dot]com referencing this thread and including your subscription ID, and mention "ATTN subm" in the subject field. Thank you for your cooperation on this matter; we look forward to your reply.

    Please let us know if you have any further queries. I’m happy to assist you further.


    Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you; this can be beneficial to other community members.
