Advice: Need the most economical method to do this task...

Robert Lescault 46 Reputation points
2022-12-27T18:07:40.66+00:00

First: This is an individual's project, I'm not a company or organization.

Each week I download a zipped text file. The file is comma-delimited, and the first row contains headers. The zipped file is approx. 460 MB; extracted, it is approx. 3.5 GB (about 8 million rows).

1) store the file in my Azure account.
2) run ETL to prep that file for INSERT into a SQL database table (each new week will append to the table).
a) prep will include adding a field that is a concatenation of 5 or 6 of the existing fields, plus a field denoting which week's file each row came from, and dropping a lot of unnecessary columns from the original file.
3) I am running a separate web application to visualize the resulting data (Tableau Public). I'm pretty sure Tableau Public will be able to connect to the SQL db.
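The prep described in step 2a can be sketched locally before committing to any cloud pipeline. A minimal Python sketch, assuming hypothetical column names (`id`, `a`, `b`, `junk`) and a made-up week label, since the real file's fields aren't given:

```python
import csv
import io

def prep_rows(reader, week_label, keep, concat_fields):
    """Keep only the wanted columns, add a concatenated key,
    and tag each row with the week it came from."""
    for row in reader:
        out = {c: row[c] for c in keep}          # drop unnecessary columns
        out["row_key"] = "|".join(row[f] for f in concat_fields)
        out["source_week"] = week_label
        yield out

# Tiny in-memory demo with placeholder columns
raw = "id,a,b,junk\n1,x,y,z\n2,p,q,r\n"
rows = list(prep_rows(csv.DictReader(io.StringIO(raw)),
                      week_label="2022-W52",
                      keep=["id"],
                      concat_fields=["a", "b"]))
```

For the real 3.5 GB file you would stream from the zip (e.g. `zipfile.ZipFile(...).open(name)`) row by row rather than loading it all into memory.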

Please recommend the most economical way to do this.

I am not a data wizard, but I learn quickly, so whatever you recommend, if it will save me cash to do this process, I'll learn!

Thanks in advance!

Robert

Tags: Azure SQL Database, Azure Files, Azure Data Factory

Accepted answer
  1. Alberto Morillo 34,676 Reputation points MVP Volunteer Moderator
    2022-12-27T19:27:45.537+00:00

Data Factory supports reading the file/data inside a .zip file: we can set the Compression type on the dataset, so there is no need to unzip the contents of the zip file into a separate folder.


However, if you need to perform transformations on the extracted files, you may need to unzip them first, as explained here.

You can use Azure Data Factory to copy and transform the data before landing the result in Azure SQL Database; a serverless tier with auto-pause enabled costs nothing while it has no activity. Use the mapping section of the Copy activity to remove the columns you don't want in the sink (destination): click the trash can next to each unwanted column. Mapping data flows can help you create and update columns, as explained here.
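As a sketch of what those derived columns could look like in a mapping data flow, here are hypothetical derived-column expressions; the column names (colA, colB, colC) and the pipeline parameter $weekLabel are illustrative placeholders, not fields from the actual file:

```
row_key     : concat(colA, '|', colB, '|', colC)
source_week : $weekLabel    /* e.g. '2022-W52', passed in per run */
```

Passing the week as a pipeline parameter lets the same pipeline append each week's file without edits.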

All the elements involved are inexpensive: an Azure Storage account, Azure Data Factory, and Azure SQL serverless.

    1 person found this answer helpful.

0 additional answers
