Automatically download csv file from a public website using Azure

Sarvesh Pandey 71 Reputation points
2023-01-03T06:11:14.29+00:00

Hi All,

I need to download a CSV file that is publicly available and is updated regularly.
I have designed a flow that meets my requirements, but I have to download the file manually.

Is there any way to set up a process that downloads the file automatically and stores it in an ADLS location?

Which tool can I use for this?


Azure Data Factory

7 answers

  1. MartinJaffer-MSFT 26,236 Reputation points
    2023-01-03T18:10:22.953+00:00

    Hello @Sarvesh Pandey ,
    Thanks for the question and for using the MS Q&A platform.

    As I understand it, you want to download a CSV from a publicly available website. It is possible to do this with many Azure services. Which one to use depends on your comfort level and preferences. Data Factory / Synapse, Logic Apps, and Azure Functions are just a few that come to mind.

    However I do have a concern. Does that download hyperlink always point to the same absolute URL, or does it change day to day?

    If it is always the same, then the process is very simple.
    If it changes day to day, then we need to first fetch this page, then extract out the hyperlink.
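For the case where the link changes from day to day, the fetch-then-extract step could be sketched roughly as below in Python. This is only an illustration using the standard library. The class `CsvLinkFinder`, the function `find_csv_links`, the sample HTML, and the `example.com` address are all made up for the example, and the rule for recognising a CSV link (the href contains `.csv` or `csv=true`) is an assumption that would need adjusting to the real page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CsvLinkFinder(HTMLParser):
    """Collects href values of <a> tags whose target looks like a CSV download."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Assumed rule: the href contains ".csv" or asks for CSV in the query string.
        if href and (".csv" in href.lower() or "csv=true" in href.lower()):
            # Resolve relative links against the page address.
            self.links.append(urljoin(self.base_url, href))


def find_csv_links(html, base_url):
    """Return absolute URLs of all CSV-looking links found in the HTML text."""
    finder = CsvLinkFinder(base_url)
    finder.feed(html)
    return finder.links


# Hypothetical page content, only to show the behaviour.
sample_html = (
    '<p><a href="/about">About</a> '
    '<a href="/files/report-2023-01-03.csv">Download</a></p>'
)
links = find_csv_links(sample_html, "https://example.com/data/")
print(links)
```

A parser like this only sees links that are present in the HTML as served. If the page builds its links with JavaScript after loading, they will not show up here.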

    In a Data Factory pipeline Copy activity, use an HTTP linked service with a Delimited Text or Binary dataset on the source side. On the sink side, use either a Data Lake Gen2 or Blob Storage linked service (depending on whether Hierarchical Namespace is enabled on your storage) with a Delimited Text or Binary dataset.

    The source should point at the hyperlink address, not the base webpage.

    Binary moves as-is without any changes. If you want to rename columns or change format, use Delimited Text. Delimited Text can also work without changes.
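To make the Binary-to-Binary option more concrete, here is a rough sketch of what the Copy activity definition might look like, built as a Python dictionary and printed as JSON. The activity name and the two dataset names are hypothetical placeholders. The `type` strings (`BinarySource`, `HttpReadSettings`, `BinarySink`, `AzureBlobFSWriteSettings`) are what I believe the Binary format and connector documentation use for an HTTP source and a Data Lake Gen2 sink, so please check them against the current documentation before relying on this.

```python
import json


def build_copy_activity(source_dataset, sink_dataset):
    """Return a Copy activity definition that moves a file as-is (Binary to Binary)."""
    return {
        "name": "CopyCsvFromHttpToLake",  # hypothetical activity name
        "type": "Copy",
        "inputs": [{"referenceName": source_dataset, "type": "DatasetReference"}],
        "outputs": [{"referenceName": sink_dataset, "type": "DatasetReference"}],
        "typeProperties": {
            # Source: Binary dataset on top of an HTTP linked service.
            "source": {
                "type": "BinarySource",
                "storeSettings": {"type": "HttpReadSettings", "requestMethod": "GET"},
            },
            # Sink: Binary dataset on top of a Data Lake Gen2 linked service.
            "sink": {
                "type": "BinarySink",
                "storeSettings": {"type": "AzureBlobFSWriteSettings"},
            },
        },
    }


# Dataset names below are placeholders, not real resources.
activity = build_copy_activity("HttpCsvBinary", "LakeCsvBinary")
print(json.dumps(activity, indent=2))
```

If the storage account does not have Hierarchical Namespace enabled, the sink would sit on a Blob Storage linked service instead, and the sink store settings type would differ accordingly.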

    For more specific instructions, I need to know more, especially about where you get the data.
    Please do let me know if you have any queries.

    Thanks
    Martin



  2. Sarvesh Pandey 71 Reputation points
    2023-01-04T07:50:36.997+00:00

    Hi @MartinJaffer-MSFT ,

    Thanks for the response!

    I have checked the URL and it's the same every day: https://www.nseindia.com/api/market-data-pre-open?key=ALL&csv=true

    I used an ADF HTTP linked service to pull the file and load it into ADLS, but it's throwing an error.
    Error message:
    ErrorCode=HttpRequestFailedWithUnauthorizedError,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Http request failed with status code 401 Unauthorized, usually this is caused by invalid credentials, please check your activity settings.
    Request URL: https://www.nseindia.com/api/market-data-pre-open?key=ALL&csv=true.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Net.WebException,Message=The remote server returned an error: (401) Unauthorized.,Source=System,'

    I used Anonymous as the authentication type because this website is publicly available.

    Please let me know what I should do.


  3. MartinJaffer-MSFT 26,236 Reputation points
    2023-01-17T19:36:48.07+00:00

    I've tried setting up the web table connector for both this page, and an older format of the same website. I've encountered weird connection errors in the linked service. I've also had weird behavior on web activity pointed at anything on this website. So I looked for another way.

    Diving deep into the webpage and its details, I think I found another, cleaner option. The web pages have a bunch of JavaScript creating the table, and it looks like some clicking is needed to set the key from NIFTY to ALL. So I looked into where the JavaScript gets the table data from.

    Below are the URIs the JavaScript calls to get data to populate the table. The data comes in JSON format.

    https://www1.nseindia.com/live_market/dynaContent/live_analysis/pre_open/all.json

    https://www.nseindia.com/api/market-data-pre-open?key=ALL

    Follow up: I think something must be wrong with my Factory, as it isn't picking these up right. I can't fathom why.


  4. Sarvesh Pandey 71 Reputation points
    2023-01-18T04:40:09.68+00:00

    Hi @MartinJaffer-MSFT ,

    Thanks for all this research. I really appreciate your efforts.

    Can you please let me know how you did this analysis? I was stuck at the connectivity error. As I understand it, the URL I shared earlier calls some REST API, but I don't know how that works.

    Also, the link you shared, "https://www.nseindia.com/api/market-data-pre-open?key=ALL", is not working.

    How did you get this link: "https://www1.nseindia.com/live_market/dynaContent/live_analysis/pre_open/all.json"?


  5. MartinJaffer-MSFT 26,236 Reputation points
    2023-01-24T19:01:05.4133333+00:00

    @Sarvesh Pandey Well, I looked around the website.

    After changing the selector from "Nifty" to "All", the JavaScript on the page makes a call to get the data to put in the table. When the page originally loaded, the table was empty; the JavaScript populates it afterwards. This makes it hard to use the "Web table" connector.

    Also, when I tell the browser to go directly to that all.json, I get an error. But if I go to this webpage first and then go to all.json, I get data. This is just in the browser.
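The "visit the page first, then ask for the data" behaviour seen in the browser can be imitated outside a browser with a cookie-aware HTTP client. Below is a minimal Python sketch using only the standard library. It assumes that the site's check is based on cookies plus a browser-like `User-Agent` header, which is a guess and not something confirmed here. The function names and the header values are made up for the illustration.

```python
import urllib.request
from http.cookiejar import CookieJar

# Assumption: a browser-like User-Agent; the exact headers the site expects are unknown.
DEFAULT_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept": "*/*",
}


def build_opener_with_cookies():
    """Create an opener that stores cookies from responses and sends them back later."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Replace the default "Python-urllib" headers with the ones above.
    opener.addheaders = list(DEFAULT_HEADERS.items())
    return opener, jar


def fetch_after_visiting(landing_url, data_url, timeout=30):
    """Visit landing_url first so the server can set cookies, then fetch data_url."""
    opener, _jar = build_opener_with_cookies()
    with opener.open(landing_url, timeout=timeout) as response:
        response.read()  # the body is not needed, only the cookies it sets
    with opener.open(data_url, timeout=timeout) as response:
        return response.read()


# Example call (not executed here). The address of the pre-open page is not
# given in this thread, so the first argument is a placeholder.
# payload = fetch_after_visiting(
#     "<pre-open page URL>",
#     "https://www.nseindia.com/api/market-data-pre-open?key=ALL",
# )
```

Something like this could run in an Azure Function on a timer and write the result to the data lake, since the function can hold cookies between the two requests. Whether the site accepts such a client is untested.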

    I have also been looking in the downloads section of the website in the hope that the data would be easier to get there, but I haven't found anything that sounds like "pre-open". Maybe I am using the wrong name?

    Also, looking back, I think the Q&A editor ate my links. I had put 4, but now I see 2 badly formatted ones.

    At this point, maybe I should give you a support ticket? This should have been an easy task, but this website seems to insist on some form of identity (cookies?) before sending data. I'm guessing ADF doesn't run JavaScript or handle the cookies.
