Hi @nellie,
Based on your description, I think this can be implemented in C#.
First, you can use WebClient to download the HTML resources:
    using System.Net;

    using (WebClient client = new WebClient()) // WebClient implements IDisposable
    {
        client.DownloadFile("http://yoursite.com/page.html", @"C:\localfile.html");
        // Or you can get the file content without saving it
        string htmlCode = client.DownloadString("http://yoursite.com/page.html");
    }
Then use Html Agility Pack to traverse all <a> tags in the downloaded page and filter them to obtain the downloadable hyperlink addresses. Requests can fail, a page may contain no links at all, and pages can link back to each other, so you need to add some exception handling and guard against these cases.
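    using System.Collections.Generic;
    using System.Net;
    using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

    public static int i = 1;
    // URLs already visited, so pages that link to each other are not downloaded endlessly
    public static HashSet<string> visited = new HashSet<string>();

    public static void downloadRes(string url)
    {
        if (!visited.Add(url)) return; // this URL was already processed
        try
        {
            using (WebClient client = new WebClient())
            {
                client.DownloadFile(url, "D:\\localfile" + i++ + ".html");
            }
            HtmlWeb hw = new HtmlWeb();
            HtmlDocument doc = hw.Load(url);
            // SelectNodes returns null when the page has no matching <a> tags
            HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
            if (links == null) return;
            foreach (HtmlNode link in links)
            {
                string href = link.Attributes["href"].Value;
                if (href.StartsWith("https")) // absolute https links only; relative links are skipped
                {
                    downloadRes(href);
                }
            }
        }
        catch (WebException)
        {
            // The resource could not be downloaded; skip it and continue with the other links
        }
    }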
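Note that a plain `StartsWith("https")` check skips relative links such as `/docs/page.html`. Below is a minimal sketch of one way to resolve each href against the page URL before filtering. It assumes only the standard `System.Uri` class; the names `LinkHelper` and `TryGetAbsoluteLink` are hypothetical and are not part of Html Agility Pack.

```csharp
using System;

public static class LinkHelper
{
    // Hypothetical helper: resolves an href against the URL of the page it was
    // found on and accepts the result only if it is an http or https address.
    public static bool TryGetAbsoluteLink(string pageUrl, string href, out string absoluteUrl)
    {
        absoluteUrl = null;
        Uri baseUri;
        Uri resolved;

        // The page URL itself must be a valid absolute URI.
        if (!Uri.TryCreate(pageUrl, UriKind.Absolute, out baseUri)) return false;

        // Combines the base URI with a relative or absolute href.
        if (!Uri.TryCreate(baseUri, href, out resolved)) return false;

        // Reject mailto:, javascript:, and other non-web schemes.
        if (resolved.Scheme != Uri.UriSchemeHttp && resolved.Scheme != Uri.UriSchemeHttps) return false;

        absoluteUrl = resolved.AbsoluteUri;
        return true;
    }
}
```

In `downloadRes`, the `StartsWith` check could then be replaced by a call to this helper with `url` as `pageUrl`, passing the resulting `absoluteUrl` to the recursive call. Links that differ only by a fragment (for example `#top`) still produce different strings, so you may want to strip the fragment as well.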
Hope this helps.
Best regards,
Xudong Peng