The problem occurs when alphabet is 'e', not 'd'.
Look at the structure of the original web page: your code gets the number of pages by counting the pagination link tags (<a class="pagination__link gel-pica-bold">) in the page.
But when alphabet is 'e', there is only one page, so the page contains no pagination links at all, and this line:
doc.DocumentNode.SelectNodes("//a[@class ='pagination__link gel-pica-bold']/@href")
returns null. Calling Count on that null result is what throws the exception you are seeing.
Update:
When SelectNodes returns null (that is, when there is only one page), we can set pagesNum to 1 directly.
There are also a few smaller problems.
First, the current code loads a page only once per letter, so it only ever reads the first page, no matter how many pages that letter has.
Second, no ingredient starts with 'x', so the request for 'x' is automatically redirected to the homepage. We can detect this by checking whether the loaded page contains the A-Z letter list (class az-keyboard__list).
The updated code:
using System;
using HtmlAgilityPack;

var web = new HtmlWeb();
for (char alphabet = 'a'; alphabet <= 'z'; alphabet++)
{
    // Load the letter's first page to find out how many pages it has.
    var doc = web.Load($"https://www.bbc.co.uk/food/ingredients/a-z/{alphabet}");
    var nodes = doc.DocumentNode.SelectNodes("//a[@class = 'pagination__link gel-pica-bold']/@href");
    // SelectNodes returns null when nothing matches, i.e. when there is only one page.
    var pagesNum = nodes == null ? 1 : nodes.Count;

    for (int i = 1; i <= pagesNum; i++)
    {
        Console.WriteLine($"alphabet: {alphabet} | page {i}\n");
        // Load every page of the letter, not just the first one.
        doc = web.Load($"https://www.bbc.co.uk/food/ingredients/a-z/{alphabet}/{i}");

        // No ingredient starts with 'x', so when alphabet is 'x' the site redirects to the homepage.
        // The homepage has no A-Z letter list, so its absence means we were redirected.
        if (doc.DocumentNode.SelectNodes("//*[@class = 'az-keyboard__list']") == null)
            break;

        var ingredients = doc.DocumentNode.SelectNodes("//*[@class = 'gel-layout__item gel-1/2 gel-1/3@m gel-1/4@xl']");
        // Guard against a page with no ingredient items; SelectNodes would return null here too.
        if (ingredients == null)
            continue;

        foreach (var item in ingredients)
        {
            Console.WriteLine(item.InnerText.Replace("ingredient", string.Empty));
        }
    }
}