Multithreading on recursive API call function

Thomas Lu 41 Reputation points
2023-12-07T10:48:36.1633333+00:00

I've used Parallel.ForEach and PLINQ to achieve some degree of parallelism/multi-threading in C#.
However, I'm a bit stumped on how to apply parallelism/multi-threading to a recursive function.

In the example function below, I am querying a HR/payroll system's API for a list of employees.

private static async Task<JArray> getHumiEmployeesAsync(int pageNum = 1)
{
    string apiUrl = "https://partners.humi.ca/v1/employees/";
    using (HttpClient client = new HttpClient())
    {
        client.DefaultRequestHeaders.Add("Authorization", $"Bearer {HUMI_API_TOKEN}");
        string requestUrl = $"{apiUrl}?page[number]={pageNum}";
        HttpResponseMessage response = await client.GetAsync(requestUrl);
        if (response.IsSuccessStatusCode)
        {
            string responseData = await response.Content.ReadAsStringAsync();
            dynamic jsonResponse = JsonConvert.DeserializeObject(responseData);
            JArray employees = new JArray(((IEnumerable<dynamic>)jsonResponse.data)
                .Select(obj => JObject.FromObject(obj.attributes))
                .Where(employee => employee.end_date == null));
            int total = jsonResponse.meta.pagination.total;
            int page = jsonResponse.meta.pagination.page;
            int perPage = jsonResponse.meta.pagination.per_page;
            if ((total - (page * perPage)) > 0)
            {
                // Attempted to use Task.Run to run the recursive calls in parallel but doesn't work
                var nextPageTask = Task.Run(() => getHumiEmployeesAsync(page + 1));
                JArray nextPageEmployees = await nextPageTask;
                employees.Merge(nextPageEmployees);
            }
            return employees;
        }
        else
        {
            Console.WriteLine($"Error: {response.StatusCode}");
            return null;
        }
    }
}

An example of the response from the HTTP GET request is:

{
  "data": [
    {
      "id": "ecf35d71-4b3b-4cb1-9349-5817f48e4769",
      "type": "employees",
      "attributes": {
        "id": "ecf35d71-4b3b-4cb1-9349-5817f48e4769",
        "first_name": "John",
        "last_name": "Smith",
        "email": "******@company.com",
        "department": "Engineering",
        "position": "Engineering Manager",
        "employment_type": "full-time",
        "start_date": "2018-04-18",
        "end_date": null
      }
    }
  ],
  "jsonapi": {
    "version": "1.0",
    "meta": "Humi HR API"
  },
  "meta": {
    "copyright": "© 2023 Humi Soft",
    "pagination": {
      "per_page": 25,
      "page": 1,
      "total": 100
    }
  }
}

Using meta.pagination from the response, the function recursively calls getHumiEmployeesAsync while (total - (page * perPage) > 0), and merges the responses together.

While I know it is possible to use Parallel.For to iterate over each page via:

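int pageCount = (total + perPage - 1) / perPage; // integer ceiling of total / perPage
if (pageCount > 1)
{
     // The upper bound of Parallel.For is exclusive, so pass pageCount + 1 to include the last page.
     Parallel.For(2, pageCount + 1, i =>
          {
               // perform API call per page
          });
}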

that defeats the use of recursion. I'm hoping to keep the function recursive rather than iterative, but apply some parallelism so that the subsequent API calls are made concurrently instead of sequentially.

Is this possible?

Thanks in advance!

Thomas

Developer technologies | C#

1 answer

  1. Thomas Lu 41 Reputation points
    2023-12-12T01:08:18.8466667+00:00

    Hi Jiale,

    Thank you for your response and suggestion.

    Upon testing it, however, I noticed that its running time was the same as that of my previous function.

    Although the revision you have suggested achieves parallelism, I guess I'm limited by the fact that the program needs to wait for each of the successive responses to evaluate whether or not to make an additional recursive call.
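
    To illustrate that limitation, here is a minimal sketch that swaps the HTTP request for a 200 ms Task.Delay. The names PagingDemo, FetchPageAsync, FetchRecursiveAsync and FetchConcurrentAsync are hypothetical and are not part of the Humi API or of the code in this thread, and the page count is passed in rather than read from a response. The recursive version awaits each page before asking for the next one, so the delays add up; the fan-out version starts every request before awaiting any of them.

    ```csharp
    using System;
    using System.Diagnostics;
    using System.Linq;
    using System.Threading.Tasks;

    public static class PagingDemo
    {
        // Hypothetical stand-in for one API request: waits 200 ms, then returns the page number.
        public static async Task<int> FetchPageAsync(int page)
        {
            await Task.Delay(200);
            return page;
        }

        // Recursive chain: each call awaits its own page, then awaits the next call,
        // so the simulated requests run one after another. Returns the number of pages fetched.
        public static async Task<int> FetchRecursiveAsync(int page, int pageCount)
        {
            await FetchPageAsync(page);
            if (page >= pageCount) return 1;
            return 1 + await FetchRecursiveAsync(page + 1, pageCount);
        }

        // Fan-out: start every request first, then await them all together.
        public static async Task<int> FetchConcurrentAsync(int pageCount)
        {
            var tasks = Enumerable.Range(1, pageCount).Select(p => FetchPageAsync(p)).ToList();
            int[] pages = await Task.WhenAll(tasks);
            return pages.Length;
        }

        public static async Task Main()
        {
            var sw = Stopwatch.StartNew();
            await FetchRecursiveAsync(1, 4);
            Console.WriteLine($"Recursive:  {sw.ElapsedMilliseconds} ms");

            sw.Restart();
            await FetchConcurrentAsync(4);
            Console.WriteLine($"Concurrent: {sw.ElapsedMilliseconds} ms");
        }
    }
    ```

    With four pages, the recursive chain should take roughly four delays and the fan-out roughly one, although exact timings will vary by machine. Against a real API the gain depends on how the server handles simultaneous requests.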

    Using Task.WhenAll iteratively, as you suggested, instead of Parallel.For as I was doing before, appears to run almost twice as fast as the recursive version. In the interest of speed, iterating through the pages/tasks is the better approach, as below:

    public static async Task<JArray> GetAllHumiEmployeesAsync()
    {
    	JObject initialResponse = await getHumiEmployeesRequestAsync();
    	JObject pagination = initialResponse["meta"]["pagination"] as JObject;
    
    	JArray employees = new JArray(((IEnumerable<dynamic>)initialResponse["data"])
    		.Select(obj => JObject.FromObject(obj.attributes))
    		.Where(employee => employee["end_date"] == null));
    
    	int pageCount = (int)Math.Ceiling((double)pagination["total"] / (double)pagination["per_page"]);
    
    	if (pageCount > 1)
    	{
    		// Enumerable.Range takes (start, count): pages 2..pageCount are pageCount - 1 pages.
    		var recursiveTasks = Enumerable.Range(2, pageCount - 1)
    			.Select(i => getHumiEmployeesRequestAsync(i))
    			.ToList();
    
    		await Task.WhenAll(recursiveTasks);
    
    		foreach (var task in recursiveTasks)
    		{
    			JObject response = await task;
    			JArray data = new JArray(((IEnumerable<dynamic>)response["data"])
    				.Select(obj => JObject.FromObject(obj.attributes))
    				.Where(employee => employee["end_date"] == null));
    			// This loop runs sequentially after Task.WhenAll completes, so no lock is needed.
    			employees.Merge(data);
    		}
    	}
    	var sortedEmployees = employees.OrderBy(u => u["department"]).ThenBy(u => u["first_name"]).ToList();
    	return new JArray(sortedEmployees); 
    }
    
    
    private static async Task<JObject> getHumiEmployeesRequestAsync(int pageNum = 1)
    {
    	using (var client = new HttpClient())
    	{
    		client.DefaultRequestHeaders.Add("Authorization", $"Bearer {Humi_API_Token}");
    		string requestUrl = $"{apiURL}?page[number]={pageNum}";
    		HttpResponseMessage response = await client.GetAsync(requestUrl);
    		if (response.IsSuccessStatusCode)
    		{
    			string responseData = await response.Content.ReadAsStringAsync();
    			// Parse straight into a JObject instead of returning a dynamic value.
    			return JObject.Parse(responseData);
    		}
    		else
    		{
    			throw new HttpRequestException($"Error: {response.StatusCode}");              
    		}
    	}
    }
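
    As a quick check of the paging arithmetic, here is a small sketch using the values from the sample response (total = 100, per_page = 25). PageMath, PageCount and RemainingPages are hypothetical helper names used only for illustration. Note that Enumerable.Range takes a start value and a count, not an end value, so pages 2 through pageCount are produced by Range(2, pageCount - 1).

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class PageMath
    {
        // Number of pages needed for `total` items at `perPage` items per page (integer ceiling).
        public static int PageCount(int total, int perPage) => (total + perPage - 1) / perPage;

        // Page numbers still to fetch once page 1 has been retrieved.
        public static List<int> RemainingPages(int pageCount) =>
            pageCount > 1 ? Enumerable.Range(2, pageCount - 1).ToList() : new List<int>();

        public static void Main()
        {
            int pageCount = PageCount(100, 25);
            Console.WriteLine(pageCount);                                   // 4
            Console.WriteLine(string.Join(",", RemainingPages(pageCount))); // 2,3,4
            Console.WriteLine(PageCount(101, 25));                          // 5
        }
    }
    ```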
    
