Porady: używanie parallel_invoke do napisania procedury sortowania równoległego

Artykuł
10/12/2023

W tym dokumencie opisano sposób używania algorytmu parallel_invoke w celu zwiększenia wydajności algorytmu sortowania bitoicznego. Algorytm sortowania bitoicznego rekursywnie dzieli sekwencję wejściową na mniejsze posortowane partycje. Algorytm sortowania bitoicznego może działać równolegle, ponieważ każda operacja partycji jest niezależna od wszystkich innych operacji.

Chociaż sortowanie bitoiczne jest przykładem sieci sortowania, która sortuje wszystkie kombinacje sekwencji wejściowych, w tym przykładzie są sortowane sekwencje, których długość jest mocą dwóch.

Uwaga

W tym przykładzie użyto procedury sortowania równoległego na potrzeby ilustracji. Można również użyć wbudowanych algorytmów sortowania, które zapewnia PPL: concurrency::p arallel_sort, concurrency::p arallel_buffered_sort i concurrency::p arallel_radixsort. Aby uzyskać więcej informacji, zobacz Parallel Algorithms (Algorytmy równoległe).

Sekcje

W tym dokumencie opisano następujące zadania:

Wykonywanie sortowania bitoicznego szeregowo
Równoległe wykonywanie sortowania bitoicznego przy użyciu parallel_invoke

Wykonywanie sortowania bitoicznego szeregowo

W poniższym przykładzie przedstawiono wersję szeregową algorytmu sortowania bitoicznego. Funkcja bitonic_sort dzieli sekwencję na dwie partycje, sortuje te partycje w przeciwnych kierunkach, a następnie scala wyniki. Ta funkcja wywołuje się dwa razy rekursywnie, aby posortować każdą partycję.

const bool INCREASING = true;
const bool DECREASING = false;

// Comparator function for the bitonic sort algorithm.
template <class T>
void compare(T* items, int i, int j, bool dir)
{
   if (dir == (items[i] > items[j]))
   {
      swap(items[i], items[j]);
   }
}

// Sorts a bitonic sequence in the specified order.
template <class T>
void bitonic_merge(T* items, int lo, int n, bool dir)
{
   if (n > 1)
   {
      int m = n / 2;
      for (int i = lo; i < lo + m; ++i)
      {
         compare(items, i, i + m, dir);
      }
      bitonic_merge(items, lo, m, dir);
      bitonic_merge(items, lo + m, m, dir);
   }
}

// Sorts the given sequence in the specified order.
template <class T>
void bitonic_sort(T* items, int lo, int n, bool dir)
{
   if (n > 1)
   {
      // Divide the array into two partitions and then sort 
      // the partitions in different directions.
      int m = n / 2;
      bitonic_sort(items, lo, m, INCREASING);
      bitonic_sort(items, lo + m, m, DECREASING);

      // Merge the results.
      bitonic_merge(items,lo, n, dir);
   }
}

// Sorts the given sequence in increasing order.
template <class T>
void bitonic_sort(T* items, int size)
{
    bitonic_sort(items, 0, size, INCREASING);
}

[Top]

Równoległe wykonywanie sortowania bitoicznego przy użyciu parallel_invoke

W tej sekcji opisano sposób użycia algorytmu parallel_invoke do równoległego wykonywania algorytmu sortowania bitoicznego.

Aby wykonać algorytm bitonicznego sortowania równolegle

Dodaj dyrektywę #include dla pliku nagłówka ppl.h.
```
#include <ppl.h>
```
Dodaj dyrektywę using dla concurrency przestrzeni nazw.
```
using namespace concurrency;
```

Utwórz nową funkcję o nazwie parallel_bitonic_mege, która używa algorytmu parallel_invoke do scalania sekwencji równolegle, jeśli jest wystarczająca ilość pracy do wykonania. W przeciwnym razie wywołaj metodę bitonic_merge , aby scalić sekwencje szeregowo.

// Sorts a bitonic sequence in the specified order.
template <class T>
void parallel_bitonic_merge(T* items, int lo, int n, bool dir)
{   
   // Merge the sequences concurrently if there is sufficient work to do.
   if (n > 500)
   {
      int m = n / 2;
      for (int i = lo; i < lo + m; ++i)
      {
         compare(items, i, i + m, dir);
      }

      // Use the parallel_invoke algorithm to merge the sequences in parallel.
      parallel_invoke(
         [&items,lo,m,dir] { parallel_bitonic_merge(items, lo, m, dir); },
         [&items,lo,m,dir] { parallel_bitonic_merge(items, lo + m, m, dir); }
      );
   }
   // Otherwise, perform the work serially.
   else if (n > 1)
   {
      bitonic_merge(items, lo, n, dir);
   }   
}

Wykonaj proces podobny do tego w poprzednim kroku, ale dla bitonic_sort funkcji.

// Sorts the given sequence in the specified order.
template <class T>
void parallel_bitonic_sort(T* items, int lo, int n, bool dir)
{   
   if (n > 1)
   {
      // Divide the array into two partitions and then sort 
      // the partitions in different directions.
      int m = n / 2;

      // Sort the partitions in parallel.
      parallel_invoke(
         [&items,lo,m] { parallel_bitonic_sort(items, lo, m, INCREASING); },
         [&items,lo,m] { parallel_bitonic_sort(items, lo + m, m, DECREASING); }
      );
      
      // Merge the results.
      parallel_bitonic_merge(items, lo, n, dir);
   }
}

Utwórz przeciążoną wersję parallel_bitonic_sort funkcji, która sortuje tablicę w kolejności rosnącej.
```
// Sorts the given sequence in increasing order.
template <class T>
void parallel_bitonic_sort(T* items, int size)
{
   parallel_bitonic_sort(items, 0, size, INCREASING);
}
```
Algorytm parallel_invoke zmniejsza obciążenie, wykonując ostatnią serię zadań w kontekście wywołującym. Na przykład w parallel_bitonic_sort funkcji pierwsze zadanie jest uruchamiane w osobnym kontekście, a drugie zadanie jest uruchamiane w kontekście wywołującym.
```
// Sort the partitions in parallel.
parallel_invoke(
   [&items,lo,m] { parallel_bitonic_sort(items, lo, m, INCREASING); },
   [&items,lo,m] { parallel_bitonic_sort(items, lo + m, m, DECREASING); }
);
```

Poniższy kompletny przykład wykonuje zarówno wersje szeregowe, jak i równoległe algorytmu sortowania bitoicznego. Przykład wyświetla również w konsoli czas wymagany do wykonania poszczególnych obliczeń.

// parallel-bitonic-sort.cpp
// compile with: /EHsc
#include <windows.h>
#include <algorithm>
#include <iostream>
#include <random>
#include <ppl.h>

using namespace concurrency;
using namespace std;

// Calls the provided work function and returns the number of milliseconds 
// that it takes to call that function.
template <class Function>
__int64 time_call(Function&& f)
{
   __int64 begin = GetTickCount();
   f();
   return GetTickCount() - begin;
}

const bool INCREASING = true;
const bool DECREASING = false;

// Comparator function for the bitonic sort algorithm.
template <class T>
void compare(T* items, int i, int j, bool dir)
{
   if (dir == (items[i] > items[j]))
   {
      swap(items[i], items[j]);
   }
}

// Sorts a bitonic sequence in the specified order.
template <class T>
void bitonic_merge(T* items, int lo, int n, bool dir)
{
   if (n > 1)
   {
      int m = n / 2;
      for (int i = lo; i < lo + m; ++i)
      {
         compare(items, i, i + m, dir);
      }
      bitonic_merge(items, lo, m, dir);
      bitonic_merge(items, lo + m, m, dir);
   }
}

// Sorts the given sequence in the specified order.
template <class T>
void bitonic_sort(T* items, int lo, int n, bool dir)
{
   if (n > 1)
   {
      // Divide the array into two partitions and then sort 
      // the partitions in different directions.
      int m = n / 2;
      bitonic_sort(items, lo, m, INCREASING);
      bitonic_sort(items, lo + m, m, DECREASING);

      // Merge the results.
      bitonic_merge(items,lo, n, dir);
   }
}

// Sorts the given sequence in increasing order.
template <class T>
void bitonic_sort(T* items, int size)
{
    bitonic_sort(items, 0, size, INCREASING);
}

// Sorts a bitonic sequence in the specified order.
template <class T>
void parallel_bitonic_merge(T* items, int lo, int n, bool dir)
{   
   // Merge the sequences concurrently if there is sufficient work to do.
   if (n > 500)
   {
      int m = n / 2;
      for (int i = lo; i < lo + m; ++i)
      {
         compare(items, i, i + m, dir);
      }

      // Use the parallel_invoke algorithm to merge the sequences in parallel.
      parallel_invoke(
         [&items,lo,m,dir] { parallel_bitonic_merge(items, lo, m, dir); },
         [&items,lo,m,dir] { parallel_bitonic_merge(items, lo + m, m, dir); }
      );
   }
   // Otherwise, perform the work serially.
   else if (n > 1)
   {
      bitonic_merge(items, lo, n, dir);
   }   
}

// Sorts the given sequence in the specified order.
template <class T>
void parallel_bitonic_sort(T* items, int lo, int n, bool dir)
{   
   if (n > 1)
   {
      // Divide the array into two partitions and then sort 
      // the partitions in different directions.
      int m = n / 2;

      // Sort the partitions in parallel.
      parallel_invoke(
         [&items,lo,m] { parallel_bitonic_sort(items, lo, m, INCREASING); },
         [&items,lo,m] { parallel_bitonic_sort(items, lo + m, m, DECREASING); }
      );
      
      // Merge the results.
      parallel_bitonic_merge(items, lo, n, dir);
   }
}

// Sorts the given sequence in increasing order.
template <class T>
void parallel_bitonic_sort(T* items, int size)
{
   parallel_bitonic_sort(items, 0, size, INCREASING);
}

int wmain()
{  
   // For this example, the size must be a power of two.
   const int size = 0x200000;

   // Create two large arrays and fill them with random values.
   int* a1 = new int[size];
   int* a2 = new int[size];

   mt19937 gen(42);
   for(int i = 0; i < size; ++i)
   {
      a1[i] = a2[i] = gen();
   }

   __int64 elapsed;
   
   // Perform the serial version of the sort.
   elapsed = time_call([&] { bitonic_sort(a1, size); });
   wcout << L"serial time: " << elapsed << endl;

   // Now perform the parallel version of the sort.
   elapsed = time_call([&] { parallel_bitonic_sort(a2, size); });
   wcout << L"parallel time: " << elapsed << endl;
 
   delete[] a1;
   delete[] a2;
}

Następujące przykładowe dane wyjściowe są przeznaczone dla komputera z czterema procesorami.

serial time: 4353
parallel time: 1248

[Top]

Kompilowanie kodu

Aby skompilować kod, skopiuj go, a następnie wklej go w projekcie programu Visual Studio lub wklej go w pliku o nazwie parallel-bitonic-sort.cpp , a następnie uruchom następujące polecenie w oknie wiersza polecenia programu Visual Studio.

cl.exe /EHsc parallel-bitonic-sort.cpp

Niezawodne programowanie

W tym przykładzie użyto algorytmu parallel_invoke zamiast klasy concurrency::task_group , ponieważ okres istnienia każdej grupy zadań nie wykracza poza funkcję. Zalecamy użycie parallel_invoke polecenia , gdy można, ponieważ ma mniejsze obciążenie związane z wykonywaniem niż task group obiekty, a w związku z tym umożliwia pisanie kodu o lepszej wydajności.

Równoległe wersje niektórych algorytmów działają lepiej tylko wtedy, gdy jest wystarczająca ilość pracy do wykonania. Na przykład parallel_bitonic_merge funkcja wywołuje wersję szeregową , bitonic_mergejeśli w sekwencji znajduje się co najmniej 500 elementów. Możesz również zaplanować ogólną strategię sortowania na podstawie ilości pracy. Na przykład bardziej wydajne może być użycie wersji szeregowej algorytmu szybkiego sortowania, jeśli tablica zawiera mniej niż 500 elementów, jak pokazano w poniższym przykładzie:

template <class T>
void quick_sort(T* items, int lo, int n)
{
   // TODO: The function body is omitted for brevity.
}

template <class T>
void parallel_bitonic_sort(T* items, int lo, int n, bool dir)
{
   // Use the serial quick sort algorithm if there are relatively few
   // items to sort. The associated overhead for running few tasks in 
   // parallel may not overcome the benefits of parallel processing.
   if (n - lo + 1 <= 500)
   {
      quick_sort(items, lo, n);
   }
   else if (n > 1)
   {
      // Divide the array into two partitions and then sort 
      // the partitions in different directions.
      int m = n / 2;

      // Sort the partitions in parallel.
      parallel_invoke(
         [&items,lo,m] { parallel_bitonic_sort(items, lo, m, INCREASING); },
         [&items,lo,m] { parallel_bitonic_sort(items, lo + m, m, DECREASING); }
      );
      
      // Merge the results.
      parallel_bitonic_merge(items, lo, n, dir);
   }
}

Podobnie jak w przypadku dowolnego algorytmu równoległego, zalecamy profilowanie i dostrajanie kodu zgodnie z potrzebami.

Zobacz też

Równoległość zadań
parallel_invoke, funkcja

Udostępnij za pośrednictwem