(VS2010 Beta1) Native Parallel Programming: ConcRT example -- TwentyFour

WIth Microsoft Visual Studio 2010, Microsoft is releasing a new set of API for native parallel programming named ConRT (concurrent runtime). The purpose of this blog article is to illustrate the basics of ConcRT programming in VS 2010. We will be using Visual Studio Beta1 release for the discussion.

The problem we're trying to parallel here is a simple game we used to play in elementary school named 24. It goes like this, two players evenly divide up a deck of cards, and randomly show two cards at the same time. Then the players are required to find a way to compute the number 24 using the face value of each card once and only once, with four basic operations (addition, subtraction, multiplication, and division). The player who gets the right answer first gives the two cards to the other player. The game continues until they run out of time or one player runs out of cards.

So our basic programming problem here is writing a class to compute 24 using four numbers and four basic operations. The problem itself is not really computation intensive enough to warrant the usage of parallel programming unless we extend it to many more cards. But it's good enough for illustrating parallel programming in a short blog article: simple enough to understand, not too complicated, and yet not too boring.

Here goes the code:

// TwentyFour.cpp : use four numbers once and only once to compute 24 with four basic operations
//

#include <windows.h>
#include <tchar.h>
#include <stdio.h>
#include <math.h>

#include <concrt.h>

bool g_concrt = true; // true for using Concrt

bool Close(double v1, double v2)
{
    return (abs(v1 - v2) < 0.0001);
}

class CCards : public Concurrency::UnrealizedChore
{
    char   m_expr[256];  // expression
    int    m_count;      // card count
    double m_cards[4];   // card value

    void Oper(Concurrency::StructuredTaskCollection & col, int i, int j, char op, double value)
{
CCards * pNext = new CCards();

        pNext->m_cards[0] = value; // result of operation
        pNext->m_count    = 1;

for (int t = 0; t < m_count; t ++) // copy remaining cards
        {
            if ((t != i) && (t != j))
{
pNext->m_cards[pNext->m_count] = m_cards[t];
                pNext->m_count ++;
            }
}

        // Append expression
        sprintf_s(pNext->m_expr, "%s <%3.1f %c %3.1f = %3.1f> ", m_expr, m_cards[i], op, m_cards[j], value);

if (g_concrt)
{
col.Schedule(pNext); // schedule a task
        }
        else
        {
pNext->Solve();      // direct recursive solve
            delete pNext;
        }
}

CCards()
{
m_pFunction = CCards::Payload; // Conrt call back function
    }

    static void Payload(void * context) // wrapper
    {
CCards * pChore = static_cast<CCards *>((void *) ((BYTE *) context));

        pChore->Solve();
    }

public:

CCards(int a, int b, int c, int d)
{
m_expr[0] = 0;

        m_count = 4;

        m_cards[0] = a;
        m_cards[1] = b;
        m_cards[2] = c;
        m_cards[3] = d;
    }

    void Solve(void)
{
        if (m_count == 1) // leave node of searching
        {
            if (Close(m_cards[0], 24))
{
printf(" thread %4d %s\r\n", GetThreadId(GetCurrentThread()), m_expr);
            }

            return;
        }

Concurrency::StructuredTaskCollection col;

for (int i = 0; i < m_count; i ++)
{
            for (int j = i + 1; j < m_count; j ++)
{
Oper(col, i, j, '+', m_cards[i] + m_cards[j]);

if (m_cards[i] > m_cards[j])
{
Oper(col, i, j, '-', m_cards[i] - m_cards[j]);
                }
                else
                {
Oper(col, j, i, '-', m_cards[j] - m_cards[i]);
                }

Oper(col, i, j, '*', m_cards[i] * m_cards[j]);

if (m_cards[j] != 0)
{
Oper(col, i, j, '/', m_cards[i] / m_cards[j]);
                }

                if (m_cards[i] != 0)
{
Oper(col, j, i, '/', m_cards[j] / m_cards[i]);
                }
}
}

        if (g_concrt)
{
col.Wait(); // wait for all child tasks
        }
}
};

int _tmain(int argc, _TCHAR* argv[])
{
CCards cards(5, 6, 7, 8);

    g_concrt = false;

    printf("Sequential solution:\r\n");
    cards.Solve();

    g_concrt = true;

    printf("\r\n\r\n");
    printf("Parallel solution:\r\n");
    cards.Solve();

return 0;
}

The class CCards contains a list of card face values (m_count, m_cards) and a string showing the steps of computation (m_expr). The main problem solving routine is the Solve method. The loops through all combinations of card selection and operations and then call the Oper help function. The Oper function creates a new CCards object to represent the state of problem solving after an operation is performed (two less cards, one computed card, and updated expression) and then calls the Solve method on it recursively until we hace a single (computed) card left, which is then compared with the target value 24. The accumulated expression will be found is there is a match.

The code can be used in either traditional sequence programming or ConcRT based parallel programming supported by Visual Studio 2010. This is controlled by a global variable g_concrt.

When g_concrt is false, you can ignore the everything from the Concurrency name space. The Oper method will call Solve method directly on the object pointed to be pNext.

When g_concrt is true, we're really using the new native parallel programming API:

  1. The basic ConcRT API is defined in a new header file concrt.h which resides in VC include directory. The runtime support is part of the C++ runtime, so you do not need to link to any new library to use it.
  2. The ConcRT API is in the Concurrency namespace. We're using Concurrency::UnRealizedChore and Concurrency::StructuredTaskCollection here.
  3. A UnrealizedChore represents a unit of work (task/chore) that can be scheduled in a work stealing queue. If you use new operator to allocate a UnrealizedChore and then schedule it, the ConcRT runtime will be responsible to delete it.
  4. A StructuredTaskCollection represents a group of work (task/chore). You can schedule a UnrealizedChore on StructuredTaskCollection. You can wait for all scheduled tasks in a structured task collection.

When g_concrt is true, each invokation of Solve declares a new instance of StructuredTaskCollection, on which roughly N * (N -1) / 2 * 5 UnrealizedChore will be scheduled. The solve methods waits for all chores it scheduled on it before exiting. The Oper method schedules the task on the task collection, instead of calling the Solve method inline.

In the constructor of CCards class, the m_pFunction member of base class UnRealizedChore is set to point to a static function CCards::Payload. When a task/chore is ready to run as determined by the ConcRT runtime, the Payload function will be called. The parameter to the Payload function should point to the scheduled chore, as in this case CCards class object. So it casts the input parameter to CCards class and calls its Solve method for the real execution.

Here is the results you will get when run the program (using 5,6,7,8 as input):

Sequential solution:
    thread 1444  <5.0 + 7.0 = 12.0>  <12.0 - 8.0 = 4.0>  <4.0 * 6.0 = 24.0>
    thread 1444  <5.0 + 7.0 = 12.0>  <8.0 - 6.0 = 2.0>  <2.0 * 12.0 = 24.0>
    thread 1444  <7.0 - 5.0 = 2.0>  <2.0 / 6.0 = 0.3>  <8.0 / 0.3 = 24.0>
    thread 1444  <7.0 - 5.0 = 2.0>  <6.0 / 2.0 = 3.0>  <3.0 * 8.0 = 24.0>
    thread 1444  <7.0 - 5.0 = 2.0>  <2.0 / 8.0 = 0.3>  <6.0 / 0.3 = 24.0>
    thread 1444  <7.0 - 5.0 = 2.0>  <8.0 / 2.0 = 4.0>  <4.0 * 6.0 = 24.0>
    thread 1444  <7.0 - 5.0 = 2.0>  <6.0 * 8.0 = 48.0>  <48.0 / 2.0 = 24.0>
    thread 1444  <8.0 - 5.0 = 3.0>  <7.0 - 3.0 = 4.0>  <4.0 * 6.0 = 24.0>
    thread 1444  <8.0 - 6.0 = 2.0>  <5.0 + 7.0 = 12.0>  <12.0 * 2.0 = 24.0>
    thread 1444  <6.0 * 8.0 = 48.0>  <7.0 - 5.0 = 2.0>  <48.0 / 2.0 = 24.0>
    thread 1444  <8.0 - 7.0 = 1.0>  <5.0 - 1.0 = 4.0>  <4.0 * 6.0 = 24.0>

Parallel solution:
    thread 5616  <5.0 + 7.0 = 12.0>  <8.0 - 6.0 = 2.0>  <2.0 * 12.0 = 24.0>
    thread 5616  <5.0 + 7.0 = 12.0>  <12.0 - 8.0 = 4.0>  <4.0 * 6.0 = 24.0>
    thread 1444  <8.0 - 7.0 = 1.0>  <5.0 - 1.0 = 4.0>  <4.0 * 6.0 = 24.0>
    thread 4692  <8.0 - 5.0 = 3.0>  <7.0 - 3.0 = 4.0>  <4.0 * 6.0 = 24.0>
    thread 4248  <7.0 - 5.0 = 2.0>  <6.0 * 8.0 = 48.0>  <48.0 / 2.0 = 24.0>
    thread 4248  <7.0 - 5.0 = 2.0>  <8.0 / 2.0 = 4.0>  <4.0 * 6.0 = 24.0>
    thread 4248  <7.0 - 5.0 = 2.0>  <2.0 / 8.0 = 0.3>  <6.0 / 0.3 = 24.0>
    thread 4248  <7.0 - 5.0 = 2.0>  <6.0 / 2.0 = 3.0>  <3.0 * 8.0 = 24.0>
    thread 4248  <7.0 - 5.0 = 2.0>  <2.0 / 6.0 = 0.3>  <8.0 / 0.3 = 24.0>
    thread 1444  <6.0 * 8.0 = 48.0>  <7.0 - 5.0 = 2.0>  <48.0 / 2.0 = 24.0>
    thread 5616  <8.0 - 6.0 = 2.0>  <5.0 + 7.0 = 12.0>  <12.0 * 2.0 = 24.0> 

The sequential running of the program solves the problem in a determistic way using a single thread. The Concrt parallelized running of the code solves the problem in a non-determistic way using multiple threads as controlled by the ConcRT runtime.

In the next article, we will discussion how to debug ConcRT programs using new features provided by Visual Studio 2010.