Tuesday, July 29, 2008

[2008.07.29] The Task Parallel Library (TPL) [Part I/IV]

The Task Parallel Library (TPL) is exposed from two namespaces in the System.Threading.dll assembly
  • System.Threading with focus on the Parallel type.
  • System.Threading.Tasks with focus on the Task and Future<T> types.

Using System.Threading.Parallel

  • this class is useful for solving data parallel problems, as it provides support for parallelizing loops and regions in .NET applications
  • this functionality is exposed through a set of static methods on the Parallel type, namely For, ForEach, and Invoke

[1] Parallel.For

  • in many loops, the iterations of the loop are independent, meaning that one iteration does not rely on results from or interfere with any other iteration
  • the System.Threading.Parallel class supports the parallelization of such loops, whereby a developer can take a loop that executes sequentially and convert it to one where every iteration of the loop has the potential to run in parallel, provided enough processing cores are available
  • Note: loops whose iterations are dependent or contain side-effecting operations will be incorrect when run in parallel without the addition of proper custom synchronization
  • here is a comparison of the syntax for sequential and parallel versions of the for loop:
Sequential for loop

for (Int32 i = 0; i < N; i++)

{

    results[i] = Compute(i);

}

Parallel.For using a delegate

Parallel.For(0, N, delegate(int i)

{

    results[i] = Compute(i);

});

Parallel.For using an anonymous function

Parallel.For(0, N, i =>

{   // using a lambda instead of a

    // lengthier anonymous function

    results[i] = Compute(i);

});

  • here is a side-by-side comparison of the differences between the sequential and parallel versions; note that <=> means "is replaced with"
for <=>   Parallel.For
Int32 i = 0 <=>   0
i < N <=>   N
i++ <=>  
results[i] = Compute(i); <=>   delegate(Int32 i) {results[i] = Compute(i);});
OR
i => { results[i] = Compute(i);
  • the type of N in the parallel version is inferred to be of type Int32 based on the value of the first parameter which is 0, the default for an integer
  • notice that in both cases, sequential or parallel, the actual body of the for loop is identical – that has not changed at all
  • if the machine that this code is running on has multiple cores, then the TPL will run the loop in parallel, but if it is just a single core, it will automatically run sequentially
  • the Parallel.For construct is just a shortcut and the implementation of Parallel.For is built on top of other abstractions such as the Task API
    • Parallel.For spawns a bunch of parallel work and it does not return until all that work is done
    • this is possible because it is built on top of these Task abstractions that allow waiting and cancellations
Example:
XAML Code:

<Grid>

        <Grid.RowDefinitions>

            <RowDefinition Height="Auto" />

            <RowDefinition Height="Auto" />

            <RowDefinition Height="Auto" />

            <RowDefinition Height="Auto" />

        </Grid.RowDefinitions>

        <StackPanel Grid.Row="0" Orientation="Horizontal">

            <TextBlock FontSize="12" FontWeight="Bold">

                Time Taken with sequential for:</TextBlock>

            <TextBlock x:Name="textBlock1"></TextBlock>

        </StackPanel>

        <Polyline x:Name="pL"

                Grid.Row="1"

                Stroke="Red"

                StrokeThickness="1">

        </Polyline>

 

        <StackPanel Grid.Row="2" Orientation="Horizontal">

            <TextBlock FontSize="12" FontWeight="Bold">

                Time Taken with Parallel.For:</TextBlock>

            <TextBlock x:Name="textBlock2"></TextBlock>

        </StackPanel>

 

        <Polyline x:Name="pL2"

                Grid.Row="3"

                Stroke="Blue"

                StrokeThickness="1">

        </Polyline>

        <TextBlock x:Name="tb"></TextBlock>

 

    </Grid>

public partial class Polyline : Window

    {

        public Polyline()

        {

            InitializeComponent();

 

            // number of times to loop

            Int32 N = 40000;

 

            // create a Stopwatch to take time measurements

            Stopwatch sw1 = Stopwatch.StartNew();

            // create an array of points to plot

            Point[] temp1 = new Point[N];

            // compute each point in the Sine curve

            //  using sequential for

            for (Int32 i = 0; i < N; i++)

            {

                Double x = i * Math.PI;

                Double y = 40 + 30 * Math.Sin(x / 10);

                temp1[i] = new Point(x, y);

            }

            // get the time taken to run this loop

            Int64 t1 = sw1.ElapsedTicks;

 

 

            Stopwatch sw2 = Stopwatch.StartNew();

            Point[] temp2 = new Point[N];

            // use the Parallel.For construct

            Parallel.For(0, N,

                i =>

                {

                    Double x = i * Math.PI;

                    Double y = 40 + 30 * Math.Sin(x / 10);

                    temp2[i] = new Point(x, y);

 

                });

            Int64 t2 = sw2.ElapsedTicks;

 

            // write the elapsed time taken for

            //  each loop

            textBlock1.Text = t1.ToString();

            textBlock2.Text = t2.ToString();

 

            // draw the sine curve for each

            //  loop

            for (int i = 0; i < N; i++)

            {

                pL.Points.Add(temp1[i]);

                pL2.Points.Add(temp2[i]);

            }

        }

    }// end class

parallel.vs.sequential.for

[2] Parallel.ForEach

  • parallelism is supported for IEnumerable<T> types using the foreach operator
Sequential foreach loop

foreach (MyClass c in data)

{

    Compute(c);

}

Parallel.ForEach using a delegate

Parallel.ForEach(data, delegate(MyClass c)

{

    Compute(c);

});

Parallel.ForEach using an anonymous function

Parallel.ForEach(data, c =>

{

    Compute(c);

});

  • using Parallel.ForEach is generally less efficient than Parallel.For, because many threads must access the same underlying enumerator
  • Parallel.ForEach is, however, intelligent enough to detect and access IList<T> instances in a more efficient manner
Example: consider enumerating all of the image files in a directory and processing the images found using Parallel.ForEach

Parallel.ForEach(Directory.GetFiles(path, "*.jpg"),

    imagePath =>

    {

        ProcessImage(imagePath);

    });

[3] Parallel.Invoke

  • the Invoke static method of the supports type the parallelization of blocks of statements
  • it accepts an array of actions to execute
  • use this to execute a sequence of statements in parallel when each statement block is independent of each other and the order of execution is not important
  • can be used for recursive divide-and-conquer algorithms such as walking a tree