Gal Ratner
Gal Ratner is a techie who lives and works in Los Angeles, CA and Austin, TX.
A quick lap around .NET 4.0’s parallel features.

What is parallelism and why should I care?

As computers today are reaching the practical limits of clock speeds, you are likely to find your program running just as fast, or as slow, on a newer machine as it does on your older box. CPU manufacturers have worked around the clock speed wall by building multi-core CPUs. A CPU with two or four cores is pretty standard in the market. In fact, my own development box is a quad core. So how can you capitalize on a multi-core machine and speed up your program? One thread can only run on one core at a time. If cores are not clocking faster, threads are not running faster. The solution is to spread your work evenly across all cores. The idea behind parallelism is to have the same operation performed concurrently on as many cores as you have available, or as many as you specify. Each core is in charge of a small piece of the work and is kept constantly busy. At the end of the task, the results from all cores are joined and your program is signaled that the operation has finished.

Wow sounds good. How can I use it in my code?

Since this is a practical guide, designed to get you started on your way to parallel perfection, I am going to break down the classes and show you code you can use, rather than go on and on about the theory of forking work between cores.

First let’s start with what’s new in the framework. There are two new namespaces:

System.Threading.Tasks – This namespace contains the Task Parallel Library (TPL). You can think about Tasks as glorified Threads for now. We will get to Tasks later in this article.
System.Collections.Concurrent – Contains thread-safe collection classes for use when multiple threads access a collection concurrently.
There are also enhancements to System.Linq in order to accommodate PLINQ.

New data types

System.Collections.Concurrent contains new thread-safe data types you can leverage when you use tasks. The types I find most useful are:
BlockingCollection<T> 
ConcurrentBag<T>
ConcurrentDictionary<TKey, TValue>
ConcurrentQueue<T>
ConcurrentStack<T>
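
To give a feel for two of the types listed above, here is a minimal sketch that fills a ConcurrentDictionary and a ConcurrentQueue from several threads at once without any explicit locks. The class and method names (ConcurrentCollectionsSketch, CountWords, DrainQueue) are made up for illustration and are not part of the framework.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class ConcurrentCollectionsSketch
{
    // Hypothetical helper: counts how often each word occurs, updating the
    // dictionary from many threads at the same time.
    public static ConcurrentDictionary<string, int> CountWords(string[] words)
    {
        var counts = new ConcurrentDictionary<string, int>();
        Parallel.ForEach(words, word =>
        {
            // Adds the key with a count of 1, or safely bumps the existing
            // count even if other threads are updating the same key.
            counts.AddOrUpdate(word, 1, (key, current) => current + 1);
        });
        return counts;
    }

    // Hypothetical helper: enqueues from many threads, then drains the queue.
    public static int DrainQueue(int itemCount)
    {
        var queue = new ConcurrentQueue<int>();
        Parallel.For(0, itemCount, i => queue.Enqueue(i));

        int drained = 0;
        int item;
        // TryDequeue returns false instead of throwing when the queue is empty.
        while (queue.TryDequeue(out item)) drained++;
        return drained;
    }

    public static void Main()
    {
        var counts = CountWords(new[] { "a", "b", "a", "c", "a" });
        Console.WriteLine(counts["a"]);      // prints 3
        Console.WriteLine(DrainQueue(100));  // prints 100
    }
}
```

Note that the update delegate passed to AddOrUpdate may run more than once under contention, so it should stay free of side effects.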

Let’s look at how we can leverage BlockingCollection in a snippet of concurrent code. BlockingCollection<T> wraps an IProducerConsumerCollection<T> (a ConcurrentQueue<T> by default) and is designed to allow multiple threads to add and remove elements. A call to Take() blocks until there is an element to remove, and throws an InvalidOperationException once the collection has been marked as complete and is empty.

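// Requires: using System; using System.Collections.Concurrent; using System.Threading.Tasks;
BlockingCollection<int> bc = new BlockingCollection<int>();

// Producer: adds one item, then marks the collection as complete
Task t1 = Task.Factory.StartNew(() =>
{
    bc.Add(1);
    bc.CompleteAdding();
});

// Consumer: takes items until the collection is complete and empty
Task t2 = Task.Factory.StartNew(() =>
{
    try
    {
        while (true) bc.Take();
    }
    catch (InvalidOperationException)
    {
        Console.WriteLine("This collection is completed");
    }
});

Task.WaitAll(t1, t2);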
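
If you would rather not use an exception to detect the end of the collection, GetConsumingEnumerable() is an alternative: the foreach loop simply ends once CompleteAdding() has been called and the collection is empty. The following is a sketch only; ProducerConsumerSketch and SumProduced are names invented for this example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class ProducerConsumerSketch
{
    // Hypothetical helper: one producer adds 1..itemCount, one consumer sums them.
    public static int SumProduced(int itemCount)
    {
        int sum = 0;
        using (var bc = new BlockingCollection<int>())
        {
            Task producer = Task.Factory.StartNew(() =>
            {
                try
                {
                    for (int i = 1; i <= itemCount; i++) bc.Add(i);
                }
                finally
                {
                    // Always signal completion, otherwise the consumer blocks forever.
                    bc.CompleteAdding();
                }
            });

            Task consumer = Task.Factory.StartNew(() =>
            {
                // Blocks while the collection is empty and ends when it is
                // both marked complete and drained.
                foreach (int item in bc.GetConsumingEnumerable())
                    sum += item;
            });

            Task.WaitAll(producer, consumer);
        }
        return sum;
    }

    public static void Main()
    {
        Console.WriteLine(SumProduced(10)); // prints 55
    }
}
```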

Task Parallel Library (TPL)

Tasks represent asynchronous operations, much like Threads. Under the hood, the TPL handles threads, locking, scheduling and internal pools to make sure Tasks scale across all cores.
When dealing with Tasks you can use Task.Factory.StartNew and pass in delegates as your methods to execute along with execution options, or you can create a new Task yourself and start it later. By default both run on the thread pool; creating the Task yourself simply lets you set it up, for example by attaching a continuation, before it starts.

Task<int> t1 = new Task<int>(() => 1);
Task<int> t2 = t1.ContinueWith((antecedent) =>
{
    return 1 + antecedent.Result;
});
t1.Start();
Console.WriteLine(t2.Result);
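
For comparison, here is a rough sketch of the Task.Factory.StartNew route: it starts several tasks, waits for all of them and collects the results. It also shows that an exception thrown inside a task comes back wrapped in an AggregateException. TaskSketch, SumOfSquares and DescribeFailure are hypothetical names used only here.

```csharp
using System;
using System.Threading.Tasks;

public static class TaskSketch
{
    // Hypothetical helper: squares each value in its own task and adds up the results.
    public static int SumOfSquares(int[] values)
    {
        var tasks = new Task<int>[values.Length];
        for (int i = 0; i < values.Length; i++)
        {
            int value = values[i]; // local copy so each lambda captures its own value
            tasks[i] = Task.Factory.StartNew(() => value * value);
        }

        Task.WaitAll(tasks);

        int total = 0;
        foreach (Task<int> task in tasks) total += task.Result;
        return total;
    }

    // Hypothetical helper: shows how a failure inside a task surfaces to the caller.
    public static string DescribeFailure()
    {
        Task<int> failing = Task.Factory.StartNew<int>(() =>
        {
            throw new InvalidOperationException("boom");
        });

        try
        {
            return failing.Result.ToString();
        }
        catch (AggregateException ex)
        {
            // The original exception is available as an inner exception.
            return ex.InnerException.Message;
        }
    }

    public static void Main()
    {
        Console.WriteLine(SumOfSquares(new[] { 1, 2, 3 })); // prints 14
        Console.WriteLine(DescribeFailure());               // prints boom
    }
}
```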

 

Parallel loops

System.Threading.Tasks.Parallel contains the implementation for simple parallel loops. For and ForEach are supported as well as Invoke, which executes an array of Actions, possibly in parallel. Here is an example of a parallel For loop:

// ct is a CancellationToken obtained from a CancellationTokenSource
ParallelOptions options = new ParallelOptions();
options.CancellationToken = ct;

Parallel.For(0, Settings.Instance.Images.Count, options, i =>
{
    // Do something
});
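
The other two members of Parallel work along the same lines. The sketch below is an illustration with made-up names (ParallelLoopSketch, SumLengths, RunBoth): it uses Parallel.ForEach over an array and Parallel.Invoke with two Actions. Shared state touched from the loop body has to be updated in a thread-safe way, hence the Interlocked call.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelLoopSketch
{
    // Hypothetical helper: adds up the lengths of all strings in parallel.
    public static long SumLengths(string[] items)
    {
        long total = 0;
        Parallel.ForEach(items, item =>
        {
            // Several iterations may run at once, so a plain "total += ..." would race.
            Interlocked.Add(ref total, item.Length);
        });
        return total;
    }

    // Hypothetical helper: runs two independent actions, possibly in parallel.
    public static int[] RunBoth()
    {
        int[] results = new int[2];
        Parallel.Invoke(
            () => results[0] = 1,   // each action writes to its own slot
            () => results[1] = 2);
        return results;             // Invoke returns only after both actions finish
    }

    public static void Main()
    {
        Console.WriteLine(SumLengths(new[] { "one", "two", "three" })); // prints 11
        int[] both = RunBoth();
        Console.WriteLine(both[0] + both[1]);                            // prints 3
    }
}
```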

 

Parallel LINQ (PLINQ)

System.Linq was enhanced with new classes, one of which is ParallelEnumerable. This class contains the parallel implementation of LINQ and is super easy to use. Most LINQ to Objects queries can run in parallel by simply adding a call to AsParallel() in the query. You can also call WithDegreeOfParallelism(), which sets the maximum number of concurrent tasks the query will use.

var data = from myURL in URLs.AsParallel().WithDegreeOfParallelism(numConcurrentRetrievals)
           select (new WebClient()).DownloadData(myURL);
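
Here is a small self-contained sketch of the same idea using method syntax, so it can run without a network connection. PlinqSketch and SquaresOfEvens are names invented for the example. AsOrdered() is added because PLINQ does not otherwise promise to keep the source order.

```csharp
using System;
using System.Linq;

public static class PlinqSketch
{
    // Hypothetical helper: squares the even numbers using at most maxTasks concurrent tasks.
    public static int[] SquaresOfEvens(int[] numbers, int maxTasks)
    {
        return numbers
            .AsParallel()                       // switch from LINQ to Objects to PLINQ
            .AsOrdered()                        // keep results in source order
            .WithDegreeOfParallelism(maxTasks)  // upper bound, not a guarantee
            .Where(n => n % 2 == 0)
            .Select(n => n * n)
            .ToArray();                         // forces the query to execute
    }

    public static void Main()
    {
        int[] result = SquaresOfEvens(new[] { 1, 2, 3, 4, 5, 6 }, 2);
        Console.WriteLine(string.Join(", ", result)); // prints 4, 16, 36
    }
}
```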

Cancelling a parallel operation

To cancel a task or a parallel loop you can use a CancellationTokenSource, which signals its CancellationToken that cancellation has been requested. The CancellationToken is passed in through ParallelOptions. Usage looks like:

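CancellationTokenSource cSource = new CancellationTokenSource();
CancellationToken ct = cSource.Token;

var t = Task.Factory.StartNew(() =>
{
    ParallelOptions options = new ParallelOptions();
    options.CancellationToken = ct;

    try
    {
        Parallel.For(0, Settings.Instance.Images.Count, options, i =>
        {
            // Do something
        });
    }
    catch (OperationCanceledException)
    {
        // Thrown by Parallel.For itself once cancellation has been requested
    }
}, ct);

// Call cSource.Cancel() from another thread to request cancellation.
// If the token is cancelled before the task starts, Wait() throws an AggregateException.
t.Wait();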

 

You can also use WithCancellation() if you are trying to cancel a PLINQ operation.
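
A minimal sketch of WithCancellation() follows. To keep the outcome predictable the token is cancelled before the query runs; in real code Cancel() would normally be called from another thread while the query is in flight. PlinqCancellationSketch and RunCancelledQuery are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Threading;

public static class PlinqCancellationSketch
{
    // Hypothetical helper: returns true if the query was cancelled.
    public static bool RunCancelledQuery()
    {
        var cts = new CancellationTokenSource();
        cts.Cancel(); // cancelled up front so the example is deterministic

        try
        {
            Enumerable.Range(0, 1000000)
                .AsParallel()
                .WithCancellation(cts.Token)
                .Select(n => n * 2)
                .ToArray(); // executing the query is what observes the token
            return false;
        }
        catch (OperationCanceledException)
        {
            // PLINQ reports cancellation of its own token with this exception type.
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(RunCancelledQuery());
    }
}
```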

When to parallelize?

That’s a good question. Not all threads need to become Tasks and not all LINQ queries need to be AsParallel. Sometimes the overhead of creating the underlying infrastructure can actually slow your code down. Try to parallelize (is that a word?) operations that you already multithread, and measure the execution time gained or lost. Parallelize when you clearly see one busy core along with idle cores. The most important thing is to test and measure using real-life data and decide for yourself if certain segments of code can benefit from running using TPL or PLINQ.
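
One simple way to do that measuring is with a Stopwatch around both versions of the same work. The sketch below counts primes sequentially and with PLINQ; the workload and the names (MeasureSketch, CountPrimesSequential, CountPrimesParallel) are made up for illustration, and the timings will differ from machine to machine, so no numbers are claimed here.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class MeasureSketch
{
    // Deliberately CPU-bound so there is something worth parallelizing.
    static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int d = 2; d * d <= n; d++)
            if (n % d == 0) return false;
        return true;
    }

    // Counts the primes in 2..limit on a single thread.
    public static int CountPrimesSequential(int limit)
    {
        return Enumerable.Range(2, limit - 1).Count(IsPrime);
    }

    // Same count, with PLINQ spreading the checks across cores.
    public static int CountPrimesParallel(int limit)
    {
        return Enumerable.Range(2, limit - 1).AsParallel().Count(IsPrime);
    }

    public static void Main()
    {
        const int limit = 2000000;

        Stopwatch sw = Stopwatch.StartNew();
        int sequential = CountPrimesSequential(limit);
        sw.Stop();
        Console.WriteLine("Sequential: {0} primes in {1} ms", sequential, sw.ElapsedMilliseconds);

        sw = Stopwatch.StartNew();
        int parallel = CountPrimesParallel(limit);
        sw.Stop();
        Console.WriteLine("Parallel:   {0} primes in {1} ms", parallel, sw.ElapsedMilliseconds);
    }
}
```

Run it a few times and with different limits: for small inputs the parallel version can easily come out slower.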

To get you started, here is a little function I wrote in order to copy files between folders. The original function was large and clumsy and used the thread pool in order to multithread the file copy.
The new function simply parallelizes the copy operation.
Old:

public void SpeedCopyFolder(DirectoryInfo source, DirectoryInfo target, bool overrideExisting)
{
    using (ManualResetEvent mre = new ManualResetEvent(false))
    {
        // Starts at 1 so the event cannot fire while work items are still being queued
        int threadCount = 1;
        foreach (FileInfo fi in source.GetFiles())
        {
            Interlocked.Increment(ref threadCount);
            // Queue a thread pool work item for each file
            FileInfo file = new FileInfo(fi.FullName); // Created for the delegate scope
            ThreadPool.QueueUserWorkItem(delegate
            {
                try
                {
                    string destination = Path.Combine(target.FullName, file.Name);
                    if (File.Exists(destination) && !overrideExisting)
                        return;
                    file.CopyTo(destination, overrideExisting);
                }
                catch (Exception)
                {
                    // Errors are swallowed here; log them in real code
                }
                finally
                {
                    // Runs even on the early return above, so the count always reaches zero
                    if (Interlocked.Decrement(ref threadCount) == 0) mre.Set();
                }
            });
        }

        // Release the initial count, then wait for the outstanding copies
        if (Interlocked.Decrement(ref threadCount) == 0) mre.Set();
        mre.WaitOne();
    }
}

 

New:

public static void CopyFiles(string fromFolder, string toFolder)
{
    Parallel.ForEach<string>(Directory.EnumerateFiles(fromFolder, "*"), f =>
    {
        // Always overwrites existing files in the target folder
        File.Copy(f, Path.Combine(toFolder, Path.GetFileName(f)), true);
    });
}

 

That was a quick lap around the major components in the new parallel features of .NET 4.0. It is by no means a deep article and I am sure you will discover much more as you explore and start programming in Visual Studio 2010.


Enjoy and remember to always try and speed up your code!


 


Posted 24 Apr 2010 6:59 AM by Gal Ratner