In this article, Eugene Agafonov and Andrew Koryavchenko, the authors of the book Mastering C# Concurrency, discuss the Task Parallel Library in detail and explain the C# language infrastructure that supports asynchronous calls.

The Task Parallel Library makes it possible to combine asynchronous tasks and set dependencies between them. To get a clear understanding, in this article, we will use this approach to solve a real problem—downloading images from Bing (the search engine). Also, we will do the following:

Implement standard synchronous approach
Use Task Parallel Library to create an asynchronous version of the program
Use C# 5.0 built-in asynchrony support to make the code easier to read and maintain
Simulate C# asynchronous infrastructure with the help of iterators
Learn about other useful features of Task Parallel Library
Make any C# type compatible with built-in asynchronous keywords


Implementing the downloading of images from Bing

Every day, Bing.com publishes a background image that can be used as desktop wallpaper. Information about these pictures is available through an XML API at http://www.bing.com/hpimagearchive.aspx.

Creating a simple synchronous solution

Let's try to write a program to download the last eight images from this site. We will start by defining objects to store image information. This is where a thumbnail image and its description will be stored:

using System.Drawing;
public class WallpaperInfo{
  private readonly Image _thumbnail;
  private readonly string _description;
  public WallpaperInfo(Image thumbnail, string description) {
    _thumbnail = thumbnail;
    _description = description;
  }

  public Image Thumbnail {
    get { return _thumbnail; }
  }

  public string Description {
    get { return _description; }
  }
}

The next container type is for all the downloaded pictures and the time required to download and make the thumbnail images from the original pictures:

public class WallpapersInfo {
  private readonly long _milliseconds;
  private readonly WallpaperInfo[] _wallpapers;

  public WallpapersInfo(long milliseconds, WallpaperInfo[] 
    wallpapers) {
    _milliseconds = milliseconds;
    _wallpapers = wallpapers;
  }

  public long Milliseconds {
    get { return _milliseconds; }
  }

  public WallpaperInfo[] Wallpapers {
    get { return _wallpapers; }
  }
}

Now we need to create a loader class to download images from Bing. We need to define a Loader static class and follow with an implementation. Let's create a method that will make a thumbnail image from the source image stream:

private static Image GetThumbnail(Stream imageStream) {
  using (imageStream) {
    var fullBitmap = Image.FromStream(imageStream);
    return new Bitmap(fullBitmap, 192, 108);
  }
}

To communicate via the HTTP protocol, it is recommended to use the System.Net.Http.HttpClient type from the System.Net.Http.dll assembly. Let's create the following extension methods, one synchronous and one asynchronous, that use the POST HTTP method to download an image and return an open stream:

private static Stream DownloadData(this HttpClient client, 
  string uri) {
  var response = client.PostAsync(
    uri, new StringContent(string.Empty)).Result;
  return response.Content.ReadAsStreamAsync().Result;
}

private static Task<Stream> DownloadDataAsync(this HttpClient 
  client, string uri) {
  Task<HttpResponseMessage> responseTask = client.PostAsync(
    uri, new StringContent(string.Empty));
  return responseTask.ContinueWith(task => 
    task.Result.Content.ReadAsStreamAsync()).Unwrap();
}

To create the easiest implementation possible, we will implement downloading without any asynchrony. Here, we will define HTTP endpoints for the Bing API:

private const string _catalogUri = 
  "http://www.bing.com/hpimagearchive.aspx" +
  "?format=xml&idx=0&n=8&mbl=1&mkt=en-ww";
private const string _imageUri = 
  "http://bing.com{0}_1920x1080.jpg";

Then, we will start measuring the time required to finish downloading and download an XML catalog that has information about the images that we need:

var sw = Stopwatch.StartNew();
var client = new HttpClient();
// HttpClient exposes only asynchronous methods, so we block on the result here
var catalogXmlString = client.GetStringAsync(_catalogUri).Result;

Next, the XML string will be parsed to an XML document:

var xDoc = XDocument.Parse(catalogXmlString);

Now using LINQ to XML, we will query the information needed from the document and run the download process for each image:

var wallpapers = xDoc
  .Root
  .Elements("image")
  .Select(e =>
    new {
      Desc = e.Element("copyright").Value,
      Url = e.Element("urlBase").Value
    })
  .Select(item =>
    new {
      item.Desc,
      FullImageData = client.DownloadData(
        string.Format(_imageUri, item.Url))
    })

  .Select( item =>
    new WallpaperInfo(
      GetThumbnail(item.FullImageData),
      item.Desc))
  .ToArray();

sw.Stop();

The first Select method call extracts the image URL and description from each image XML element that is a direct child of the root element. This information is contained in the urlBase and copyright XML elements inside the image element. The second Select call downloads an image from the Bing site. The last Select call creates a thumbnail image and stores all the information needed inside a WallpaperInfo class instance.
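To try the query stages without touching the network, here is a minimal sketch that runs the same kind of LINQ to XML query against an inline string. The sample document, including its root element name and all values, is an assumption made for illustration; only the image, urlBase, and copyright element names come from the text above. CatalogDemo and ParseCatalog are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class CatalogDemo {
  // Hypothetical sample data; only the element names follow the article.
  public const string SampleXml =
    "<images>" +
    "<image><urlBase>/images/first</urlBase>" +
    "<copyright>First picture</copyright></image>" +
    "<image><urlBase>/images/second</urlBase>" +
    "<copyright>Second picture</copyright></image>" +
    "</images>";

  // Returns one "description|url" string per image element under the root.
  public static string[] ParseCatalog(string xml) {
    return XDocument.Parse(xml)
      .Root
      .Elements("image")
      .Select(e => new {
        Desc = e.Element("copyright").Value,
        Url = e.Element("urlBase").Value
      })
      .Select(item => item.Desc + "|" + item.Url)
      .ToArray();
  }

  public static void Main() {
    foreach (var line in ParseCatalog(SampleXml))
      Console.WriteLine(line);
  }
}
```

Because Select is lazy, nothing is parsed out of the elements until ToArray enumerates the query, which is also why the synchronous loader downloads the images only when ToArray is called.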

To display the results, we need to create a user interface. Windows Forms is a simple and fast to implement technology, so we use it to show the results to the user. There is a button that runs the download, a panel to show the downloaded pictures, and a label that will show the time required to finish downloading.

Here is the implementation code. It includes a calculation of the top coordinate for each element, the code that displays the images, and the handler that starts the download process:

private int GetItemTop(int height, int index) {
  return index * (height + 8) + 8;
}

private void RefreshContent(WallpapersInfo info) {
  _resultPanel.Controls.Clear();
  _resultPanel.Controls.AddRange(
    info.Wallpapers.SelectMany((wallpaper, i) => new Control[] {
    new PictureBox {
      Left = 4,
      Image = wallpaper.Thumbnail,
      AutoSize = true,
      Top = GetItemTop(wallpaper.Thumbnail.Height, i)
    },
    new Label {
      Left = wallpaper.Thumbnail.Width + 8,
      Top = GetItemTop(wallpaper.Thumbnail.Height, i),
      Text = wallpaper.Description,
      AutoSize = true
    }
  }).ToArray());
 
  _timeLabel.Text = string.Format(
"Time: {0}ms", info.Milliseconds);
}

private void _loadSyncBtn_Click(object sender, System.EventArgs e) {
  var info = Loader.SyncLoad();
  RefreshContent(info);
}

The result looks as follows:

[Screenshot: the application window with the downloaded thumbnails, their descriptions, and the download time]

So downloading all these images should take several seconds over a broadband internet connection. Can we do this faster? We certainly can! Right now we download and process the images one by one, but we can process all of them in parallel.

Creating a parallel solution with Task Parallel Library

With the Task Parallel Library, the code expresses the relationships between tasks and naturally splits into several stages as follows:

Load images catalog XML from Bing
Parse the XML document and get the information needed about the images
Load each image's data from Bing
Create a thumbnail image for each image downloaded

The process can be visualized with the dependency chart:

[Figure: dependency chart of the stages listed above]

HttpClient has a naturally asynchronous API, so we only need to combine everything with the help of the Task.ContinueWith method:

public static Task<WallpapersInfo> TaskLoad() {
  var sw = Stopwatch.StartNew();
  var downloadBingXmlTask = new HttpClient().GetStringAsync(_catalogUri);
  var parseXmlTask = downloadBingXmlTask.ContinueWith(task => {
    var xmlDocument = XDocument.Parse(task.Result);
    return xmlDocument.Root
      .Elements("image")
      .Select(e =>
        new {
          Description = e.Element("copyright").Value,
          Url = e.Element("urlBase").Value
        });
  });

  var downloadImagesTask = parseXmlTask.ContinueWith(
    task => Task.WhenAll(
      task.Result.Select(item => new HttpClient()
        .DownloadDataAsync(string.Format(_imageUri, item.Url))
        .ContinueWith(downloadTask => new WallpaperInfo(
          GetThumbnail(downloadTask.Result), item.Description)))))
        .Unwrap();

  return downloadImagesTask.ContinueWith(task => {
    sw.Stop();

    return new WallpapersInfo(sw.ElapsedMilliseconds, 
      task.Result);
  });
}

A few points in this code deserve attention. The first task is created by the HttpClient instance, and it completes when the download finishes. We then attach a continuation task, which takes the XML string downloaded by the previous task, creates an XML document from this string, and extracts the information needed.

Now this is becoming more complicated. We want to create a task to download each image and continue when all these tasks have completed successfully. So we use the LINQ Select method to start a download for each image defined in the XML catalog, and after each download completes, we create a thumbnail image and store the information in a WallpaperInfo instance. This produces an IEnumerable<Task<WallpaperInfo>>, and to wait for all these tasks to complete, we use the Task.WhenAll method. However, this is a task created inside a continuation task, so the result is of the Task<Task<WallpaperInfo[]>> type. To get the inner task, we use the Unwrap method, which has the following syntax:

public static Task Unwrap(this Task<Task> task)

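This can be used on any Task<Task> instance and creates a proxy task that properly represents the entire asynchronous operation. A generic overload that accepts a Task<Task<TResult>> and returns a Task<TResult> also exists, and it is the one our code calls.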
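The nesting is easier to see in isolation. The following minimal sketch assumes a made-up LengthAsync helper in place of a real download; UnwrapDemo and ChainedLength are hypothetical names as well.

```csharp
using System;
using System.Threading.Tasks;

public static class UnwrapDemo {
  // Stands in for an asynchronous download; the delay is arbitrary.
  private static Task<int> LengthAsync(string text) {
    return Task.Delay(10).ContinueWith(_ => text.Length);
  }

  public static Task<int> ChainedLength() {
    Task<string> first = Task.FromResult("hello");
    // The continuation returns a task itself, so the result is nested.
    Task<Task<int>> nested =
      first.ContinueWith(t => LengthAsync(t.Result));
    // Unwrap creates a proxy that completes when the inner task completes.
    return nested.Unwrap();
  }

  public static void Main() {
    Console.WriteLine(ChainedLength().Result); // prints 5
  }
}
```

Without Unwrap, waiting on the outer task would only tell us that the inner task had been started, not that it had finished.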

The last task is to stop the timer and return the downloaded images and is quite straightforward. We have to add another button to the UI to run this implementation. Notice the implementation of the button click handler:

private void _loadTaskBtn_Click(object sender, System.EventArgs e) {
  var info = Loader.TaskLoad();
  info.ContinueWith(task => RefreshContent(task.Result),
    CancellationToken.None,
    TaskContinuationOptions.None,
    TaskScheduler.FromCurrentSynchronizationContext());
}

Since the TaskLoad method is asynchronous, it returns immediately. To display the results, we have to define a continuation task. The default task scheduler runs the task code on a thread pool worker thread. To work with UI controls, we have to run the code on the UI thread, so we use a task scheduler that captures the current synchronization context and runs the continuation task on it.

Let's name the button as Load using TPL and test the results. If your internet connection is fast, this implementation will download the images in parallel much faster compared to the previous sequential download process.

If we look back at the code, we will see that it is quite hard to understand what it actually does. We can see how one task depends on another, but the original goal is unclear even though the code is very compact. Imagine what would happen if we tried to add exception handling here. We would have to append an additional continuation task with exception handling to each task, which would be much harder to read and understand. In a real-world program, it is a challenging task to keep such a composition of tasks in mind and to maintain code written in this paradigm.
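To give an idea of what such handling looks like for just one task, here is a minimal sketch. LoadWithFallback and the simulated failure are assumptions for illustration and are not part of the Loader class.

```csharp
using System;
using System.Threading.Tasks;

public static class ContinuationErrorDemo {
  public static Task<string> LoadWithFallback(bool fail) {
    // Simulated download that may throw.
    Task<string> load = Task.Run(() => {
      if (fail) throw new InvalidOperationException("download failed");
      return "payload";
    });

    // The continuation has to inspect the antecedent task explicitly.
    return load.ContinueWith(t => {
      if (t.IsFaulted) {
        // t.Exception is an AggregateException wrapping the original one.
        return "error: " + t.Exception.InnerException.Message;
      }
      return t.Result;
    });
  }

  public static void Main() {
    Console.WriteLine(LoadWithFallback(false).Result);
    Console.WriteLine(LoadWithFallback(true).Result);
  }
}
```

Repeating this check in every continuation of a longer chain is exactly the kind of noise that obscures the original intent.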

Enhancing the code with C# 5.0 built-in support for asynchrony

Fortunately, C# 5.0 introduced the async and await keywords, which are intended to make asynchronous code look like synchronous code and thus make it easier to read the code and follow the program flow. However, this is another abstraction, and it hides from the programmer many things that happen under the hood, which in some situations is not a good thing. Now let's rewrite the previous code using the new C# 5.0 features:

public static async Task<WallpapersInfo> AsyncLoad() {
  var sw = Stopwatch.StartNew();
  var client = new HttpClient();
  var catalogXmlString = await client.GetStringAsync(_catalogUri);
  var xDoc = XDocument.Parse(catalogXmlString);
  var wallpapersTask = xDoc
    .Root
    .Elements("image")
    .Select(e =>
      new {
        Description = e.Element("copyright").Value,
        Url = e.Element("urlBase").Value
      })
    .Select(async item =>
      new {
        item.Description,
        FullImageData = await client.DownloadDataAsync(
          string.Format(_imageUri, item.Url))
      });

  var wallpapersItems = await Task.WhenAll(wallpapersTask);
  var wallpapers = wallpapersItems.Select(
    item => new WallpaperInfo(
      GetThumbnail(item.FullImageData), item.Description));
  sw.Stop();

  return new WallpapersInfo(sw.ElapsedMilliseconds, 
    wallpapers.ToArray());
}

Now the code looks almost like the first synchronous implementation. The AsyncLoad method has an async modifier and a Task<T> return type. In C# 5.0, such methods must return Task or Task<T>, or be declared as void; this is enforced by the compiler. However, inside the method's code, the value that is returned is just T. This looks strange at first, but the C# compiler wraps the method's return value into a Task<T>. The async modifier is necessary to use await inside the method. Further down in the code, there is an await inside a lambda expression, so we need to mark this lambda as async as well.

So what is going on when we use await inside our code? It does not always mean that the call is actually asynchronous. It can happen that by the time we call the method, the result is already available, so we just get the result and proceed further. However, the most common case is when we make a truly asynchronous call. In this case, we start the operation, for example, downloading an XML string from Bing via HTTP, and immediately return a task whose continuation contains the rest of the code after the line with await.
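Both cases can be observed with a small experiment. This is only a sketch: GetValueAsync is a hypothetical method, and Task.FromResult and Task.Delay stand in for a cached result and a real network call.

```csharp
using System;
using System.Threading.Tasks;

public static class CompletedAwaitDemo {
  public static async Task<int> GetValueAsync(bool cached) {
    if (cached) {
      // The awaited task is already completed, so execution
      // simply continues without returning to the caller.
      return await Task.FromResult(42);
    }
    // The awaited task is not completed yet, so the method
    // returns an unfinished task to the caller at this point.
    await Task.Delay(200);
    return 42;
  }

  public static void Main() {
    Task<int> fast = GetValueAsync(true);
    Console.WriteLine("cached, completed on return: " + fast.IsCompleted);

    Task<int> slow = GetValueAsync(false);
    Console.WriteLine("delayed, completed on return: " + slow.IsCompleted);
    Console.WriteLine("delayed, final result: " + slow.Result);
  }
}
```

In the cached case the returned task is already complete when the caller receives it, while in the delayed case the caller gets the task back before the value is available.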

To run this, we need to add another button named Load using async. We are going to use await in the button click event handler as well, so we need to mark it with the async modifier:

private async void _loadAsyncBtn_Click(object sender, System.EventArgs e) {
  var info = await Loader.AsyncLoad();
  RefreshContent(info);
}

Now if the code after await runs in a continuation, why is there no cross-thread access exception? The RefreshContent method does run as part of a continuation, but awaiting a task captures the current synchronization context by default, and the generated code executes the continuation on the UI thread. The result should be as fast as the TPL implementation, but the code is much cleaner and easier to follow.

Last but not least, it is possible to put asynchronous method calls inside a try block. The C# compiler generates code that propagates the exception into the current context and unwraps the AggregateException instance to get the original exception from it.
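The difference between awaiting a failed task and blocking on it can be sketched as follows. FailAsync and both catching methods are hypothetical names used only for this illustration.

```csharp
using System;
using System.Threading.Tasks;

public static class AwaitExceptionDemo {
  private static async Task<string> FailAsync() {
    await Task.Delay(10);
    throw new InvalidOperationException("boom");
  }

  // With await, the original exception type can be caught directly.
  public static async Task<string> CatchWithAwait() {
    try {
      return await FailAsync();
    } catch (InvalidOperationException ex) {
      return "await: " + ex.GetType().Name;
    }
  }

  // Blocking on Result surfaces an AggregateException instead.
  public static string CatchWithResult() {
    try {
      return FailAsync().Result;
    } catch (AggregateException ex) {
      return "Result: " + ex.GetType().Name +
        " wrapping " + ex.InnerException.GetType().Name;
    }
  }

  public static void Main() {
    Console.WriteLine(CatchWithAwait().Result);
    Console.WriteLine(CatchWithResult());
  }
}
```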

In C# 5.0, it was impossible to use await inside catch and finally blocks, but C# 6.0 introduced a new async/await infrastructure and this limitation was removed.

Simulating C# asynchronous infrastructure with iterators

To dig into the implementation details, it makes sense to look at the decompiled code of the AsyncLoad method:

public static Task<WallpapersInfo> AsyncLoad() {
  Loader.<AsyncLoad>d__21 stateMachine;
  stateMachine.<>t__builder = 
    AsyncTaskMethodBuilder<WallpapersInfo>.Create();
  stateMachine.<>1__state = -1;
  stateMachine
    .<>t__builder
    .Start<Loader.<AsyncLoad>d__21>(ref stateMachine);
  return stateMachine.<>t__builder.Task;
}

The method body was replaced by a compiler-generated code that creates a special kind of state machine. We will not review the further implementation details here, because it is quite complicated and is subject to change from version to version. However, what's going on is that the code gets divided into separate pieces at each line where await is present, and each piece becomes a separate state in the generated state machine. Then, a special System.Runtime.CompilerServices.AsyncTaskMethodBuilder structure creates Task that represents the generated state machine workflow.

This state machine is quite similar to the one that is generated for the iterator methods that leverage the yield keyword. In C# 6.0, the same universal code gets generated for the code containing yield and await. To illustrate the general principles behind the generated code, we can use iterator methods to implement another version of asynchronous images download from Bing.

Therefore, we can turn an asynchronous method into an iterator method that returns an IEnumerable<Task> instance. We replace each await with yield return, so that each iteration step returns a Task. To run such a method, we need to wait for each task in turn and return the final result. This code can be considered an analogue of AsyncTaskMethodBuilder:

private static Task<TResult> ExecuteIterator<TResult>(
  Func<Action<TResult>,IEnumerable<Task>> iteratorGetter) {
  return Task.Run(() => {
    var result = default(TResult);
    foreach (var task in iteratorGetter(res => result = res))
      task.Wait();
    return result;
  });
}

We iterate through the tasks and wait for each one to complete. Since we cannot use out and ref parameters in iterator methods, we use a lambda expression to pass the final result out of the iterator. To make the code easier to understand, we have created a new container task and used a blocking foreach loop; however, to be closer to the original implementation, we should take the first task, use the ContinueWith method to attach the next task to it, and continue until the last task. In this case, we end up with one final task representing the entire sequence of asynchronous operations, but the code becomes more complicated as well.
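One possible shape of that ContinueWith-based variant is sketched below. It is not the code the compiler generates and not part of the original sample: ExecuteIteratorAsync, AddIterator, and the use of TaskCompletionSource are assumptions chosen to keep the idea short.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class IteratorRunner {
  // Hypothetical non-blocking counterpart of ExecuteIterator.
  public static Task<TResult> ExecuteIteratorAsync<TResult>(
    Func<Action<TResult>, IEnumerable<Task>> iteratorGetter) {
    var completion = new TaskCompletionSource<TResult>();
    var result = default(TResult);
    IEnumerator<Task> steps =
      iteratorGetter(res => result = res).GetEnumerator();

    Action moveNext = null;
    moveNext = () => {
      try {
        if (!steps.MoveNext()) {
          // The iterator has finished: publish the final result.
          steps.Dispose();
          completion.TrySetResult(result);
          return;
        }
        // Resume the iterator when the yielded task completes.
        steps.Current.ContinueWith(_ => moveNext());
      } catch (Exception ex) {
        steps.Dispose();
        completion.TrySetException(ex);
      }
    };

    moveNext();
    return completion.Task;
  }

  // Sample iterator: two simulated asynchronous steps, then a result.
  private static IEnumerable<Task> AddIterator(
    int a, int b, Action<int> resultSetter) {
    var first = Task.Delay(10).ContinueWith(_ => a);
    yield return first;
    var second = Task.Delay(10).ContinueWith(_ => b);
    yield return second;
    resultSetter(first.Result + second.Result);
  }

  public static int RunDemo() {
    return ExecuteIteratorAsync<int>(
      setter => AddIterator(2, 3, setter)).Result;
  }

  public static void Main() {
    Console.WriteLine(RunDemo()); // prints 5
  }
}
```

No thread is blocked between the steps here; each completed task simply schedules the next call to MoveNext, which is the same general idea as resuming a state machine after an await.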

Since it is not possible to use the yield keyword inside lambda expressions in the current C# versions, we will implement image download and thumbnail generation as a separate method:

private static IEnumerable<Task> GetImageIterator(
  string url,
  string desc,
  Action<WallpaperInfo> resultSetter) {
  var loadTask = new HttpClient().DownloadDataAsync(
    string.Format(_imageUri, url));
  yield return loadTask;
  var thumbTask = Task.FromResult(GetThumbnail(loadTask.Result));
  yield return thumbTask;
  resultSetter(new WallpaperInfo(thumbTask.Result, desc));
}

It looks like ordinary C# async code with yield return used instead of the await keyword and resultSetter used instead of return. Notice the Task.FromResult method that we used to get a Task from the synchronous GetThumbnail method. We could use Task.Run and put this operation on a separate worker thread, but that would be an inefficient solution; Task.FromResult gives us a Task that is already completed and has a result. If you use await with such a task, execution continues synchronously.

The main code can be rewritten in the same way:

private static IEnumerable<Task> GetWallpapersIterator(
  Action<WallpaperInfo[]> resultSetter) {
  var catalogTask = new HttpClient().GetStringAsync(_catalogUri);
  yield return catalogTask;
  var xDoc = XDocument.Parse(catalogTask.Result);
  var imagesTask = Task.WhenAll(xDoc
    .Root
    .Elements("image")
    .Select(e => new {
      Description = e.Element("copyright").Value, 
        Url = e.Element("urlBase").Value
    })
    .Select(item => ExecuteIterator<WallpaperInfo>(
      resSetter => GetImageIterator(
        item.Url, item.Description, resSetter))));
  yield return imagesTask;
  resultSetter(imagesTask.Result);
}

This combines everything together:

public static WallpapersInfo IteratorLoad() {
  var sw = Stopwatch.StartNew();
  var wallpapers = ExecuteIterator<WallpaperInfo[]>(
    GetWallpapersIterator)
      .Result;
  sw.Stop();
  return new WallpapersInfo(sw.ElapsedMilliseconds, wallpapers);
}

To run this, we will create one more button called Load using iterator. The button click handler just runs the IteratorLoad method and then refreshes the UI. This also works with about the same speed as other asynchronous implementations.

This example can help us to understand the logic behind the C# code generation for asynchronous methods used with await. Of course, the real code is much more complicated, but the principles behind it remain the same.

Is the async keyword really needed?

A common question is why we need to mark methods as async at all. We have already mentioned iterator methods in C# and the yield keyword. This is very similar to async/await, and yet we do not need to mark iterator methods with any modifier. The C# compiler is able to determine that a method is an iterator method when it meets the yield return or yield break statements inside it. So the question is, why is it not the same with await and asynchronous methods?

The reason is that asynchrony support was introduced only in C# 5.0, and it is very important not to break any legacy code while changing the language. Imagine that some existing code uses await as a name for a field or variable. If the C# designers had made await an unconditional keyword, this old code would break and stop compiling. The current approach guarantees that if we do not mark a method with async, the old code continues to work.
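A short sketch makes the compatibility argument concrete. Both methods below are hypothetical and exist only to show the two readings of the word await.

```csharp
using System;
using System.Threading.Tasks;

public static class AwaitIdentifierDemo {
  // No async modifier: here "await" is an ordinary identifier,
  // so code written before C# 5.0 keeps compiling.
  public static int LegacyCode() {
    int await = 20;
    return await + 1;
  }

  // With the async modifier, "await" is treated as a keyword.
  public static async Task<int> ModernCode() {
    int value = await Task.FromResult(20);
    return value + 1;
  }

  public static void Main() {
    Console.WriteLine(LegacyCode());        // prints 21
    Console.WriteLine(ModernCode().Result); // prints 21
  }
}
```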

Fire-and-forget tasks

Besides Task and Task<T>, we can declare an asynchronous method as void. It is useful in the case of top-level event handlers, for example, the button click or text changed handlers in the UI. An event handler that returns a value is possible, but is very inconvenient to use and does not make much sense.

So allowing async void methods makes it possible to use await inside such event handlers:

private async void button1_Click(object sender, EventArgs e) {
  await SomeAsyncStuff();
}

It seems that nothing bad is happening, and the C# compiler generates almost the same code as for a Task-returning method, but there is an important catch related to exception handling.

When an asynchronous method returns Task, exceptions are attached to this task and can be handled both by TPL and by a try/catch block when await is used. However, if we have an async void method, there is no Task to attach the exceptions to, and those exceptions just get posted to the current synchronization context. These exceptions can be observed using AppDomain.UnhandledException or similar events in a GUI application, but this is very easy to miss and is not a good practice.

The other problem is that we cannot use a void returning asynchronous method with await, since there is no return value that can be used to await on it. We cannot compose such a method with other asynchronous tasks and participate in the program workflow. It is basically a fire-and-forget operation that we start, and then we have no way to control how it will proceed (if we did not write the code for this explicitly).

Another problem is a void-returning async lambda expression. It is very hard to notice that a lambda returns void, and all the problems of async void methods apply to such lambda expressions as well. Imagine that we want to run some operation in parallel. To achieve this, we can use the Parallel.ForEach method. To download some news in parallel, we can write code like this:

Parallel.ForEach(Enumerable.Range(1,10), async i => {
  var news = await newsClient.GetTopNews(i);
  newsCollection.Add(news);
});

However, this will not work, because the second parameter of the ForEach method is Action<T>, which is a void returning delegate. Thus, we will spawn 10 download processes, but since we cannot wait for completion, we abandon all asynchronous operations that we just started and ignore the results.
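One common way around this, sketched here under assumptions, is to let the lambda return its task and combine the tasks with Task.WhenAll. GetTopNewsAsync is a made-up stand-in for the newsClient call and only simulates latency.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class NewsDemo {
  // Hypothetical replacement for newsClient.GetTopNews.
  private static async Task<string> GetTopNewsAsync(int id) {
    await Task.Delay(20);
    return "news " + id;
  }

  public static async Task<string[]> LoadAllAsync() {
    // The lambda returns Task<string>, so no task is thrown away.
    var downloads = Enumerable.Range(1, 10)
      .Select(i => GetTopNewsAsync(i));
    // WhenAll completes when every download has finished and
    // returns the results in the order of the input tasks.
    return await Task.WhenAll(downloads);
  }

  public static void Main() {
    foreach (var item in LoadAllAsync().Result)
      Console.WriteLine(item);
  }
}
```

Since the results come back as an array, there is also no need for a shared collection that several downloads write to concurrently.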

A general rule of thumb is to avoid using async void methods. If this is inevitable and there is an event handler, then always wrap the inner await method calls in try/catch blocks and provide exception handling.
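A minimal sketch of this rule is shown below. The handler is invoked by hand instead of by a real button, and FailingOperationAsync and the Outcome field are hypothetical helpers that only make the result observable.

```csharp
using System;
using System.Threading.Tasks;

public static class HandlerDemo {
  // Lets the caller observe what the handler did.
  public static readonly TaskCompletionSource<string> Outcome =
    new TaskCompletionSource<string>();

  private static async Task FailingOperationAsync() {
    await Task.Delay(10);
    throw new InvalidOperationException("boom");
  }

  // An async void event handler: nobody can await it, so all
  // exceptions have to be handled inside the method itself.
  public static async void OnClick(object sender, EventArgs e) {
    try {
      await FailingOperationAsync();
      Outcome.TrySetResult("completed");
    } catch (InvalidOperationException ex) {
      Outcome.TrySetResult("handled: " + ex.Message);
    }
  }

  public static void Main() {
    OnClick(null, EventArgs.Empty);
    Console.WriteLine(Outcome.Task.Result); // prints "handled: boom"
  }
}
```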

Other useful TPL features

The Task Parallel Library has a large codebase, and some of its useful features, such as Unwrap or Task.FromResult, are not very well known to developers. There are two more extremely useful methods that we have not mentioned yet. They are covered in the following sections.

Task.Delay

Often, it is required to wait for a certain amount of time in the code. One of the traditional ways to wait is using the Thread.Sleep method. The problem is that Thread.Sleep blocks the current thread, and it is not asynchronous.

Another disadvantage is that we cannot cancel waiting if something has happened. To implement a solution for this, we will have to use system synchronization primitives such as an event, but this is not very easy to code. To keep the code simple, we can use the Task.Delay method:

// Do something
await Task.Delay(1000);
// Do something

This wait can be canceled with the help of the CancellationToken infrastructure, and it uses a system timer under the hood, so this kind of waiting is truly asynchronous.
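Cancellation of such a wait could look like the following sketch. WaitOrCancelAsync and its parameters are hypothetical; the timings are arbitrary values chosen for the demonstration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class DelayDemo {
  // Returns true if the delay elapsed, false if it was canceled first.
  public static async Task<bool> WaitOrCancelAsync(
    int delayMs, int cancelAfterMs) {
    using (var cts = new CancellationTokenSource()) {
      cts.CancelAfter(cancelAfterMs);
      try {
        await Task.Delay(delayMs, cts.Token);
        return true;
      } catch (OperationCanceledException) {
        // Task.Delay reports cancellation through this exception.
        return false;
      }
    }
  }

  public static void Main() {
    Console.WriteLine(WaitOrCancelAsync(10, 5000).Result);  // True
    Console.WriteLine(WaitOrCancelAsync(5000, 50).Result);  // False
  }
}
```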

Task.Yield

Sometimes we need a part of the code to be guaranteed to run asynchronously. For example, we need to keep the UI responsive, or maybe we would like to implement a fine-grained scenario. As we already know, using await does not mean that the call will be asynchronous. If we want to return control immediately and run the rest of the code as a continuation task, we can use the Task.Yield method:

// Do something
await Task.Yield();
// Do something

Task.Yield just causes a continuation to be posted on the current synchronization context, or if the synchronization context is not available, a continuation will be posted on a thread pool worker thread.
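As an illustration, a long-running loop can give control back at regular intervals. This is only a sketch: SumWithYieldAsync is a hypothetical method, and the interval of 1,000 iterations is an arbitrary choice.

```csharp
using System;
using System.Threading.Tasks;

public static class YieldDemo {
  public static async Task<long> SumWithYieldAsync(int count) {
    long sum = 0;
    for (int i = 1; i <= count; i++) {
      sum += i;
      if (i % 1000 == 0) {
        // Give other queued work a chance to run, then resume
        // with the rest of the loop as a continuation.
        await Task.Yield();
      }
    }
    return sum;
  }

  public static void Main() {
    Console.WriteLine(SumWithYieldAsync(10000).Result); // prints 50005000
  }
}
```

In a UI application, the same pattern lets pending messages be processed between chunks of work, although heavy computations are usually better moved off the UI thread entirely.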

Implementing a custom awaitable type

Until now we have only used Task with the await operator. However, it is not the only type that is compatible with await. Actually, the await operator can be used with every type that contains the GetAwaiter method with no parameters and the return type that does the following:

Implements the INotifyCompletion interface
Contains the IsCompleted boolean property
Has the GetResult method with no parameters

This method can even be an extension method, so it is possible to extend the existing types and add the await compatibility to them. In this example, we will create such a method for the Uri type. This method will download content as a string via HTTP from the address provided in the Uri instance:

private static TaskAwaiter<string> GetAwaiter(this Uri url) {
  return new HttpClient().GetStringAsync(url).GetAwaiter();
}

// Inside an async method:
var content = await new Uri("http://google.com");
Console.WriteLine(content.Substring(0, 10));

If we run this, we will see the first 10 characters of the Google website content.

As you may notice, here we used the Task type indirectly, returning the already provided awaiter method for the Task type. We can implement an awaiter method manually from scratch, but it really does not make any sense. To understand how this works it will be enough to create a custom wrapper around an already existing TaskAwaiter:

struct DownloadAwaiter : INotifyCompletion {
  private readonly TaskAwaiter<string> _awaiter;
  public DownloadAwaiter(Uri uri) {
    Console.WriteLine("Start downloading from {0}", uri);
    var task = new HttpClient().GetStringAsync(uri);
    _awaiter = task.GetAwaiter();
    task.GetAwaiter().OnCompleted(() => Console.WriteLine(
      "download completed"));
  }

  public bool IsCompleted {
    get { return _awaiter.IsCompleted; }
  }

  public void OnCompleted(Action continuation) {
    _awaiter.OnCompleted(continuation);
  }

  public string GetResult() {
    return _awaiter.GetResult();
  }
}

With this code, we have customized asynchronous execution so that it writes diagnostic information to the console. To get rid of TaskAwaiter, it is enough to replace the body of the OnCompleted method with custom code that executes some operation and then invokes the continuation passed to this method.
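For comparison, an awaiter that does not rely on TaskAwaiter at all might look like the following sketch. ThreadPoolSwitch and ThreadPoolSwitchAwaiter are hypothetical types; all they do is resume the awaiting method on a thread pool thread.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical awaitable type: it only needs a GetAwaiter method.
public struct ThreadPoolSwitch {
  public ThreadPoolSwitchAwaiter GetAwaiter() {
    return new ThreadPoolSwitchAwaiter();
  }
}

public struct ThreadPoolSwitchAwaiter : INotifyCompletion {
  // Always report "not completed" so that OnCompleted gets called.
  public bool IsCompleted {
    get { return false; }
  }

  // Custom scheduling: run the continuation on the thread pool.
  public void OnCompleted(Action continuation) {
    ThreadPool.QueueUserWorkItem(_ => continuation());
  }

  // Nothing to return, so the await expression has no value.
  public void GetResult() { }
}

public static class CustomAwaiterDemo {
  public static async Task<bool> RunsOnThreadPoolAsync() {
    await new ThreadPoolSwitch();
    return Thread.CurrentThread.IsThreadPoolThread;
  }

  public static void Main() {
    Console.WriteLine(RunsOnThreadPoolAsync().Result); // prints True
  }
}
```

The three members required of the awaiter type are all present here: IsCompleted, OnCompleted from INotifyCompletion, and GetResult.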

To use this custom awaiter, we need to change GetAwaiter accordingly:

private static DownloadAwaiter GetAwaiter(this Uri uri) {
  return new DownloadAwaiter(uri);
}

If we run this, we will see additional information on the console. This can be useful for diagnostics and debugging.

Summary

In this article, we have looked at the C# language infrastructure that supports asynchronous calls. We have covered the new C# keywords, async and await, and how we can use the Task Parallel Library with the new C# syntax. We have learned how C# generates code and creates a state machine that represents an asynchronous operation, and we implemented an analogous solution with the help of iterator methods and the yield keyword. Besides this, we have studied additional Task Parallel Library features and looked at how we can use await with any custom type.