Comparing two techniques in .NET Asynchronous Coordination Primitives
Last week in my post on updating my Windows Phone 7 application to Windows 8 I shared some code from Michael L. Perry using a concept whereby one protects access to a shared resource using a critical section in a way that works comfortably with the new await/async keywords. Protecting shared resources like files is a little more subtle now that asynchrony is so easy. We'll see this more and more as Windows 8 and Windows Phone 8 promote the idea that apps shouldn't block for anything.
After that post, my friend and mentor (he doesn't know he's my mentor but I just decided that he is just now) Stephen Toub, expert on all things asynchronous, sent me an email with some excellent thoughts and feedback on this technique. I include some of that email here with permission as it will help us all learn!
I hadn’t seen the Awaitable Critical Section helper you mention below before, but I just took a look at it, and while it’s functional, it’s not ideal. For a client-side solution like this, it’s probably fine. If this were a server-side solution, though, I’d be concerned about the overhead associated with this particular implementation.
I love Stephen Toub's feedback in all things. Always firm but kind. Stephen Cleary makes a similar observation in the comments and also points out that immediately disabling the button works too. ;) It's also worth noting that Cleary's excellent AsyncEx library has lots of async-ready primitives and supports both Windows Phone 8 and 7.5.
The SemaphoreSlim class was updated in .NET 4.5 (and Windows Phone 8) to support async waits. You would have to build your own IDisposable Release, though. (In the situation you describe, I usually just disable the button at the beginning of the async handler and re-enable it at the end; but async synchronization would work too.)
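Building that IDisposable Release on top of SemaphoreSlim takes only a few lines. Here's a sketch of the idea (the SemaphoreLock and Releaser names are mine, not from any library):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Sketch: SemaphoreSlim's async wait plus a disposable "key" so callers
// can scope the lock with a using block. Names here are made up.
public sealed class SemaphoreLock
{
    private readonly SemaphoreSlim _semaphore = new SemaphoreSlim(1, 1);

    public async Task<IDisposable> LockAsync()
    {
        await _semaphore.WaitAsync().ConfigureAwait(false);
        return new Releaser(_semaphore);
    }

    private sealed class Releaser : IDisposable
    {
        private readonly SemaphoreSlim _toRelease;
        internal Releaser(SemaphoreSlim toRelease) { _toRelease = toRelease; }

        // Disposing the key releases the semaphore, ending the critical section.
        public void Dispose() { _toRelease.Release(); }
    }
}
```

Callers then write `using (await myLock.LockAsync()) { ... }`. Note this allocates a fresh Releaser on every acquisition; Toub's version further on avoids even that by caching a single completed Task&lt;IDisposable&gt;.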
Ultimately what we're trying to do is create "Async Coordination Primitives" and Toub talked about this in February.
Here, in layman's terms, is what we're trying to do, why it's interesting, and a definition of a Coordination Primitive (stolen from MSDN):
Asynchronous programming is hard because there is no simple method to coordinate between multiple operations, deal with partial failure (one of many operations fails but others succeed) and also define execution behavior of asynchronous callbacks, so they don't violate some concurrency constraint - for example, they don't attempt to do something in parallel. [Coordination Primitives] enable and promote concurrency by providing ways to express what coordination should happen.
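The "partial failure" clause in that definition is the part that bites in practice. Here's a small self-contained sketch (FetchAsync and its inputs are invented for illustration) showing several concurrent operations where one fails and the others succeed:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical operation: succeeds for most inputs, fails for "b".
static async Task<string> FetchAsync(string id)
{
    await Task.Yield();
    if (id == "b") throw new InvalidOperationException("b failed");
    return id;
}

var tasks = new[] { FetchAsync("a"), FetchAsync("b"), FetchAsync("c") };
try
{
    await Task.WhenAll(tasks);   // throws if any task faulted
}
catch (InvalidOperationException)
{
    // WhenAll surfaces one exception; the individual tasks tell the full story
}
Console.WriteLine(tasks.Count(t => t.IsFaulted));               // 1
Console.WriteLine(tasks.Count(t => t.IsCompletedSuccessfully)); // 2
```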
In this case, we're trying to handle locking when using async, which is just one kind of coordination primitive. From Stephen Toub's blog:
Here, we’ll look at building support for an async mutual exclusion mechanism that supports scoping via ‘using.’
I previously blogged about a similar solution (http://blogs.msdn.com/b/pfxteam/archive/2012/02/12/10266988.aspx), which would result in a helper class like this:
Here Toub uses the new lightweight SemaphoreSlim class and indulges our love of the "using" pattern to create something very lightweight.
public sealed class AsyncLock
{
    private readonly SemaphoreSlim m_semaphore = new SemaphoreSlim(1, 1);
    private readonly Task<IDisposable> m_releaser;

    public AsyncLock()
    {
        m_releaser = Task.FromResult((IDisposable)new Releaser(this));
    }

    public Task<IDisposable> LockAsync()
    {
        var wait = m_semaphore.WaitAsync();
        return wait.IsCompleted ?
            m_releaser :
            wait.ContinueWith((_, state) => (IDisposable)state,
                m_releaser.Result, CancellationToken.None,
                TaskContinuationOptions.ExecuteSynchronously, TaskScheduler.Default);
    }

    private sealed class Releaser : IDisposable
    {
        private readonly AsyncLock m_toRelease;
        internal Releaser(AsyncLock toRelease) { m_toRelease = toRelease; }
        public void Dispose() { m_toRelease.m_semaphore.Release(); }
    }
}
How lightweight is it, and how is this different from the previous solution? Here's Stephen Toub, emphasis mine.
There are a few reasons I’m not enamored with the referenced AwaitableCriticalSection solution.
First, it has unnecessary allocations; again, not a big deal for a client library, but potentially more impactful for a server-side solution. An example of this is that often with locks, when you access them they’re uncontended, and in such cases you really want acquiring and releasing the lock to be as low-overhead as possible; in other words, accessing uncontended locks should involve a fast path. With AsyncLock above, you can see that on the fast path where the task we get back from WaitAsync is already completed, we’re just returning a cached already-completed task, so there’s no allocation (for the uncontended path where there’s still count left in the semaphore, WaitAsync will use a similar trick and will not incur any allocations).
Lots here to parse. One of the interesting meta-points is that a simple client-side app with a user interacting (like my app) has VERY different behaviors than a high-throughput server-side application. Translation? I can get away with a lot more on the client side...but should I when I don't have to?
His solution requires fewer allocations and zero garbage collections.
Overall, it’s also just much more unnecessary overhead. A basic microbenchmark shows that in the uncontended case, AsyncLock above is about 30x faster with 0 GCs (versus a bunch of GCs in the AwaitableCriticalSection example). And in the contended case, it looks to be about 10-15x faster.
Here's the microbenchmark comparing the two...remembering of course there's, "lies, damned lies, and microbenchmarks," but this one is pretty useful. ;)
class Program
{
    static void Main()
    {
        const int ITERS = 100000;
        while (true)
        {
            Run("Uncontended AL ", () => TestAsyncLockAsync(ITERS, false));
            Run("Uncontended ACS", () => TestAwaitableCriticalSectionAsync(ITERS, false));
            Run("Contended AL ", () => TestAsyncLockAsync(ITERS, true));
            Run("Contended ACS", () => TestAwaitableCriticalSectionAsync(ITERS, true));
            Console.WriteLine();
        }
    }

    static void Run(string name, Func<Task> test)
    {
        var sw = Stopwatch.StartNew();
        test().Wait();
        sw.Stop();
        Console.WriteLine("{0}: {1}", name, sw.ElapsedMilliseconds);
    }

    static async Task TestAsyncLockAsync(int iters, bool contended)
    {
        var mutex = new AsyncLock();
        if (contended)
        {
            var waits = new Task<IDisposable>[iters];
            using (await mutex.LockAsync())
                for (int i = 0; i < iters; i++)
                    waits[i] = mutex.LockAsync();
            for (int i = 0; i < iters; i++)
                using (await waits[i]) { }
        }
        else
        {
            for (int i = 0; i < iters; i++)
                using (await mutex.LockAsync()) { }
        }
    }

    static async Task TestAwaitableCriticalSectionAsync(int iters, bool contended)
    {
        var mutex = new AwaitableCriticalSection();
        if (contended)
        {
            var waits = new Task<IDisposable>[iters];
            using (await mutex.EnterAsync())
                for (int i = 0; i < iters; i++)
                    waits[i] = mutex.EnterAsync();
            for (int i = 0; i < iters; i++)
                using (await waits[i]) { }
        }
        else
        {
            for (int i = 0; i < iters; i++)
                using (await mutex.EnterAsync()) { }
        }
    }
}
Stephen Toub is using SemaphoreSlim, the "lightest weight" option available, rather than RegisterWaitForSingleObject:
Second, and more importantly, the AwaitableCriticalSection is using a fairly heavy synchronization mechanism to provide the mutual exclusion. The solution is using Task.Factory.FromAsync(IAsyncResult, …), which is just a wrapper around ThreadPool.RegisterWaitForSingleObject (see http://blogs.msdn.com/b/pfxteam/archive/2012/02/06/10264610.aspx). Each call to this is asking the ThreadPool to have a thread block waiting on the supplied ManualResetEvent, and then to complete the returned Task when the event is set. Thankfully, the ThreadPool doesn’t burn one thread per event, and rather groups multiple events together per thread, but still, you end up wasting some number of threads (IIRC, it’s 63 events per thread), so in a server-side environment, this could result in degraded behavior.
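To see the machinery Toub is describing, here's roughly what such a bridge looks like: a sketch (WaitOneAsync is my own name, not an API) that turns a WaitHandle into a Task via ThreadPool.RegisterWaitForSingleObject, which is the kind of blocking-wait plumbing AwaitableCriticalSection ends up paying for on contended acquisitions:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Sketch of the event-to-Task bridge: a shared thread-pool wait thread
// watches the handle and completes the task when it is signaled.
static Task WaitOneAsync(WaitHandle handle)
{
    var tcs = new TaskCompletionSource<bool>();
    ThreadPool.RegisterWaitForSingleObject(
        handle,
        (state, timedOut) => ((TaskCompletionSource<bool>)state).TrySetResult(true),
        tcs,
        Timeout.Infinite,
        executeOnlyOnce: true);
    return tcs.Task;
}

var gate = new ManualResetEvent(false);
var waiter = WaitOneAsync(gate);   // a pool wait thread is now watching the event
gate.Set();
await waiter;                      // completes once the event is signaled
Console.WriteLine("signaled");
```

Each registered handle ties up a slot on one of those shared wait threads, which is exactly the "63 events per thread" cost Toub mentions.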
All in all, an education for me - and I hope you, Dear Reader - as well as a few important lessons.
- Know what's happening underneath if you can.
- Code Reviews are always a good thing.
- Ask someone smarter.
- Performance may not matter in one context but it can in another.
- You can likely get away with this or that, until you totally can't. (Client vs. Server)
Thanks Stephen Toub and Stephen Cleary!
Related Reading
- Building Async Coordination Primitives, Part 1: AsyncManualResetEvent
- Building Async Coordination Primitives, Part 2: AsyncAutoResetEvent
- Building Async Coordination Primitives, Part 3: AsyncCountdownEvent
- Building Async Coordination Primitives, Part 4: AsyncBarrier
- Building Async Coordination Primitives, Part 5: AsyncSemaphore
- Building Async Coordination Primitives, Part 6: AsyncLock
About Scott
Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.
Offhand, I'd guess SemaphoreSlim is still the performance winner here, although given the LINQ-esque nature of Rx I'm not certain whether it might optimize toward that. Probably worth investigating in greater depth, by someone at some point.
Suppose we have a synchronous method like
object GetSomething()
{
    lock (syncRoot)
    {
        ...
    }
}
which is heavily contended, would we gain performance by changing it to
Task<object> GetSomethingAsync()
{
    using (await xyz.LockAsync())
    {
        ...
    }
}
(and of course change all calling code as well). I think it's context switching vs. scheduling a task continuation (and other await magic).
Bluesman
Very interesting article!
I've researched this a lot recently, and Stephen's articles were the only thing I found. In particular, I'm involved in the Siaqodb project, and the WinRT version of our product has a completely async API; we need mutual exclusion on our API methods so that a database call ensures async thread-safe access to the db file.
Our product is for the client side and we chose the SemaphoreSlim WaitAsync() approach, which works perfectly for us, but it's very important to note: it does not support reentrancy.
This is quite problematic because you may have a recursive method, or one that isn't directly recursive but, say, methodA calls methodB, methodB calls methodC, and methodC calls methodA again. If this happens, all the solutions above will deadlock.
So it would be very interesting to see this subject debated more :) and to discuss possible solutions...
Cristoph
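Cristoph's reentrancy point is easy to demonstrate with SemaphoreSlim directly. This sketch shows why a "re-entrant" acquisition queues forever instead of succeeding: the semaphore has no notion of an owner, so a second wait from the same logical flow just lines up behind the first.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

var gate = new SemaphoreSlim(1, 1);

await gate.WaitAsync();               // outer acquisition: count 1 -> 0
var inner = gate.WaitAsync();         // "reentrant" attempt: queued, not granted

Console.WriteLine(inner.IsCompleted); // False -- awaiting inner here would deadlock

gate.Release();                       // outer release lets the queued waiter in
await inner;
gate.Release();
Console.WriteLine("done");
```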
"In layman's terms a Coordination Primitive is..."
Cheers.
Did I get this right?
The AsyncLock and Releaser classes depend on each other; neither class can be compiled independently. Though the solution looks good, the design principle is thoroughly violated :).
Regards,
Rajesh
http://pol84.tumblr.com/post/38024311178/aynclock-less-cryptic-formatting