Comparing two techniques in .NET Asynchronous Coordination Primitives
Last week in my post on updating my Windows Phone 7 application to Windows 8 I shared some code from Michael L. Perry using a concept whereby one protects access to a shared resource using a critical section in a way that works comfortably with the new await/async keywords. Protecting shared resources like files is a little more subtle now that asynchrony is so easy. We'll see this more and more as Windows 8 and Windows Phone 8 promote the idea that apps shouldn't block for anything.
After that post, my friend and mentor (he doesn't know he's my mentor but I just decided that he is just now) Stephen Toub, expert on all things asynchronous, sent me an email with some excellent thoughts and feedback on this technique. I include some of that email here with permission as it will help us all learn!
I hadn’t seen the Awaitable Critical Section helper you mention below before, but I just took a look at it, and while it’s functional, it’s not ideal. For a client-side solution like this, it’s probably fine. If this were a server-side solution, though, I’d be concerned about the overhead associated with this particular implementation.
I love Stephen Toub's feedback in all things. Always firm but kind. Stephen Cleary makes a similar observation in the comments and also points out that immediately disabling the button works too. ;) It's also worth noting that Cleary's excellent AsyncEx library has lots of async-ready primitives and supports both Windows Phone 8 and 7.5.
The SemaphoreSlim class was updated in .NET 4.5 (and Windows Phone 8) to support async waits. You would have to build your own IDisposable Release, though. (In the situation you describe, I usually just disable the button at the beginning of the async handler and re-enable it at the end; but async synchronization would work too.)
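Building that IDisposable Release on top of SemaphoreSlim takes only a few lines. Here's a sketch of the idea (the SemaphoreLock and Releaser names are mine, not from any library):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Sketch: SemaphoreSlim's async wait plus a disposable "key" so callers
// can scope the lock with a using block. Names here are made up.
public sealed class SemaphoreLock
{
    private readonly SemaphoreSlim _semaphore = new SemaphoreSlim(1, 1);

    public async Task<IDisposable> LockAsync()
    {
        await _semaphore.WaitAsync().ConfigureAwait(false);
        return new Releaser(_semaphore);
    }

    private sealed class Releaser : IDisposable
    {
        private readonly SemaphoreSlim _toRelease;
        internal Releaser(SemaphoreSlim toRelease) { _toRelease = toRelease; }

        // Disposing the key releases the semaphore, ending the critical section.
        public void Dispose() { _toRelease.Release(); }
    }
}
```

Callers then write `using (await myLock.LockAsync()) { ... }`. Note this allocates a fresh Releaser on every acquisition; Toub's version further on avoids even that by caching a single completed Task&lt;IDisposable&gt;.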
Ultimately what we're trying to do is create "Async Coordination Primitives" and Toub talked about this in February.
Here, in layman's terms, is what we're trying to do, why it's interesting, and a definition of a Coordination Primitive (stolen from MSDN):
Asynchronous programming is hard because there is no simple method to coordinate between multiple operations, deal with partial failure (one of many operations fails but others succeed) and also define execution behavior of asynchronous callbacks, so they don't violate some concurrency constraint - for example, they don't attempt to do something in parallel. [Coordination Primitives] enable and promote concurrency by providing ways to express what coordination should happen.
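The "partial failure" clause in that definition is the part that bites in practice. Here's a small self-contained sketch (FetchAsync and its inputs are invented for illustration) showing several concurrent operations where one fails and the others succeed:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical operation: succeeds for most inputs, fails for "b".
static async Task<string> FetchAsync(string id)
{
    await Task.Yield();
    if (id == "b") throw new InvalidOperationException("b failed");
    return id;
}

var tasks = new[] { FetchAsync("a"), FetchAsync("b"), FetchAsync("c") };
try
{
    await Task.WhenAll(tasks);   // throws if any task faulted
}
catch (InvalidOperationException)
{
    // WhenAll surfaces one exception; the individual tasks tell the full story
}
Console.WriteLine(tasks.Count(t => t.IsFaulted));               // 1
Console.WriteLine(tasks.Count(t => t.IsCompletedSuccessfully)); // 2
```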
In this case, we're trying to handle locking when using async, which is just one kind of coordination primitive. From Stephen Toub's blog:
Here, we’ll look at building support for an async mutual exclusion mechanism that supports scoping via ‘using.’
I previously blogged about a similar solution (http://blogs.msdn.com/b/pfxteam/archive/2012/02/12/10266988.aspx), which would result in a helper class like this:
Here Toub uses the new lightweight SemaphoreSlim class and indulges our love of the "using" pattern to create something very lightweight.
public sealed class AsyncLock
{
    private readonly SemaphoreSlim m_semaphore = new SemaphoreSlim(1, 1);
    private readonly Task<IDisposable> m_releaser;

    public AsyncLock()
    {
        m_releaser = Task.FromResult((IDisposable)new Releaser(this));
    }

    public Task<IDisposable> LockAsync()
    {
        var wait = m_semaphore.WaitAsync();
        return wait.IsCompleted ?
            m_releaser :
            wait.ContinueWith((_, state) => (IDisposable)state,
                m_releaser.Result, CancellationToken.None,
                TaskContinuationOptions.ExecuteSynchronously, TaskScheduler.Default);
    }

    private sealed class Releaser : IDisposable
    {
        private readonly AsyncLock m_toRelease;
        internal Releaser(AsyncLock toRelease) { m_toRelease = toRelease; }
        public void Dispose() { m_toRelease.m_semaphore.Release(); }
    }
}
How lightweight is it, and how is this different from the previous solution? Here's Stephen Toub, emphasis mine.
There are a few reasons I’m not enamored with the referenced AwaitableCriticalSection solution.
First, it has unnecessary allocations; again, not a big deal for a client library, but potentially more impactful for a server-side solution. An example of this is that often with locks, when you access them they’re uncontended, and in such cases you really want acquiring and releasing the lock to be as low-overhead as possible; in other words, accessing uncontended locks should involve a fast path. With AsyncLock above, you can see that on the fast path where the task we get back from WaitAsync is already completed, we’re just returning a cached already-completed task, so there’s no allocation (for the uncontended path where there’s still count left in the semaphore, WaitAsync will use a similar trick and will not incur any allocations).
Lots here to parse. One of the interesting meta-points is that a simple client-side app with a user interacting (like my app) has VERY different behaviors than a high-throughput server-side application. Translation? I can get away with a lot more on the client side...but should I when I don't have to?
His solution requires fewer allocations and zero garbage collections.
Overall, it’s also just much more unnecessary overhead. A basic microbenchmark shows that in the uncontended case, AsyncLock above is about 30x faster with 0 GCs (versus a bunch of GCs in the AwaitableCriticalSection example). And in the contended case, it looks to be about 10-15x faster.
Here's the microbenchmark comparing the two...remembering of course there's, "lies, damned lies, and microbenchmarks," but this one is pretty useful. ;)
class Program
{
    static void Main()
    {
        const int ITERS = 100000;
        while (true)
        {
            Run("Uncontended AL ", () => TestAsyncLockAsync(ITERS, false));
            Run("Uncontended ACS", () => TestAwaitableCriticalSectionAsync(ITERS, false));
            Run("Contended AL ", () => TestAsyncLockAsync(ITERS, true));
            Run("Contended ACS", () => TestAwaitableCriticalSectionAsync(ITERS, true));
            Console.WriteLine();
        }
    }

    static void Run(string name, Func<Task> test)
    {
        var sw = Stopwatch.StartNew();
        test().Wait();
        sw.Stop();
        Console.WriteLine("{0}: {1}", name, sw.ElapsedMilliseconds);
    }

    static async Task TestAsyncLockAsync(int iters, bool contended)
    {
        var mutex = new AsyncLock();
        if (contended)
        {
            var waits = new Task<IDisposable>[iters];
            using (await mutex.LockAsync())
                for (int i = 0; i < iters; i++)
                    waits[i] = mutex.LockAsync();
            for (int i = 0; i < iters; i++)
                using (await waits[i]) { }
        }
        else
        {
            for (int i = 0; i < iters; i++)
                using (await mutex.LockAsync()) { }
        }
    }

    static async Task TestAwaitableCriticalSectionAsync(int iters, bool contended)
    {
        var mutex = new AwaitableCriticalSection();
        if (contended)
        {
            var waits = new Task<IDisposable>[iters];
            using (await mutex.EnterAsync())
                for (int i = 0; i < iters; i++)
                    waits[i] = mutex.EnterAsync();
            for (int i = 0; i < iters; i++)
                using (await waits[i]) { }
        }
        else
        {
            for (int i = 0; i < iters; i++)
                using (await mutex.EnterAsync()) { }
        }
    }
}
Stephen Toub is using SemaphoreSlim, the "lightest weight" option available, rather than RegisterWaitForSingleObject:
Second, and more importantly, the AwaitableCriticalSection is using a fairly heavy synchronization mechanism to provide the mutual exclusion. The solution is using Task.Factory.FromAsync(IAsyncResult, …), which is just a wrapper around ThreadPool.RegisterWaitForSingleObject (see http://blogs.msdn.com/b/pfxteam/archive/2012/02/06/10264610.aspx). Each call to this is asking the ThreadPool to have a thread block waiting on the supplied ManualResetEvent, and then to complete the returned Task when the event is set. Thankfully, the ThreadPool doesn’t burn one thread per event, and rather groups multiple events together per thread, but still, you end up wasting some number of threads (IIRC, it’s 63 events per thread), so in a server-side environment, this could result in degraded behavior.
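To see the machinery Toub is describing, here's roughly what such a bridge looks like: a sketch (WaitOneAsync is my own name, not an API) that turns a WaitHandle into a Task via ThreadPool.RegisterWaitForSingleObject, which is the kind of blocking-wait plumbing AwaitableCriticalSection ends up paying for on contended acquisitions:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Sketch of the event-to-Task bridge: a shared thread-pool wait thread
// watches the handle and completes the task when it is signaled.
static Task WaitOneAsync(WaitHandle handle)
{
    var tcs = new TaskCompletionSource<bool>();
    ThreadPool.RegisterWaitForSingleObject(
        handle,
        (state, timedOut) => ((TaskCompletionSource<bool>)state).TrySetResult(true),
        tcs,
        Timeout.Infinite,
        executeOnlyOnce: true);
    return tcs.Task;
}

var gate = new ManualResetEvent(false);
var waiter = WaitOneAsync(gate);   // a pool wait thread is now watching the event
gate.Set();
await waiter;                      // completes once the event is signaled
Console.WriteLine("signaled");
```

Each registered handle ties up a slot on one of those shared wait threads, which is exactly the "63 events per thread" cost Toub mentions.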
All in all, an education for me - and I hope you, Dear Reader - as well as a few important lessons.
- Know what's happening underneath if you can.
- Code Reviews are always a good thing.
- Ask someone smarter.
- Performance may not matter in one context but it can in another.
- You can likely get away with this or that, until you totally can't. (Client vs. Server)
Thanks Stephen Toub and Stephen Cleary!
Related Reading
- Building Async Coordination Primitives, Part 1: AsyncManualResetEvent
- Building Async Coordination Primitives, Part 2: AsyncAutoResetEvent
- Building Async Coordination Primitives, Part 3: AsyncCountdownEvent
- Building Async Coordination Primitives, Part 4: AsyncBarrier
- Building Async Coordination Primitives, Part 5: AsyncSemaphore
- Building Async Coordination Primitives, Part 6: AsyncLock
About Scott
Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.
Offhand, I'd guess SemaphoreSlim is still the performance winner here, although given the LINQ-esque nature of Rx I'm not certain whether it might optimize toward that. Probably worth investigating in greater depth, by someone at some point.
Suppose we have a synchronous method like
object GetSomething()
{
    lock (syncRoot)
    {
        ...
    }
}
which is heavily contended, would we gain performance by changing it to
Task<object> GetSomethingAsync()
{
    using (await xyz.LockAsync())
    {
        ...
    }
}
(and of course change all calling code as well). I think it's context switching vs. scheduling a task continuation (and other await magic).
Bluesman
Very interesting article!
I've researched this a lot recently, and Stephen's articles were the only thing I found. In particular, I'm involved in the Siaqodb project, and the WinRT version of our product has a completely async API; we need mutual exclusion on our API methods so that a database call ensures async thread-safe access to the db file.
Our product is for the client side and we chose the SemaphoreSlim WaitAsync() approach, which works perfectly for us, but it's very important to note: it does not support reentrancy.
This is quite problematic because you may have a recursive method, or one that isn't directly recursive but, say, methodA calls methodB, methodB calls methodC, and methodC calls methodA again. If this happens, all the solutions above will deadlock.
So it would be very interesting to see this subject debated more :) and to discuss possible solutions...
Cristoph
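Cristoph's reentrancy point is easy to demonstrate with SemaphoreSlim directly. This sketch shows why a "re-entrant" acquisition queues forever instead of succeeding: the semaphore has no notion of an owner, so a second wait from the same logical flow just lines up behind the first.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

var gate = new SemaphoreSlim(1, 1);

await gate.WaitAsync();               // outer acquisition: count 1 -> 0
var inner = gate.WaitAsync();         // "reentrant" attempt: queued, not granted

Console.WriteLine(inner.IsCompleted); // False -- awaiting inner here would deadlock

gate.Release();                       // outer release lets the queued waiter in
await inner;
gate.Release();
Console.WriteLine("done");
```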
"In layman's terms a Coordination Primitive is..."
Cheers.
Did I get this right?
The AsyncLock and Releaser classes depend on each other; neither class can be compiled independently. Though the solution looks good, the design principle is thoroughly violated :).
Regards,
Rajesh
http://pol84.tumblr.com/post/38024311178/aynclock-less-cryptic-formatting