Eyes wide open - Correct Caching is always hard
In my last post I talked about Caching and some of the stuff I've been doing to cache the results of a VERY expensive call to the backend that hosts my podcast.
As always, the comments are better than the post! Thanks to you, Dear Reader.
The code is below. Note that the MemoryCache is a singleton, but within the process. It is not (yet) a DistributedCache. Also note that Caching is Complex(tm) and that thousands of pages have been written about caching by smart people. This is a blog post as part of a series, so use your head and do your research. Don't take anyone's word for it.
Bill Kempf had an excellent comment on that post. Thanks Bill! He said:
The SemaphoreSlim is a bad idea. This "mutex" has visibility different from the state it's trying to protect. You may get away with it here if this is the only code that accesses that particular key in the cache, but work or not, it's a bad practice.
As suggested, GetOrCreate (or more appropriate for this use case, GetOrCreateAsync) should handle the synchronization for you.
My first reaction was, "bad idea?! Nonsense!" It took me a minute to parse his words and absorb. Ok, it took a few hours of background processing plus I had lunch.
Again, here's the code in question. I've removed logging for brevity. I'm also deeply not interested in your emotional investment in my brackets/braces style. It changes with my mood. ;)
public class ShowDatabase : IShowDatabase
{
private readonly IMemoryCache _cache;
private readonly ILogger _logger;
private SimpleCastClient _client;
public ShowDatabase(IMemoryCache memoryCache,
ILogger<ShowDatabase> logger,
SimpleCastClient client){
_client = client;
_logger = logger;
_cache = memoryCache;
}
static SemaphoreSlim semaphoreSlim = new SemaphoreSlim(1);
public async Task<List<Show>> GetShows() {
Func<Show, bool> whereClause = c => c.PublishedAt < DateTime.UtcNow;
var cacheKey = "showsList";
List<Show> shows = null;
//CHECK and BAIL - optimistic
if (_cache.TryGetValue(cacheKey, out shows))
{
return shows.Where(whereClause).ToList();
}
await semaphoreSlim.WaitAsync();
try
{
//RARE BUT NEEDED DOUBLE PARANOID CHECK - pessimistic
if (_cache.TryGetValue(cacheKey, out shows))
{
return shows.Where(whereClause).ToList();
}
shows = await _client.GetShows();
var cacheExpirationOptions = new MemoryCacheEntryOptions();
cacheExpirationOptions.AbsoluteExpiration = DateTime.Now.AddHours(4);
cacheExpirationOptions.Priority = CacheItemPriority.Normal;
_cache.Set(cacheKey, shows, cacheExpirationOptions);
return shows.Where(whereClause).ToList(); ;
}
catch (Exception e) {
throw;
}
finally {
semaphoreSlim.Release();
}
}
}
public interface IShowDatabase {
Task<List<Show>> GetShows();
}
SemaphoreSlim IS very useful. From the docs:
The System.Threading.Semaphore class represents a named (systemwide) or local semaphore. It is a thin wrapper around the Win32 semaphore object. Win32 semaphores are counting semaphores, which can be used to control access to a pool of resources.
The SemaphoreSlim class represents a lightweight, fast semaphore that can be used for waiting within a single process when wait times are expected to be very short. SemaphoreSlim relies as much as possible on synchronization primitives provided by the common language runtime (CLR). However, it also provides lazily initialized, kernel-based wait handles as necessary to support waiting on multiple semaphores. SemaphoreSlim also supports the use of cancellation tokens, but it does not support named semaphores or the use of a wait handle for synchronization.
And my use of a Semaphore here is correct...for some definitions of the word "correct." ;) Back to Bill's wise words:
You may get away with it here if this is the only code that accesses that particular key in the cache, but work or not, it's a bad practice.
Ah! In this case, my cacheKey is "showsList" and I'm "protecting" it with a lock and double-check. That lock/check is fine and appropriate HOWEVER I have no guarantee (other than I wrote the whole app) that some other thread is also accessing the same IMemoryCache (remember, process-scoped singleton) at the same time. It's protected only within this function!
Here's where it gets even more interesting.
- I could make my own IMemoryCache, wrap things up, and then protect inside with my own TryGetValues...but then I'm back to checking/doublechecking etc.
- However, while I could lock/protect on a key...what about the semantics of other cached values that may depend on my key. There are none, but you could see a world where there are.
Yes, we are getting close to making our own implementation of Redis here, but bear with me. You have to know when to stop and say it's correct enough for this site or project BUT as Bill and the commenters point out, you also have to be Eyes Wide Open about the limitations and gotchas so they don't bite you as your app expands!
The suggestion was made to use the GetOrCreateAsync() extension method for MemoryCache. Bill and other commenters said:
As suggested, GetOrCreate (or more appropriate for this use case, GetOrCreateAsync) should handle the synchronization for you.
Sadly, it doesn't work that way. There's no guarantee (via locking like I was doing) that the factory method (the thing that populates the cache) won't get called twice. That is, someone could TryGetValue, get nothing, and continue on, while another thread is already in line to call the factory again.
public static async Task<TItem> GetOrCreateAsync<TItem>(this IMemoryCache cache, object key, Func<ICacheEntry, Task<TItem>> factory)
{
if (!cache.TryGetValue(key, out object result))
{
var entry = cache.CreateEntry(key);
result = await factory(entry);
entry.SetValue(result);
// need to manually call dispose instead of having a using
// in case the factory passed in throws, in which case we
// do not want to add the entry to the cache
entry.Dispose();
}
return (TItem)result;
}
Is this the end of the world? Not at all. Again, what is your project's definition of correct? Computer science correct? Guaranteed to always work correct? Spec correct? Mostly works and doesn't crash all the time correct?
Do I want to:
- Actively and aggressively avoid making my expensive backend call at the risk of in fact having another part of the app make that call anyway?
- What I am doing with my cacheKey is clearly not a "best practice" although it works today.
- Accept that my backend call could happen twice in short succession and the last caller's thread would ultimately populate the cache.
- My code would become a dozen lines simpler, have no process-wide locking, but also work adequately. However, it would be naïve caching at best. Even ConcurrentDictionary has no guarantees - "it is always possible for one thread to retrieve a value, and another thread to immediately update the collection by giving the same key a new value."
What a fun discussion. What are your thoughts?
Sponsor: SparkPost’s cloud email APIs and C# library make it easy for you to add email messaging to your .NET applications and help ensure your messages reach your user’s inbox on time. Get a free developer account and a head start on your integration today!
About Scott
Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.
About Newsletter
Tim - 7-10 seconds.
If you were strewing cache access throughout Controllers or some such ugliness, that would be a recipe for errors. But a "ShowDatabase" class which is the sole point where Show records are retrieved from the Database .. that seems fine.
At first glance, that GetOrCreateAsync extension method looks unambiguously worse for your purposes than what you have written. Your cache access is only protected from multiple data access calls within your one function. GetOrCreateAsync cache access is not protected at all.
I used this caching pattern to great success in a UWP app where it was critical to reduce the number of HTTP requests to keep the local device fast. If some data was displayed in two places in the UI simulataniously then we only wanted to fetch it from the server once.
Simpler caching, i.e. not getting the data inside a lock, is fine for a lot of people but if your website is high traffic and the data is expensive to get then every person who visits while it is uncached (either because your app is starting up or the cache expiried) will get their own copy.
1000 visitors/sec * 5 secs to get the data = 5000 requests to get the data.
On the other hand with a bit of locking - like you have now - you reduce that to 1 request.
I know that would then still impose parsing overheads but where is the current trade-off between the parsing and accessing the remote API?
Otherwise you could try two cache keys and populate the background one on a schedule before swapping it to be primary.
Just looking at the problem.
You are looking at doing double, triple, quadruple work when 2/3/4 people hit you site in the 7(?) second window that you have when cache expired and a new cache record is set up? After that the next 4 hours you have the episodes cached and happy days. After 4 hours you again run the "risk" of having multiple people hit that "cache expired building up the cache again" window.
So how big is the window, how long does GetShows take and how many concurrent users hit your site at the same time.
Biggest risk is probably something like automated downloads that podcast apps have (at least mine does) although that probably does not use the showlist but they could be hitting you often in a small window.
Is it a problem worth spending time on ;) https://xkcd.com/1205/
Thanks for the posts though, nice to see your thought processes out in the open :)
Since you asked the implementation is in CachingService.cs#L119 and proof it works is in CachingServiceTests.cs#L343
Whenever you start to introduce locking into web site software, I think you need to take a long hard look at the problem and decide if it is worth the trade off. Every lock you put into a site is essentially a pinch point for an inherently multi threaded environment.
For a single problem like the above, it probably won't show up as an issue, but the problem comes when (after reading this) someone decides that this is a great idea for *every* bit of static data that they load. Suddenly you are dealing with multiple pinch points and *shudder* the possibility of deadlocking your site.
If this is a serious slow point in the site, would you maybe be better served by introducing out-of-band caching and a dedicated cache server? This could be long lived and be primed as it starts rather than on demand.
Is this possibly overkill for your situation? Probably yes.
In that case, is the locking really necessary either? Probably not (in my opinion :)
I've done it manually myself, but the LazyCache library looks quite interesting. Thanks to everyone who recommended it.
AbsoluteExpiration = DateTime.Now
instead of UtcNow. Relatively more expensive and hopefully unnecessary. Will not work well a few hours a year.
Adrian - Yes, a huge improvement. The backend is so slow that SOME cache is absolutely needed.
Joe - Good point!
Alastair! Thanks for commenting and for the links. I'll check it out and see how it works with my code!
When it comes to structuring to "future proof" the code from developers, aye that's always difficult :) This is where you sometimes have to resort to supporting external documentation...
However, it's still not ideal, since it does force the cache to be local (Tasks don't serialize very well).
The most important prerequisite is "demand for them is high" - if this is not true, you approach with locking is more appropriate since holding always a copy won't be justified...
The ASP.NET Core "Middleware as Filters" feature uses ConcurrentDictionary combined with Lazy to ensure "run once" semantics. In the worst case, multiple Lazy<> objects are created for multiple threads, but only one of the objects succeeds in running the factory method. That looks to be the same approach as LazyCache.
I wrote a post on it a while ago as it was a new approach to me: https://andrewlock.net/making-getoradd-on-concurrentdictionary-thread-safe-using-lazy/
https://github.com/andyvans/AspectCache
[Cache(minutes:5)]
public virtual List<Show> GetShows()
{
...
}
Comments are closed.
If you did go for GetOrCreateAsync, could there be some value in priming it whenever you release a new episode so that most of the time it's actually you updating the cache?
I guess your current 4 hour expiration makes that not worthwhile, but you could replace that with an eviction strategy where it checked the shows list and only evicted the cache if it had actually changed. Very few requests would then be forced to wait for the shows list to download.