Scott Hanselman

Proper benchmarking to diagnose and solve a .NET serialization bottleneck

February 20, 2015 Comment on this post [25] Posted in ASP.NET | Learning .NET | Open Source
Sponsored By

From http://adrianotto.com/2010/08/dev-null-unlimited-scale/Here's a few comments and disclaimers to start with. First, benchmarks are challenging. They are challenging to measure, but the real issue is that often we forget WHY we are benchmarking something. We'll take a complex multi-machine financial system and suddenly we're hyper-focused on a bunch of serialization code that we're convinced is THE problem. "If I can fix this serialization by writing a 10,000 iteration for loop and getting it down to x milliseconds, it'll be SMOOOOOOTH sailing."

Second, this isn't a benchmarking blog post. Don't point this blog post and say "see! Library X is better than library Y! And .NET is better than Java!" Instead, consider this a cautionary tale, and a series of general guidelines. I'm just using this anecdote to illustrate these points.

  • Are you 100% sure what you're measuring?
  • Have you run a profiler like the Visual Studio profiler, ANTS, or dotTrace?
  • Are you considering warm-up time? Throwing out outliers? Are your results statistically significant?
  • Are the libraries you're using optimized for your use case? Are you sure what your use case is?

A bad benchmark

A reader sent me a email recently with concern of serialization in .NET. They had read some very old blog posts from 2009 about perf that included charts and graphs and did some tests of their own. They were seeing serialization times (of tens of thousands of items) over 700ms and sizes nearly 2 megs. The tests included serialization of their typical data structures in both C# and Java across a number of different serialization libraries and techniques. Techniques included their company's custom serialization, .NET binary DataContract serialization, as well as JSON.NET. One serialization format was small (1.8Ms for a large structure) and one was fast (94ms) but there was no clear winner. This reader was at their wit's end and had decided, more or less, that .NET must not be up for the task.

To me, this benchmark didn't smell right. It wasn't clear what was being measured. It wasn't clear if it was being accurately measured, but more specifically, the overarching conclusion of ".NET is slow" wasn't reasonable given the data.

Hm. So .NET can't serialize a few tens of thousands of data items quickly? I know it can.

Related Links: Create benchmarks and results that have value and Responsible benchmarking by @Kellabyte

I am no expert, but I poked around at this code.

First: Are we measuring correctly?

The tests were using DateTime.UtcNow which isn't advisable.

startTime = DateTime.UtcNow;
resultData = TestSerialization(foo);
endTime = DateTime.UtcNow;

Do not use DateTime.Now or DateTime.Utc for measuring things where any kind of precision matters. DateTime doesn't have enough precision and is said to be accurate only to 30ms.

DateTime represents a date and a time. It's not a high-precision timer or Stopwatch.

As Eric Lippert says:

In short, "what time is it?" and "how long did that take?" are completely different questions; don't use a tool designed to answer one question to answer the other.

And as Raymond Chen says:

"Precision is not the same as accuracy. Accuracy is how close you are to the correct answer; precision is how much resolution you have for that answer."

So, we will use a Stopwatch when you need a stopwatch. In fact, before I switch the sample to Stopwatch I was getting numbers in milliseconds like 90,106,103,165,94, and after Stopwatch the results were 99,94,95,95,94. There's much less jitter.

Stopwatch sw = new Stopwatch();
sw.Start();

// stuff

sw.Stop();

You might also want to pin your process to a single CPU core if you're trying to get an accurate throughput measurement. While it shouldn't matter and Stopwatch is using the Win32 QueryPerformanceCounter internally (the source for the .NET Stopwatch Class is here) there were some issues on old systems when you'd start on one proc and stop on another.

// One Core
var p = Process.GetCurrentProcess();
p.ProcessorAffinity = (IntPtr)1;

If you don't use Stopwatch, look for a  simple and well-thought-of benchmarking library.

Second: Doing the math

In the code sample I was given, about 10 lines of code were the thing being measured, and 735 lines were the "harness" to collect and display the data from the benchmark. Perhaps you've seen things like this before? It's fair to say that the benchmark can get lost in the harness.

Have a listen to my recent podcast with Matt Warren on "Performance as a Feature" and consider Matt's performance blog and be sure to pick up Ben Watson's recent Book called "Writing High Performance .NET Code".

Also note that Matt is currently exploring creating a mini-benchmark harness on GitHub. Matt's system is rather promising and would have a [Benchmark] attribute within a test.

Considering using an existing harness for small benchmarks. One is SimpleSpeedTester from Yan Cui. It makes nice tables and does a lot of the tedious work for you. Here's a screenshot I stole borrowed from Yan's blog.

image11

Something a bit more advanced to explore is HdrHistogram, a library "designed for recoding histograms of value measurements in latency and performance sensitive applications." It's also on GitHub and includes Java, C, and C# implementations.

PercentileHistogramExample

And seriously. Use a profiler.

Third: Have you run a profiler?

Use the Visual Studio Profiler, or get a trial of the Redgate ANTS Performance Profiler or the JetBrains dotTrace profiler.

Where is our application spending its time? Surprise I think we've all seen people write complex benchmarks and poke at a black box rather than simply running a profiler.

Visual Studio Profiler

Aside: Are there newer/better/understood ways to solve this?

This is my opinion, but I think it's a decent one and there's numbers to back it up. Some of the .NET serialization code is pretty old, written in 2003 or 2005 and may not be taking advantage of new techniques or knowledge. Plus, it's rather flexible "make it work for everyone" code, as opposed to very narrowly purposed code.

People have different serialization needs. You can't serialize something as XML and expect it to be small and tight. You likely can't serialize a structure as JSON and expect it to be as fast as a packed binary serializer.

Measure your code, consider your requirements, and step back and consider all options.

Fourth: Newer .NET Serializers to Consider

Now that I have a sense of what's happening and how to measure the timing, it was clear these serializers didn't meet this reader's goals. Some of are old, as I mentioned, so what other newer more sophisticated options exist?

There's two really nice specialized serializers to watch. They are Jil from Kevin Montrose, and protobuf-net from Marc Gravell. Both are extraordinary libraries, and protobuf-net's breadth of target framework scope and build system are a joy to behold. There are also other impressive serializers in including support for not only JSON, but also JSV and CSV in ServiceStack.NET.

Protobuf-net - protocol buffers for .NET

Protocol buffers are a data structure format from Google, and protobuf-net is a high performance .NET implementation of protocol buffers. Think if it like XML but smaller and faster. It also can serialize cross language. From their site:

Protocol buffers have many advantages over XML for serializing structured data. Protocol buffers:

  • are simpler
  • are 3 to 10 times smaller
  • are 20 to 100 times faster
  • are less ambiguous
  • generate data access classes that are easier to use programmatically

It was easy to add. There's lots of options and ways to decorate your data structures but in essence:

var r = ProtoBuf.Serializer.Deserialize<List<DataItem>>(memInStream);

The numbers I got with protobuf-net were exceptional and in this case packed the data tightly and quickly, taking just 49ms.

JIL - Json Serializer for .NET using Sigil

Jil s a Json serializer that is less flexible than Json.net and makes those small sacrifices in the name of raw speed. From their site:

Flexibility and "nice to have" features are explicitly discounted in the pursuit of speed.

It's also worth pointing out that some serializers work over the whole string in memory, while others like Json.NET and DataContractSerializer work over a stream. That means you'll want to consider the size of what you're serializing when choosing a library.

Jil is impressive in a number of ways but particularly in that it dynamically emits a custom serializer (much like the XmlSerializers of old)

Jil is trivial to use. It just worked. I plugged it in to this sample and it took my basic serialization times to 84ms.

result = Jil.JSON.Deserialize<Foo>(jsonData);

Conclusion: There's the thing about benchmarks. It depends.

What are you measuring? Why are you measuring it? Does the technique you're using handle your use case? Are you serializing one large object or thousands of small ones?

James Newton-King made this excellent point to me:

"[There's a] meta-problem around benchmarking. Micro-optimization and caring about performance when it doesn't matter is something devs are guilty of. Documentation, developer productivity, and flexibility are more important than a 100th of a millisecond."

In fact, James pointed out this old (but recently fixed) ASP.NET bug on Twitter. It's a performance bug that is significant, but was totally overshadowed by the time spent on the network.

Thanks to Marc Gravell and James Newton-King for their time helping with this post.

What are your benchmarking tips and tricks? Sound off in the comments!

About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.

facebook bluesky subscribe
About   Newsletter
Hosting By
Hosted on Linux using .NET in an Azure App Service
February 20, 2015 5:58
awesome!
February 20, 2015 6:04
Good guidance. Matches my approach and beliefs. Thanks.

Greg
February 20, 2015 6:48
Simple Binary Encoding (SBE) is another fast serialization library. It it targeted to systems requiring high transaction rates, ie tens or hundreds of thousands of message per second (or higher), typically financial or high frequency trading.

Benchmarks shown on the following pages shows ~16-25 times faster serialization compared to protobuf on .NET. One design point of SBE is that it does not allocate memory!

"A typical market data message can be encoded, or decoded, in ~25ns compared to ~1000ns for the same message with GPB on the same hardware. XML and FIX tag value messages are orders of magnitude slower again." -- Martin Thompson

http://weareadaptive.com/blog/2013/12/10/sbe-1/

http://mechanical-sympathy.blogspot.ca/2014/05/simple-binary-encoding.html

https://github.com/real-logic/simple-binary-encoding
February 20, 2015 7:13
Great post - I love getting proper benchmark info and caveats out there.

Just for completeness, it's worth noting that Jil also has a TextReader & TextWriter overloads for stream serialization/deserialization as well.

We're a bit fanatical about allocations over here at Stack - unless there's a good reason to allocate the entire string (e.g. debugging) we generally avoid it and prefer streams at all times.
February 20, 2015 8:36
The thing I hate about Microsoft's serializers isn't only the performance, but the fact that any version change in the DLL breaks it in a way that it can't recover from.
February 20, 2015 8:42
Great post and great quotes
February 20, 2015 9:16
Thanks Nick, I'll update that part!
February 20, 2015 12:09
Very interesting post.

This an area that has caused us a lot of pain over the years so we developed a solution we hope can provide some ideas for someone to make something even better.

The main goal for us is to treat performance tests as important as unit tests when writing code, and they should also be as easy to write as unit tests.

These are some points that are important to us:
- The time complexity of a piece of code needs to be measured over different data sizes.
- Write performance tests against an interface so that the same performance tests can be re-used against multiple implementations.
- Benchmark multiple implementations altogether and display the results in real time.
- Collaborate on some standard performance fixtures that will be widely accepted and used by the community. (Which I hope will also lead to better interface designs and acceptance.)
- Take into account Garbage collection.

With these points in mind we started working on Nperf (https://github.com/Orcomp/NPerf) and NPerfFixtures (https://github.com/Orcomp/NPerf.Fixtures).

We even created a ISerializer fixture, and compared some serialization implementations. (https://github.com/Orcomp/NPerf.Fixtures/blob/master/docs/ISerializer/ISerializerDocs.md)

We then moved to NUnitBenchmarker because we really wanted performance tests to be as easy to write as unit tests, so we basically leveraged all the features that come with NUnit.
(i.e. if you can write Nunit tests, you can write performance tests.)

These are the links

Github: https://github.com/Orcomp/NUnitBenchmarker
Nuget: https://www.nuget.org/packages/NUnitBenchmarker.Benchmark/1.1.0-unstable0005

NUnitBenchmarker is already useful but there is still a lot more to do. (It uses SimpleSpeedTester internally.)

PerfUtil (https://github.com/nessos/PerfUtil) has also caught our attention because it will also report on GC gen0, GC gen1 and GC gen2 times.

I hope Scott's post will create some constructive discussions that can focus the community in coming up with a good solution.
Ben
February 20, 2015 14:34
Great post, its fantastic to see perf and benchmarking being discussed.

Just one thing, I didn't write the .net performance book, the author is Ben Watson. I just recommended it because it's brilliant!
February 20, 2015 15:06
A format that can also be considered

http://msgpack.org/
February 20, 2015 17:46
Great post! I am curious about your thoughts on Bond (https://github.com/Microsoft/bond/).
February 20, 2015 18:49
One downside of protocol buffers is the payload is difficult to view using debugging tools. If you need serialize binary objects across platforms, it's a good choice, though. But I would prefer JSON/XML if it were an option.
February 20, 2015 23:08
@Kerry Jenkins I was just going to say no Microsoft Bond?
February 21, 2015 1:51
My biggest advice regarding benchmarks is "don't", or more accurately "wait".

As a perfectionist (and most great programmers are), I have a tendency like the above mentioned process to nitpick every single little thing while I'm profiling and benchmarking. And I have found that far too often my code isn't being used by enough people, or is fast enough already that benchmarking is actually a great waste of mine and others time.

I only ever break out the benchmarking when I've actually come to the conclusion that the item that has been written is actually too slow and needs to be made faster.

Far too often I write stuff that elements outside of the users control (latency, HDD speeds, etc) are actually the limiting factors of performance, and I have found that waiting until later allows me to be more productive.

On the other hand in performance critical areas, the above advice is totally crummy and you need to benchmark the heck out of it. There are a few areas in my latest project that we found out our performance targets were actually the reverse of the typical target. Our software was heavily predicated to improving second run performance (as are many), but due to the amount of development and testing going on those areas were actually costing major development time as we were pre-rendering activities that weren't being used in that particular developers tests. Anyways we found that we needed to have conditional compile options to be able to actually disable some of the performance optimizations in order to optimize the least likely option (first run).
February 21, 2015 2:04
A few benchmarking pro-tips:


  1. Benchmark in Release mode and no debugger attached.

  2. Unless you care about startup time, warm up the code you want to benchmark before you start the timer. Loading types, JITing, instantiating static variables, and so on takes time.

  3. If you're using a profiler like dotTrace, be aware that turning on features like line by line performance analysis will distort the results. High level sampling is better to figure out where time is being spent.



~ James
February 21, 2015 6:50
Another serialization option: Cap'n Proto (.Net Bindings)
February 23, 2015 3:07
Although I love Jil, it's important (always a good advice regardless) to have thorough unit tests in place. Mine showed regression bugs when I switched from Newtonsoft to Jil. Apparently, Jil fails serializing dictionaries that hold 'object's as their keys' values.
February 23, 2015 21:19
Performance is easy in the lab but unfortunately issues tend to crop up at customer sites.
It is important to measure perf and do regression testing but you will always need be able to do production profiling.
For such hard to find bugs nothing beats xperf since you can record system wide everything. If you do not want 100 GB of files you can record into a ring buffer and stop profiling when something did happen. That way you can record the interesting stuff without having too much data.
You can even do something like debugging with a profiler (profbugging?).

E.g. if you find LOH (Large Object Heap) allocations of object[] which comes from Delegate.Remove I know that I have at least one event handler which has more than 10625 subscribers. Why? Because the object array was allocated on the LOH heap and must be bigger than 85000 bytes. On x64 the reference size is 8 where I can store in the delegate invocation list at least 85000/8=10625 references.
This tells me that apart from many allocations we also have leaked event handler possible due to previous errors.

For the fastest serializer on the planet you need to be GC allocation free where already solutions exist as Phil Bolduc has pointed out. Unfortunately the .NET serializer do not support allocation free programing at all. Some caching and buffer reuse would be cool. More APIs should accept ArraySegment<byte> instead of plain byte[] arrays and a way to get notified when the buffer is no longer used by the serializer to pool it again.
February 24, 2015 2:08
I wrote a Benchmark tool called Benchy, meant to enable Benchmark testing to be done in an XUnit style.

https://github.com/cmbrown1598/Benchy

It allows you to setup tests and mark tests as failing if your test fails to meet a particular benchmark, and it comes with a basic runner.

Give it a looksee!

Christopher
February 24, 2015 12:50
Seriously, don't repeat those false claims about QueryPerformanceCounter issues when other people at Microsoft try hard to clear those up.


http://blogs.msdn.com/b/oldnewthing/archive/2014/08/22/10551965.aspx

https://msdn.microsoft.com/en-us/library/windows/desktop/dn553408%28v=vs.85%29.aspx
February 24, 2015 14:53
If you want/need simple binary/XML serialization I can also recommend SharpSerializer. Not quite sure it works on the Xamarin/Mono stack, but it does work on every platform officially supported by MS (Xbox, Windows Phone, Silverlight, Windows RT and Desktop), it's open source, it's available from NuGet, it's a shit ton more faster than BinaryFormatter and it's POCO friendly which means you don't have to mark every object you want to serialize over the network or to disk as Serializable, just like with serializers such as Newtonsoft Json and friends. If you want a nice replacement for BinaryFormatter you really should take a look at it :)
February 27, 2015 16:36
This test was posted yesterday. Seems there are more contenders in town :)

Jil and ProtoBuf seem to go head to head, but then Netserializer comes along.... Wowza.
February 27, 2015 20:43
One missing piece in this post highly related to HdrHistogram is something called Coordinated Omission, which is a term coined by Gil Tene (the guy behind HdrHistogram.org). Basically, our benchmarking tools are masking or hiding the true latency problems.

Once you know to look for Coordinated Omission, you see it EVERYWHERE.
March 01, 2015 12:31
you forget to compare following serializer, similar or even better performance than protobuf-net
http://www.nuget.org/packages/Microsoft.Hadoop.Avro/
March 31, 2015 17:04
One of the most useful blog posts I've ever read. Most of this I have had to learn the hard way (especially the harness overarching the tests).

I will be using this as a rule of thumb and thank you for the links to so many useful tools!

Comments are closed.

Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.