In my previous post, I tested a few different semantic version data structures to see how the data structure impacts sorting performance. One of my implementations was very similar to NuGet's `NuGetVersion` class, in the `NuGet.Versioning` package. My "clean room" implementation was significantly faster than NuGet's. In this post I dig deeper to find out why.
For anyone following along with my Benchmarks GitHub repo, I'm referring to `VersionWithStringArray`. In case I modify the repo in the future, commit hash c5fb1cc9f54c84a7752eba71171e43a9e3ca7f83 is what this blog post is based on.
Methodology
Perhaps it would have been ideal to start with a performance tracing tool like PerfView. However, since my implementation and NuGet's have completely different call stacks, with different numbers of methods, I didn't think a perf trace would give insights into the differences between the implementations.
What I did instead was create a file, `NuGetCopy.cs`. This file defined a namespace, not a class, called `NuGetCopy`. In this namespace, I copied the classes `NuGetVersion` and `VersionComparer` from the NuGet.Client repo (keeping the original license header). Then I kept adding `using` statements and copying more classes into this single file until the project would compile. Now I had a complete copy of NuGet's `NuGetVersion` and `VersionComparer` classes, and their dependencies, in a single file in my own project!
From there, I could very quickly copy `NuGetCopy.cs` to a new file name, `NuGetCopy2.cs`, change the namespace to `NuGetCopy2`, and again the project compiles. In the new copy, I can make minimal changes, even a single C# statement, and run benchmarks comparing the different copies. An unintended side effect is that you can diff two files to see all the changes between them very easily. For example, here's the complete diff between `NuGetCopy2.cs` and `NuGetCopy3.cs`, which indeed is a single-line change.
```diff
15a16,18
> NuGetCopy3 has the following modifications:
> * Fixed bug in GetReleaseLabelsOrNull where `labels != null` should be `labels == null`
>
22c25
< namespace SemanticVersionBenchmarks.Implementations.NuGetCopy2
---
> namespace SemanticVersionBenchmarks.Implementations.NuGetCopy3
368c371
< if (labels != null && enumerable != null)
---
> if (labels == null && enumerable != null)
```
Finally, when running the benchmarks, I saw significant variability between runs: the mean duration would vary by more than 10% for the same implementation in different runs. I found the variation was minimised by setting the processor affinity to a single CPU core in the benchmark class. Without this, watching Task Manager's performance tab with the CPU graph set to "Logical Processors" rather than "Overall utilization", I could see that each benchmark would run on a single core, but different benchmarks in the same `dotnet run` execution would sometimes run on different cores. This benchmark, sorting a list of objects, is inherently single threaded, and since this is the only code the benchmark executes, I believe it is representative of "production" scenarios.
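For anyone wanting to try the same trick, here's a minimal sketch of the pinning (the class and method names here are my own, not from the benchmark repo; as far as I know `ProcessorAffinity` is only supported on Windows and Linux):

```csharp
using System;
using System.Diagnostics;

static class CpuPinning
{
    // Bit mask with only the given logical core's bit set.
    public static IntPtr MaskForCore(int core) => (IntPtr)(1L << core);

    // Pin the whole process to one logical core so consecutive benchmarks
    // run on the same core and their timings stay comparable between runs.
    public static void PinToCore(int core) =>
        Process.GetCurrentProcess().ProcessorAffinity = MaskForCore(core);
}
```

Calling `CpuPinning.PinToCore(0)` once at startup is enough; the affinity is inherited by all threads the process creates afterwards.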
Results
Rather than keep you in suspense, let's look at the results first, and then I'll explain the difference between all the copies. I'm also only showing the .NET 5 results, as comparisons to other runtimes are no more interesting than in my previous blog post.
Implementation | Mean | Ratio from previous | Ratio from NuGetVersion |
---|---|---|---|
NuGet | 421.9 | - | - |
NuGetCopy | 411.9 | 0.97x | 0.97x |
NuGetCopy2 | 421.4 | 0.99x | 0.96x |
NuGetCopy3 | 412.9 | 1.53x | 1.48x |
NuGetCopy4 | 412.6 | 1.11x | 1.64x |
NuGetCopy5 | 289.2 | 0.98x | 1.60x |
NuGetCopy6 | 282.2 | 1.04x | 1.68x |
NuGetCopy7 | 298.7 | 1.04x | 1.75x |
VersionWithStringArray | 233.8 | 1.15x | 2.01x |
I find performance numbers a bit difficult to talk about. Consider, for example, `NuGetCopy3`'s mean time `m3` against `NuGetCopy2`'s mean time `m2`. `m2/m3` is 1.53, which is what I reported in the table. Therefore, it seems to be a 53% performance gain, which is the same as `(m2-m3)/m3`. But `(m2-m3)/m2` is 0.35, so does that mean it's actually a 35% performance increase, not 53%? Or is one of the two percentages how much faster one is than the other, and the other how much slower? I've been working on the code and this blog post for too long, and it's too late at the time of writing, to think about this right now. Hence I'm just reporting the `m2/m3` ratio.
So, what changed?
Let me explain each of these copies.
NuGetCopy
As previously mentioned, this is a copy of just enough of NuGet's code for the two classes used to compile. Ideally it should have identical performance to the NuGet package itself.
NuGetCopy2
This copy removed all comments, and all methods that are not used/needed by the benchmark itself. So again, in theory we should expect this to have the same performance as `NuGetCopy` and NuGet's package.
NuGetCopy3
Finally, the first actual change. The change is a single character bug fix, which needs some explanation.
NuGet has a pattern of using `IEnumerable` throughout the codebase, rather than returning the actual type that methods and classes use internally, or a closer abstraction such as `IList`. A good reason why `IReadOnlyList` wasn't used is that it was added in .NET Framework 4.5, but NuGet 1.0 was written when .NET Framework 4.0 was the current/newest version. Hence `IReadOnlyList` did not yet exist.
In the `Compare` method, if the "stable" part of the version is the same, so the comparer needs to check the prerelease labels, it first tries to cast the `IEnumerable<string> ReleaseLabels` to `string[]`, which is how the version class stores the value. If the typecast fails, it uses LINQ's `.ToArray()` to get a `string[]`.
Well, it was supposed to do that if the typecast fails. The logic bug was that it would only call `.ToArray()` if the typecast succeeds, leading to a pointless LINQ copy on every comparison, which causes memory allocations and therefore garbage collection.
NuGet.Versioning has an odd extensibility model built into the version class, and if someone extended `SemanticVersion` so that `ReleaseLabels` could not be typecast to `string[]`, then the compare would throw a `NullReferenceException`.
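Here's a sketch of the pattern with the fix applied. The condition follows the diff above, but the surrounding names are illustrative rather than NuGet's exact code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LabelHelper
{
    // Cast-or-copy: return the backing array directly when possible,
    // and only fall back to a LINQ copy when the cast fails.
    public static string[] GetReleaseLabelsOrNull(IEnumerable<string> enumerable)
    {
        var labels = enumerable as string[];

        // The bug: this condition read `labels != null`, so the ToArray()
        // copy ran exactly when the cast had already succeeded, allocating
        // a redundant array on every comparison.
        if (labels == null && enumerable != null)
        {
            labels = enumerable.ToArray();
        }

        return labels;
    }
}
```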
Looking at the benchmark results, we can see that this bug causes 20,000 (twenty thousand) Gen 0 garbage collections on .NET Core 3.1 and .NET 5, and 111,000 (one hundred and eleven thousand) Gen 0 garbage collections on .NET Framework, allocating 87 megabytes of memory in all cases. No wonder fixing this bug improved performance by about 50%. I think it also explains the performance difference between .NET Framework and .NET Core (including .NET 5). Evidently the garbage collection algorithm was changed to run less frequently (the number of bytes allocated was measured to be the same on all runtimes), so the additional garbage collections on .NET Framework result in slower performance.
NuGetCopy4
NuGet implemented the version classes using a `System.Version` field, and each of the properties would dereference this reference object. I changed this to use auto-properties, setting the values in the constructor, and got rid of the `System.Version` backing field. For example, I changed `Major => _version.Major;` to `Major { get; }`.
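In code, the change looks roughly like this (a trimmed sketch, not NuGet's actual classes):

```csharp
using System;

// Before (sketch): each property access chases the _version pointer.
class VersionBackedByField
{
    private readonly Version _version;

    public VersionBackedByField(int major, int minor, int patch)
        => _version = new Version(major, minor, patch);

    public int Major => _version.Major;
    public int Minor => _version.Minor;
    public int Patch => _version.Build;
}

// After (sketch): the values are copied into the object's own fields at
// construction, so comparisons read them directly with no extra dereference.
class VersionWithAutoProperties
{
    public VersionWithAutoProperties(int major, int minor, int patch)
    {
        Major = major;
        Minor = minor;
        Patch = patch;
    }

    public int Major { get; }
    public int Minor { get; }
    public int Patch { get; }
}
```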
The 11% performance gain from this is significant, in my opinion.
NuGetCopy5
When the compare needs to compare prerelease labels, NuGet's current implementation uses `Math.Max(x.ReleaseLabels.Length, y.ReleaseLabels.Length)` to determine the number of segments to check, and then inside the loop it compares the iteration number `i` to `x.ReleaseLabels.Length` and `y.ReleaseLabels.Length` to determine when they're not the same length.
I changed this to use `Math.Min` instead, and handled the case where the arrays are different lengths after the loop.
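A sketch of the restructured comparison; this ignores SemVer's numeric-identifier rule and NuGet's actual method shape, and only shows the `Math.Min` idea:

```csharp
using System;

static class LabelCompare
{
    // Compare segments up to the shorter length, then break ties on length,
    // so the per-iteration checks against both lengths disappear from the loop.
    public static int Compare(string[] x, string[] y)
    {
        var count = Math.Min(x.Length, y.Length);
        for (var i = 0; i < count; i++)
        {
            var result = StringComparer.OrdinalIgnoreCase.Compare(x[i], y[i]);
            if (result != 0)
            {
                return result;
            }
        }

        // Shorter label list sorts first: 1.0.0-alpha < 1.0.0-alpha.1.
        return x.Length.CompareTo(y.Length);
    }
}
```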
I don't understand how this could possibly be slower, yet the benchmark results show it as about 2% slower. This happened each time I ran the benchmark suite, so I can't put it down to bad luck during one run. Yet I'm so convinced that it should be a performance benefit that I refused to remove it from my benchmarks or this blog post.
NuGetCopy6
NuGet's implementation of the `IsPrerelease` property iterates the `ReleaseLabels` property, and when it finds an item for which `string.IsNullOrEmpty` is false, returns true. Version parsing will fail if any of the items are null or empty, although NuGet did make `SemanticVersion` and therefore `VersionComparer` extensible, so theoretically someone could extend the class and implement their own version in a way that allows this. Anyway, since I was testing `NuGetVersion`, the first release label would always trigger the `return true` condition when the release labels list was not null.
The point I'm trying to make is that every single time a `NuGetVersion` is compared to another and the Major, Minor, Patch and Revision properties were equal, rather than returning a pre-computed `IsPrerelease`, it would run this LINQ expression.
I changed this to validate in the constructor that `ReleaseLabels` is either null (a "stable" version), or that none of the items are null or empty. I changed `IsPrerelease` to an auto-property, so it no longer contains any logic, just a field retrieval, and moved `ReleaseLabels`'s logic of `return _releaseLabels ?? EmptyReleaseLabels` to the constructor, so `ReleaseLabels` is also zero logic, just a field retrieval.
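A minimal sketch of the shape after this change (my own class name, not NuGet's):

```csharp
using System;
using System.Linq;

// Validate once in the constructor so IsPrerelease and ReleaseLabels become
// plain field reads on the hot compare path.
class PrecomputedVersion
{
    private static readonly string[] EmptyReleaseLabels = Array.Empty<string>();

    public PrecomputedVersion(string[] releaseLabels)
    {
        if (releaseLabels != null && releaseLabels.Any(string.IsNullOrEmpty))
        {
            throw new ArgumentException("Release labels must not be null or empty.", nameof(releaseLabels));
        }

        ReleaseLabels = releaseLabels ?? EmptyReleaseLabels;
        IsPrerelease = releaseLabels != null && releaseLabels.Length > 0;
    }

    public string[] ReleaseLabels { get; }
    public bool IsPrerelease { get; }
}
```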
Note that technically this is a breaking change, since the current code handles extensibility scenarios that my change does not.
NuGetCopy7
This is a major breaking change. I merged everything from the `SemanticVersion` class into `NuGetVersion`, marked the class as `sealed`, removed the `virtual` keyword from the two properties that had it, and removed the typecasting in the compare method.
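A heavily trimmed, illustrative sketch of the resulting shape: with no base class and the class sealed, the JIT can devirtualize the property reads in the comparer's tight loop, and the comparer no longer needs a defensive typecast before touching the labels.

```csharp
using System;

// Everything lives on one sealed class; the properties that were virtual
// on the SemanticVersion base class are now plain get-only auto-properties.
sealed class MergedVersion
{
    public MergedVersion(int major, int minor, int patch, string[] releaseLabels)
    {
        Major = major;
        Minor = minor;
        Patch = patch;
        ReleaseLabels = releaseLabels ?? Array.Empty<string>();
    }

    public int Major { get; }
    public int Minor { get; }
    public int Patch { get; }
    public string[] ReleaseLabels { get; }
}
```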
Summary
Anyone following ASP.NET Core's performance since the pre-1.0 days will not be surprised, as the ASP.NET Core and .NET runtime teams made enormous performance gains in the TechEmpower benchmarks. These efforts were discussed on the ASP.NET Community Standup videos, and a very significant percentage of the improvement (most of it?) came from reducing memory allocations, rather than improving algorithmic complexity. Therefore, this benchmark suite's near-1.5x performance ratio is completely plausible.
I found it particularly interesting that the number of bytes allocated was the same on all the runtimes, but .NET Framework ran 5.5 times as many garbage collections. Therefore, if your app does a lot of allocations, you may benefit greatly from migrating to .NET Core.
I also found `NuGetCopy4`'s approximate 10% gain interesting. The difference was eliminating a pointer dereference. Granted, this dereference was happening in a very tight loop, but it shows that if your code runs a tight loop, pointer dereferencing can have a non-trivial impact. In this case the pointer dereference was an object stored in a class field, but it could be a `virtual` method as well. Don't take this advice too strictly, though. I'd expect the loop needs to be extremely tight, with just a small number of instructions between dereferences. If you have any non-trivial code between pointer dereferences, that probably overwhelms the time to dereference, so you won't see much of an improvement.
With these incremental changes (I'd call them all small, but some were certainly smaller than others), I got a copy of `NuGetVersion` to within 15% of the performance of my "clean-room" `VersionWithStringArray` implementation from my previous blog post. Determining how to close that gap is time consuming, so I'm not going to do it; I want to get this post out and move on to other things. But it shows how much a single-character bug, and the memory allocations it causes, can harm performance.
All the copies up to `NuGetCopy5` are fully compatible with NuGet's current code, so I will create a pull request later to incorporate those changes into the product. For better or worse, sorting `NuGetVersion` instances is never an expensive operation in the product, so this won't result in any meaningful performance improvement overall. But the benchmarks certainly look nice.
Final words
You might wonder why I didn't spend more time trying to understand why different runs of the test suite resulted in `NuGet`, `NuGetCopy` and `NuGetCopy2` having somewhat different performance ratios each time. I felt like I had already spent too much time running them over and over again. The performance difference was usually 2-3%, and since `VersionWithStringArray` was twice as fast, I didn't want to get too distracted by a small detail.
Next, you might ask why I didn't spend more time understanding why `NuGetCopy5`'s change reduced performance when I'm so confident it should improve it. The reason is that I've spent more hours working on this blog post than I originally intended. In fact, if I don't post it now, in the last few hours of February, then I won't get any blog post out in February. Three quarters of the planet is already on the 1st of March, but at least I'm getting this post out in February in the timezone I'm currently living in.
Combining those two questions: I originally had a completely different set of changes, going up to `NuGetCopy9`, which had the same performance as `VersionWithStringArray`. I wrote up this blog post and was so close to publishing it, and then I wanted to check one last thing. I debugged a unit test, stepped through where the `ReleaseLabels as string[]` cast is done, and noticed that it was still running the `.ToArray()`. Quickly testing this fix showed all my other work was on a shaky foundation. The copy that had the biggest performance gain didn't gain from all the changes I made; it gained because I deleted the `.ToArray()` line after making that scenario impossible, breaking API and ABI compatibility in the process. Once I discovered the `!= null` bug, and confirmed that the one-character change resulted in a huge performance gain, I had to start all my work again from the beginning. Well, not completely from nothing: I still had the knowledge of everything else I had attempted, and what worked well and what did not. And I used it as an opportunity to put all the API/ABI-compatible changes at the beginning and the breaking changes later, which made it much easier to determine which changes I'm going to use in a pull request on the NuGet.Client repo.
But it sure was frustrating to have learned that I spent hours on a bad path.