Semantic Version Sorting Performance in .NET

Published on 2021-01-24

A few weeks ago, my colleague reported that something he was working on was super slow when there were many versions. Although the issue doesn't say anything about sorting, it got me thinking that NuGet's NuGetVersion sorting implementation could probably be improved. In this blog post I do an initial investigation.

Background

NuGet's code for parsing and comparing versions is in an assembly named NuGet.Versioning, and is available as a package on nuget.org. NuGet has two different version classes. The first is SemanticVersion, which I used to think was an accurate implementation of the Semantic Versioning v2 specification (also known as SemVer2), which is a commonly used standard in software development. However, while researching for this blog post, I discovered that there are several places where NuGet's SemanticVersion implementation deviates from the SemVer2 spec. It's not important for the context of this blog post, so I'm going to leave it at that. For anyone not familiar with SemVer2, the general format is <major>.<minor>.<patch>[-<pre-release>][+<metadata>]. Major, minor and patch must be digits only; pre-release is optional and consists of period ('.') separated alpha-numeric segments; and metadata is optional and ignored for comparisons (meaning two versions with the same major, minor, patch and pre-release values, but different metadata, are still considered equal). For example, 1.0.0-alpha < 1.0.0-alpha.1 < 1.0.0-beta.11 < 1.0.0-rc.1 < 1.0.0.

Some quick background: System.Version, built into .NET's Base Class Library (BCL), uses up to 4 components, <major>.<minor>[.<build>[.<revision>]], but only the first two components are mandatory. All components are digits only; letters are not allowed anywhere. This is the version format that most Microsoft products have used for decades.
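
To make that concrete, here's a minimal sketch (my own illustration, not NuGet code) of System.Version's behaviour:

// A small illustration: System.Version allows only numeric components,
// and at least major.minor must be present.
using System;

var twoPart = new Version("1.2");       // build and revision are unset
var fourPart = new Version("1.2.3.4");  // major.minor.build.revision

Console.WriteLine(twoPart < fourPart);  // True: missing components sort lowest
// new Version("1.2.3-beta") would throw: letters are not allowed anywhere.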

NuGetVersion is basically the union of these two versioning schemes. The end format is roughly <major>[.<minor>[.<patch>[.<revision>]]][-<pre-release>][+<metadata>]. I have no idea why the minor segment is optional, as neither System.Version nor SemVer2 allows it to be optional, but I wanted to focus on performance, not correctness.
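
And a quick sketch of what that union allows in practice (illustrative only; it assumes a reference to the NuGet.Versioning package, and the example strings are my own):

// Illustrative only, assuming a reference to the NuGet.Versioning package.
using System;
using NuGet.Versioning;

var legacy = NuGetVersion.Parse("1.2.3.4-beta.1"); // four numeric parts plus a pre-release label
var a = NuGetVersion.Parse("1.2.3+build.1");
var b = NuGetVersion.Parse("1.2.3+build.2");

Console.WriteLine(legacy.IsPrerelease); // True
Console.WriteLine(a.CompareTo(b));      // 0: metadata is ignored by the default comparison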

Programs = Data Structures + Algorithms

Data structures and algorithms are closely linked: changing one often requires a change to the other. I didn't look at the implementation of NuGet's SemanticVersion or NuGetVersion classes before starting my benchmark adventures, but I had seen it as part of my work over the 2+ years I've been on the team, and roughly remembered that it stored pre-release segments in a string array, and tried to parse each string as an int on every comparison. This is where I thought performance could be gained.

Pre-Release As A String Array

Since I didn't want to be overly influenced, or biased, by NuGet's implementation, I started with a clean implementation based only on what I remembered of NuGet's. The important parts for the purpose of this blog post are:

public class VersionWithStringArray
{
    public string OriginalString { get; }
    public uint Major { get; }
    public uint Minor { get; }
    public uint Patch { get; }
    public uint? Revision { get; }
    public string[]? Prerelease { get; }
    public string? Metadata { get; }
}

The code I wrote to compare two of these versions is about 80 lines, which is more than I want to put in a blog post. The important part for this blog post is the comparison of two segments of the Prerelease arrays:

// x and y are the Prerelease arrays of the two versions; both segments are re-parsed on every comparison.
var xb = uint.TryParse(x[i], out uint xi);
var yb = uint.TryParse(y[i], out uint yi);

if (xb || yb)
{
    // A numeric identifier always has lower precedence than an alphanumeric one (SemVer2 11.4.3).
    if (!xb)
    {
        return 1;
    }

    if (!yb)
    {
        return -1;
    }

    // Both numeric: compare numerically (SemVer2 11.4.1).
    var cmp = xi.CompareTo(yi);
    if (cmp != 0)
    {
        return cmp;
    }
}
else
{
    // Both alphanumeric: compare as strings (SemVer2 11.4.2; NuGet compares case-insensitively).
    var cmp = StringComparer.OrdinalIgnoreCase.Compare(x[i], y[i]);
    if (cmp != 0)
    {
        return cmp;
    }
}

In short:

  1. Try to parse each string as an int
  2. If only one of the two can be parsed as an int, the numeric one is lower, as per SemVer2's 11.4.3 rule.
  3. If both can be parsed as ints, compare them as per SemVer2's 11.4.1 rule.
  4. If both strings are alphanumeric, compare them lexically, as per SemVer2's 11.4.2 rule.

In a large collection of versions, each item in the collection may be compared many times, which means that uint.TryParse might be called many times for the same string. This is where I thought the performance could be gained.
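
To make that concrete, here's a rough, hypothetical sketch (not the benchmark code) that counts parse calls while sorting a shuffled array of pre-release-like segments, using comparison logic similar to the snippet above:

// Hypothetical sketch: count how many times uint.TryParse runs while sorting.
using System;
using System.Linq;

class RepeatedParseDemo
{
    static int parseCalls;

    static int CompareSegments(string a, string b)
    {
        parseCalls += 2; // both sides are re-parsed on every single comparison
        var aIsNum = uint.TryParse(a, out uint ai);
        var bIsNum = uint.TryParse(b, out uint bi);
        if (aIsNum && bIsNum) return ai.CompareTo(bi);          // numeric vs numeric (11.4.1)
        if (aIsNum) return -1;                                  // numeric sorts lower (11.4.3)
        if (bIsNum) return 1;
        return StringComparer.OrdinalIgnoreCase.Compare(a, b);  // alphanumeric (11.4.2, NuGet-style)
    }

    static void Main()
    {
        var rng = new Random(42);
        var segments = Enumerable.Range(0, 10_000)
                                 .Select(i => i % 2 == 0 ? i.ToString() : "beta" + i)
                                 .OrderBy(_ => rng.Next())
                                 .ToArray();

        Array.Sort(segments, CompareSegments);

        // With 10,000 elements there are hundreds of thousands of comparisons,
        // so each string is re-parsed many times rather than once.
        Console.WriteLine(parseCalls);
    }
}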

Pre-release As An Array Of Classes

My first attempt was to change public string[]? Prerelease { get; } to public PrereleaseSegment[]? Prerelease { get; }, where PrereleaseSegment is:

public class PrereleaseSegment
{
    public string Value { get; }
    public uint? Int { get; }
}

When parsing the version string, at the time the pre-release segments are split, try to parse each segment as a uint, and if it succeeds, save the result in the Int property. This changes the compare method only a little:

var xs = x[i];
var ys = y[i];

if (xs.Int.HasValue || ys.Int.HasValue)
{
    if (!xs.Int.HasValue)
    {
        return 1;
    }

    if (!ys.Int.HasValue)
    {
        return -1;
    }

    var cmp = xs.Int.Value.CompareTo(ys.Int.Value);
    if (cmp != 0)
    {
        return cmp;
    }
}
else
{
    var cmp = StringComparer.OrdinalIgnoreCase.Compare(xs.Value, ys.Value);
    if (cmp != 0)
    {
        return cmp;
    }
}

The only significant difference is that instead of calling uint.TryParse to parse two strings, we get the PrereleaseSegment instance from each array, then use property getters to check the nullable ints, or the string values if needed.
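
For completeness, here's a minimal sketch of the parse-once step described above (not NuGet's actual parsing code, and it assumes PrereleaseSegment gets a simple (Value, Int) constructor):

// Hypothetical sketch of parsing the pre-release label once, up front.
static PrereleaseSegment[] ParsePrerelease(string prereleaseLabel)
{
    var parts = prereleaseLabel.Split('.');
    var segments = new PrereleaseSegment[parts.Length];
    for (int i = 0; i < parts.Length; i++)
    {
        uint? parsed = uint.TryParse(parts[i], out uint value) ? value : (uint?)null;
        segments[i] = new PrereleaseSegment(parts[i], parsed); // assumed (Value, Int) constructor
    }
    return segments;
}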

Pre-release As An Array Of Structs

I was curious, if PrereleaseSegment was a struct, instead of a class, would it make a performance difference?

In case you're not familiar with .NET/C#, the difference between a struct and a class is that a class is a reference type allocated on the heap, whereas a struct is a value type stored inline (on the stack, or inside whatever contains it) and needs to be "boxed" to live on the heap as its own object. This still might not be meaningful to some developers, so I'll say that the heap is what most people think about when talking about "allocating memory", but this topic is outside the scope of this blog post, so I'll suggest you do a web search to find out more.

The reason it's interesting for a performance analysis comes from how computers work at a hardware level. While the von Neumann architecture makes RAM look like one large block, CPUs actually have a much smaller, but much faster, cache of parts of that RAM. If you can fit all of your algorithm's data structures in the CPU cache, it will be much faster than if the CPU has a cache miss and data needs to be retrieved from RAM. But data moves between RAM and the cache in fixed-size blocks (cache lines). Let's say your CPU has a cache block size of 2KB (real cache lines are typically 64 bytes, but the principle is the same). If you need to read even a single byte of memory, it will read an entire 2KB block from RAM, evicting a block that was there beforehand. Therefore, if you can squeeze your data structures into a contiguous part of memory (especially if you can start it at a cache block boundary, which is why you sometimes see mentions of alignment), then you can squeeze more performance out of your code.

In .NET, an array of classes is effectively an array of pointers, for those of you with experience in C/C++ or similar manually managed memory languages: every item in the array could be anywhere in memory. Whereas in .NET an array of structs is a contiguous block of memory, much like in C/C++. Note that any reference type, such as a string property/field in the struct, is still a pointer, which could be anywhere in memory. But array[0] and array[1] are guaranteed to be at sequential memory addresses if the array's data type is a struct. Therefore, it's theoretically possible for a version object with an array of PrereleaseSegment classes to have each of those segment instances at far-apart memory addresses, each of which would need a different CPU cache block. Using a struct minimises that risk. Even if the array of structs is not aligned to the CPU cache block size, as long as the array has at least 3 items, we're guaranteed that at least one of the CPU cache blocks will contain more than one instance of our struct.

This doesn't mean that using structs is universally better for performance than classes. When passing a struct as a method parameter, the entire instance is copied, so it uses more stack space, and if the method modifies it, the original value from the caller is not modified, which is sometimes desirable, sometimes not. Later versions of C# added ref and in parameters (and readonly structs), which can avoid the copy or make modifications visible to the caller. A common gotcha, which I often hit myself, is using a struct as a value in a dictionary. For example, given a Dictionary<string, (string parent, int count)>, dict[str].count++ won't even compile, because the indexer returns a temporary copy. Instead, you need to assign a new "instance" of the struct, with updated values: dict[str] = (dict[str].parent, dict[str].count + 1);.
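
Here's a short sketch of that dictionary gotcha (hypothetical names), showing both the compile error and the working pattern:

// Hypothetical illustration of struct copy semantics with a dictionary value.
using System;
using System.Collections.Generic;

var counts = new Dictionary<string, (string parent, int count)>
{
    ["child"] = ("root", 0)
};

// counts["child"].count++;              // CS1612: cannot modify a temporary copy

// Read the copy, then write back a whole new value:
var current = counts["child"];
counts["child"] = (current.parent, current.count + 1);

Console.WriteLine(counts["child"].count); // 1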

All that is a lot of text to explain what was just a single-line change: changing public class PrereleaseSegment to public struct PrereleaseSegment. Everything else is identical to the class-array implementation; it's just the IL generated by the compiler that will be different.

Pre-release As Two Arrays

Later, I wondered if I could more easily keep API compatibility with the string-array implementation by adding a second, parallel uint?[] array. The important compare code for this implementation is:

var xs = xIntSegments[i];
var ys = yIntSegments[i];

if (xs.HasValue || ys.HasValue)
{
    if (!xs.HasValue)
    {
        return 1;
    }

    if (!ys.HasValue)
    {
        return -1;
    }

    var cmp = xs.Value.CompareTo(ys.Value);
    if (cmp != 0)
    {
        return cmp;
    }
}
else
{
    var cmp = StringComparer.OrdinalIgnoreCase.Compare(xStringSegments[i], yStringSegments[i]);
    if (cmp != 0)
    {
        return cmp;
    }
}
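
For reference, the shape of this implementation is roughly the following sketch (the property names are my own assumptions; the public surface stays the same as the string-array version):

// A sketch of the shape; the parsed ints live in a second, parallel array.
public class VersionWithTwoArrays
{
    public string OriginalString { get; }
    public uint Major { get; }
    public uint Minor { get; }
    public uint Patch { get; }
    public uint? Revision { get; }
    public string[]? Prerelease { get; }      // unchanged from the string-array version
    internal uint?[]? PrereleaseInts { get; } // parallel array: null where a segment isn't numeric
    public string? Metadata { get; }
}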

Benchmarks

I used the full list of every version string on nuget.org as my sample data, and also added a benchmark using NuGetVersion to compare against my implementations.

I tested both randomly shuffling the arrays before each iteration and shuffling with the same random seed, so that each iteration sorts the same initial order. The latter means that each iteration should perform the same number of swaps. The non-deterministic randomization of the input array did make the benchmarks slightly slower in most executions, and BenchmarkDotNet reported a higher standard deviation. But the values were close enough, and I didn't want a lucky or unlucky order to influence the benchmark, so I used the deterministic ordering instead.
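
Roughly, the deterministic setup looked like this sketch (hypothetical and heavily simplified; the real benchmark used the nuget.org version list and my own version classes):

// Hypothetical, simplified benchmark: reshuffle with a fixed seed before each
// iteration so every iteration sorts the same initial order.
using System;
using System.Linq;
using BenchmarkDotNet.Attributes;
using NuGet.Versioning;

public class SortBenchmarks
{
    // Placeholder data; the real benchmark loaded every version string on nuget.org.
    private readonly NuGetVersion[] _versions =
        new[] { "1.0.0", "1.0.0-beta.1", "1.0.0-beta.2+meta", "2.1.3" }
            .Select(NuGetVersion.Parse)
            .ToArray();

    private NuGetVersion[] _working;

    [IterationSetup]
    public void Shuffle()
    {
        var rng = new Random(12345); // fixed seed: deterministic order for every iteration
        _working = _versions.OrderBy(_ => rng.Next()).ToArray();
    }

    [Benchmark]
    public void SortNuGetVersion() => Array.Sort(_working);
}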

I also tested on the .NET Framework (compiled for 4.7.2, but running on 4.8), .NET Core 3.1 and .NET 5, out of curiosity. None of the benchmark code is different when targeting the different target frameworks, so I expect the compiler to generate the same IL in all cases.

The results on my computer are:

Implementation          .NET Framework   .NET Core 3.1   .NET 5
NuGetVersion                758.2 ms        494.4 ms     517.2 ms
VersionWithString           402.9 ms        234.7 ms     229.1 ms
VersionWithClass            239.9 ms        210.0 ms     203.6 ms
VersionWithStruct           241.1 ms        208.9 ms     202.7 ms
VersionWithTwoArrays        244.9 ms        211.3 ms     222.8 ms

Analysis

There are a few things we can conclude from these results. But first I want to mention that I was getting quite a lot of variability in the results from run to run. When BenchmarkDotNet ran a job, all the iterations happened on the same CPU core, as monitored using the Performance tab in Windows' Task Manager. My CPU has 16 logical processors, and since these benchmarks are single-threaded, I didn't close all other apps while running the benchmarks. The fastest benchmark results always occurred when the job was running on core 10, and were around 10% slower when running on core 3. Interestingly, Windows seemed to always schedule the benchmark jobs on one of these two cores. I had some browser tabs open while the benchmarks were running, watching videos in one. Task Manager showed all the other cores as idle, but I might have got more reliable results if I had closed down everything except the benchmark app. Unfortunately, I'm not motivated enough to close everything and stop using my computer for a few minutes to re-run the benchmarks. These results will have to be good enough for now.

First, what I think is most obvious is that NuGet's NuGetVersion is roughly twice as slow as my VersionWithString implementation. I started digging into this, and it turned out to be complex and interesting, but I've already delayed posting this for 2 weeks, so I'll leave that investigation for another blog post.

Second, we can see that .NET Core and .NET 5 have significant performance gains over the .NET Framework. As mentioned before the results table, there is no difference in my code between these runtimes, so the improvements are in the runtime itself. There's a 15% performance gain in my VersionWithClass implementation, a 43% performance gain in my VersionWithString implementation, and even NuGet's NuGetVersion implementation has a 32% performance gain. If your app is running on the .NET Framework and there's anything at all unsatisfactory about its performance, I highly recommend evaluating whether you can run on .NET 5, or at least .NET Core 3.1 if running on an LTS runtime is really important.

Next, VersionWithTwoArrays appears to have about the same performance on .NET Core 3.1 as VersionWithClass and VersionWithStruct, but I feel like that was the luck of which CPU core executed the tests. From memory, VersionWithTwoArrays was generally about 10% slower, as the .NET 5 results show. My hypothesis is that in .NET, array indexing does bounds checking to minimise the risk of the #1 cause of security bugs in C/C++ apps: array overruns. The compiler/runtime can do some amount of static analysis and optimize away the bounds check when it knows the index can never be outside the bounds of the array, but I doubt it's smart enough to know that the two arrays in VersionWithTwoArrays are always the same length. Hence I believe that this implementation does more bounds checking than VersionWithClass and VersionWithStruct. Don't get me wrong, the performance impact of a bounds check is very, very small. But in the compare method implementations, the amount of code executed after getting the array item is almost nothing, so any overhead may be noticeable. I'm not sure whether an unsafe block would allow the bounds checking to be skipped, but I think that safety is more important, and the single-array implementations provide a better alternative.
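
Here's a small, hypothetical illustration of the bounds-check point (not code from the benchmark): the JIT can usually prove a single-array loop stays in bounds, but with two parallel arrays it generally can't:

// Hypothetical illustration of array bounds checking in loops.
static int CountNumeric(uint?[] ints)
{
    int count = 0;
    for (int i = 0; i < ints.Length; i++)
    {
        if (ints[i].HasValue) // bounds check typically eliminated: i < ints.Length is provable
        {
            count++;
        }
    }
    return count;
}

static int CountNumericParallel(uint?[] ints, string[] strings)
{
    int count = 0;
    for (int i = 0; i < ints.Length; i++)
    {
        // strings[i] generally keeps its bounds check: the JIT can't know the two
        // arrays always have the same length.
        if (ints[i].HasValue || strings[i].Length == 0)
        {
            count++;
        }
    }
    return count;
}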

Finally, can we make any conclusions from the VersionWithClass and VersionWithStruct results? The numbers are about the same in all 3 runtimes. However, I propose that no, the results are meaningless, because there's a fundamental flaw in the testing methodology. These benchmarks start with an empty memory space, do nothing more than parse the version strings into an array, sort (and shuffle) the array a bunch of times, and then exit. This means that the VersionWithClass implementation will most likely end up with all the PrereleaseSegment instances in contiguous memory anyway, and there will be no memory fragmentation or heap compaction to cause the different pre-release segments of a single version instance to end up at significantly different memory addresses, causing the additional cache misses I discussed earlier. Therefore, I expect that VersionWithClass should have very similar performance characteristics to VersionWithStruct in a benchmark, but could have very different performance characteristics in a real, long-lived app, such as Visual Studio. This highlights the importance of measuring real product code, not just micro-benchmarks.

Where to from here?

I've been working on these benchmarks as a side project for this blog, not as part of my day job, so progress will be slow. I'd like to continue investigating why NuGetVersion has so much worse performance than my VersionWithStringArray implementation. I started by making very small changes at each iteration, and after the 3rd iteration I found something that had a big performance impact; but when I undid the first two iterations and kept only the 3rd, the performance was actually significantly worse than the original NuGetVersion code. So I want more time to investigate further, and I was also worried about how long this blog post was going to end up. In any case, if I can find changes to NuGet.Versioning that improve performance while maintaining full compatibility with the existing API, I will create a pull request.

However, due to the design/API of NuGetVersion and SemanticVersion, many of the differences between them and my clean-room version classes, which have well over double the sorting performance, cannot be applied without breaking API changes. I'll mention it to my team and see if there's any interest in breaking the API in the future for these performance gains. However, taking a performance trace of NuGet running "in the real world" does not show that sorting performance is a problem at all. Therefore, breaking the API for a meaningless performance improvement does not feel like a good trade-off.

Additionally, I want to play around with benchmarking different parsing implementations, as I have ideas for some radically different approaches that I want to compare. I wanted to find the best data structure for sorting first, to avoid needing to change the parsing code later, although in the end all of the sorting data structures turned out to be so similar that it wouldn't make much of a difference to the parsing anyway.