To be honest - most of the collections in Spring 1.2 are dead slow - especially on Win32. But to be fair, almost nobody complained about that so far. Now that we have some shiny tool to print some numbers here are some simple benchmarks - they show a similar pretty sad picture for almost every method as most of them are virtual in 1.2.
The benchmark simply measures how long it takes to add n integers to a list with a pre-allocated capacity - so this is just the duration it takes to call and stuff that integer into the internal array (and whatever else is going on) - compiled with 10.4.2 - Win32 release. You can see how hard that return stack buffer trashing hits the performance here.
The numbers for Win64 are a little less terrible but still far from good:
Now fast forward to the latest development state - Win32:
and Win64:
Before you ask - well - simply adding Spring4D to the binary makes the RTL faster or slower! No, seriously, that was a joke. You can see the effect that Professor Berger mentioned in his talk that I linked in my last blog post. So be careful with these numbers. Simply flipping some bit in your binary can affect completely unrelated code.
However, the important point here and one that is consistent regardless of the relative numbers is the improvement from 1.2 to 2.0 and this goes for all collections across the board regarding 3 key factors:
1. speed - there are no virtual methods anymore behind the interfaces, the lists are optimized to perform the fastest way possible especially when there are no OnChanged event handlers attached to them which is the common case and where you want the pure speed.
2. binary size - as you know generics can cause quite a huge chunk of binary code because for every specialization the code is basically duplicated even for binary identical types including their RTTI. In 2.0 all RTTI for the implementing classes which you will never touch anyway is turned off and with some trickery, many generic specializations are folded and forced into the Spring.Collections.dcu to not pollute each and every dcu you compile a list or dictionary into. And it does not only shrink your binary, but also speeds up your compilation as the compiler simply has less work to do - no generating code (RTTI and executable code) into dcu which the linker later has to go through and eliminate duplicates.
3. instance size - while it certainly does not matter if a list instance itself is 58byte or 80byte when you store thousands of integers or objects in them it can make a difference if you have many of those lists for example in every data object and each of them only holds a handful of items. Due to field reordering and implementing the interfaces on one class instead of on multiple ones this saves a few bytes for every instance as those interfaces that inherit from each other such as IList<T> inherits from IEnumerable<T> those share the same IMT slot in every object instead of occupying each their own. On 64bit this saving is even bigger due to the doubled size for pointers as you can imagine.
2.0 will provide the perfect balance between the first two points - providing the best speed while keeping the binary bloat low and providing a rich and powerful API as you already know and love it including some valuable additions.
While the development of 2.0 is still going on it is close to completion - currently, I am working on finalizing the refactoring of our dictionary which as you might know was the last collection type that was simply a wrapper around the RTL in 1.2. But more about that in the next article.
For now just a benchmark of different dictionary lookups - the match rate is 100% in these benchmarks which means all lookups are successful - yup, that's roughly a 4-7 times improvement over the RTL!