Tuesday, June 15, 2021

Spring4D 2.0 sneak peek - the evolution of performance

To be honest - most of the collections in Spring 1.2 are dead slow - especially on Win32. But to be fair, almost nobody complained about that so far. Now that we have some shiny tool to print some numbers here are some simple benchmarks - they show a similar pretty sad picture for almost every method as most of them are virtual in 1.2.

The benchmark simply measures how long it takes to add n integers to a list with a pre-allocated capacity - so this is just the duration it takes to call and stuff that integer into the internal array (and whatever else is going on) - compiled with 10.4.2 - Win32 release. You can see how hard that return stack buffer trashing hits the performance here.


The numbers for Win64 are a little less terrible but still far from good:



Now fast forward to the latest development state - Win32:


and Win64:



Before you ask - well - simply adding Spring4D to the binary makes the RTL faster or slower! No, seriously, that was a joke. You can see the effect that Professor Berger mentioned in his talk that I linked in my last blog post. So be careful with these numbers. Simply flipping some bit in your binary can affect completely unrelated code.

However, the important point here and one that is consistent regardless of the relative numbers is the improvement from 1.2 to 2.0 and this goes for all collections across the board regarding 3 key factors:

1. speed - there are no virtual methods anymore behind the interfaces, the lists are optimized to perform the fastest way possible especially when there are no OnChanged event handlers attached to them which is the common case and where you want the pure speed.

2. binary size - as you know generics can cause quite a huge chunk of binary code because for every specialization the code is basically duplicated even for binary identical types including their RTTI. In 2.0 all RTTI for the implementing classes which you will never touch anyway is turned off and with some trickery, many generic specializations are folded and forced into the Spring.Collections.dcu to not pollute each and every dcu you compile a list or dictionary into. And it does not only shrink your binary, but also speeds up your compilation as the compiler simply has less work to do - no generating code (RTTI and executable code) into dcu which the linker later has to go through and eliminate duplicates.

3. instance size - while it certainly does not matter if a list instance itself is 58byte or 80byte when you store thousands of integers or objects in them it can make a difference if you have many of those lists for example in every data object and each of them only holds a handful of items. Due to field reordering and implementing the interfaces on one class instead of on multiple ones this saves a few bytes for every instance as those interfaces that inherit from each other such as IList<T> inherits from IEnumerable<T> those share the same IMT slot in every object instead of occupying each their own. On 64bit this saving is even bigger due to the doubled size for pointers as you can imagine.


2.0 will provide the perfect balance between the first two points - providing the best speed while keeping the binary bloat low and providing a rich and powerful API as you already know and love it including some valuable additions.


While the development of 2.0 is still going on it is close to completion - currently, I am working on finalizing the refactoring of our dictionary which as you might know was the last collection type that was simply a wrapper around the RTL in 1.2. But more about that in the next article.

For now just a benchmark of different dictionary lookups - the match rate is 100% in these benchmarks which means all lookups are successful - yup, that's roughly a 4-7 times improvement over the RTL!





10 comments:

  1. Well done Stefen! And, just as a clue that indicates that you are heading to the right direction (optimizing the collections), re. "almost nobody complained about that so far. ", I did remove Spring4D from a project due to the human-noticeable program startup delay ;)

    ReplyDelete
    Replies
    1. Since you indicate that Spring was responsible for that - can you specify what exactly caused the delay? I guess it was executing code which did not run as fast as expected.

      Delete
    2. Hi Stefan, sorry for not being able to provide the details, it was years ago and I cannot recall the details... Should I delete the comment? Spring4D is awesome and my previous comment without details doesn't contribute obvious value...

      Delete
    3. I was just curious because maybe the reason was something that I am not aware of.

      Also better to let the author/maintainer know there is some issue than silently not using it. Maybe they were not aware of the issue and you might miss the chance to cause an improvement everyone will benefit from.

      Delete
  2. Hi Stefan. Thank you very much for Spring4D. It's an amazing and valuable library.

    ReplyDelete
  3. Hi,

    Thanks for your work.

    We are very interested in point 2. We use a lot of generics and compilation/linking durations have increased ... Can you explain this statement : "In 2.0 all RTTI for the implementing classes which you will never touch anyway is turned off and with some trickery, many generic specializations are folded and forced into the Spring." What are the "trickery" ?
    We really appreciate some help.
    Best regard
    Laurent

    ReplyDelete
    Replies
    1. Study the code in Spring.Collections.pas - especially the factory methods in TCollections.
      This technique is possible because of interfaces. To the consumer you have strongly typed IList<TCat> and IList<TDog> but behind both is basically just a TList<TObject>.

      Delete
    2. Generally when using generics and want to reduce bloat you need to keep the actual generic code small. Depending on the allowed types for your generic type arguments this can be easier or more complicated. Especially when constraining on class then it is pretty easy as you can directly use code that works with TObject and TClass internally.
      When you have code like in collections where T can be anything its more complex. When your class hierarchy gets large often generics creep deeper into it than often necessary.

      Delete
  4. Thanks Stefan.
    I understand your technique with interfaces. Unfortunately, our generic code is in this case based on record : Type Ensemble = RECORD
    So duplicated in dcu ...
    But it's a good idea.

    ReplyDelete
  5. Good information.







    -----------------------------------
    I work in machine learning engineer

    ReplyDelete