Tuesday, July 17, 2018

Loop wars - measuring performance

Today on Google+ there was some off topic discussion going on and the claim was being made that for-in loops are slower than for-to loops and some benchmark was being posted to back that argument.

Because I like to measure and investigate things I looked into that. I was running the following benchmark. Yes, it is a micro benchmark - let's not argue about the usefulness of this in isolation.

Keep in mind that testing code in the dpr main influences results because all variables are then global instead of local variables - that is why I put such code into some routine.

procedure Main;
  xStrings: TStrings;
  i: Integer;
  s: string;
  sw: TStopwatch;
  xStrings := TStringList.Create;
  for i := 1 to 10000000 do

  sw := TStopwatch.StartNew;
  for i := 0 to xStrings.Count - 1 do
    s := xStrings[i];

  sw := TStopwatch.StartNew;
  for s in xStrings do;


Running this on my machine here gave me following average results across multiple runs (release config):


During the for-in loop I am not assigning s to some other variable which was done in the originally posted code because I already have the value for each iteration, doing that again would result in one more string assignment in the for-in loop - be fair when comparing both.

Looks like one loop is more than twice as fast as the other - let's investigate why that is.

The TStringsEnumerator being used here is a class which comes with some overhead for its object allocation but only once for the GetEnumerator call. It might be of significance when measuring many loop iterations but in this case we are not. Anyway, we are going change it, copy the code from Classes.pas and change it to a record and use a helper for TStrings to easily get our enumerator being used. This is the added code:

  TStringsEnumerator = record
    FIndex: Integer;
    FStrings: TStrings;
    function GetCurrent: string; inline;
    function MoveNext: Boolean;
    property Current: string read GetCurrent;

  TStringsHelper = class helper for TStrings
    function GetEnumerator: TStringsEnumerator;

{ TStringsHelper }

function TStringsHelper.GetEnumerator: TStringsEnumerator;
  Result.FIndex := -1;
  Result.FStrings := Self;

{ TStringsEnumerator }

function TStringsEnumerator.GetCurrent: string;
  Result := FStrings[FIndex];

function TStringsEnumerator.MoveNext: Boolean;
  Result := FIndex < FStrings.Count - 1;
  if Result then

Looking at the results again - nothing significant changed but that was expected.

The generated assembly of the for-to loop:

Loops.dpr.60: for i := 0 to xStrings.Count - 1 do
004D5EB8 8B45F0           mov eax,[ebp-$10]
004D5EBB 8B10             mov edx,[eax]
004D5EBD FF5214           call dword ptr [edx+$14]
004D5EC0 8BD8             mov ebx,eax
004D5EC2 4B               dec ebx
004D5EC3 85DB             test ebx,ebx
004D5EC5 7C1C             jl $004d5ee3
004D5EC7 43               inc ebx
004D5EC8 C745EC00000000   mov [ebp-$14],$00000000
Loops.dpr.61: s := xStrings[i];
004D5ECF 8D4DFC           lea ecx,[ebp-$04]
004D5ED2 8B55EC           mov edx,[ebp-$14]
004D5ED5 8B45F0           mov eax,[ebp-$10]
004D5ED8 8B30             mov esi,[eax]
004D5EDA FF560C           call dword ptr [esi+$0c]
004D5EDD FF45EC           inc dword ptr [ebp-$14]
Loops.dpr.60: for i := 0 to xStrings.Count - 1 do
004D5EE0 4B               dec ebx
004D5EE1 75EC             jnz $004d5ecf

You can see that there are in fact 2 loop variables being used, one counting down to zero from xStrings.Count (stored in ebx), one counting up being used for accessing the items (stored on the stack at [ebp-$14]). The Count property is only read once. Anyhow changing the code in the enumerator to only read Count once and store that internally will not change things drastically.

The assembly of the for-in loop - oh wow:

Loops.dpr.67: for s in xStrings do;
004D5F11 8D55C4           lea edx,[ebp-$3c]
004D5F14 8B45F0           mov eax,[ebp-$10]
004D5F17 E8E8FEFFFF       call TStringsHelper.GetEnumerator
004D5F1C EB44             jmp $004d5f62
004D5F1E 33C0             xor eax,eax
004D5F20 55               push ebp
004D5F21 685B5F4D00       push $004d5f5b
004D5F26 64FF30           push dword ptr fs:[eax]
004D5F29 648920           mov fs:[eax],esp
004D5F2C 8D4DF4           lea ecx,[ebp-$0c]
004D5F2F 8B55C4           mov edx,[ebp-$3c]
004D5F32 8B45CC           mov eax,[ebp-$34]
004D5F35 8B18             mov ebx,[eax]
004D5F37 FF530C           call dword ptr [ebx+$0c]
004D5F3A 8D45FC           lea eax,[ebp-$04]
004D5F3D 8B55F4           mov edx,[ebp-$0c]
004D5F40 E89F4EF3FF       call @UStrLAsg
004D5F45 33C0             xor eax,eax
004D5F47 5A               pop edx
004D5F48 59               pop ecx
004D5F49 59               pop ecx
004D5F4A 648910           mov fs:[eax],edx
004D5F4D 68625F4D00       push $004d5f62
004D5F52 8D45F4           lea eax,[ebp-$0c]
004D5F55 E8624AF3FF       call @UStrClr
004D5F5A C3               ret 
004D5F5B E97840F3FF       jmp @HandleFinally
004D5F60 EBF0             jmp $004d5f52
004D5F62 8D45C4           lea eax,[ebp-$3c]
004D5F65 E8B6FEFFFF       call TStringsEnumerator.MoveNext
004D5F6A 84C0             test al,al
004D5F6C 75B0             jnz $004d5f1e

What the hell... well, I smell what is going on here... I know that the compiler generates terrible code when you are using inline with managed function results and I saw GetCurrent being marked as inline - we will change that. The generated assembly - nice and short!

Loops.dpr.66: for s in xStrings do;
004D5F1E 8D55E8           lea edx,[ebp-$18]
004D5F21 8B45F4           mov eax,[ebp-$0c]
004D5F24 E8DBFEFFFF       call TStringsHelper.GetEnumerator
004D5F29 EB0B             jmp $004d5f36
004D5F2B 8D55FC           lea edx,[ebp-$04]
004D5F2E 8D45E8           lea eax,[ebp-$18]
004D5F31 E8DAFEFFFF       call TStringsEnumerator.GetCurrent
004D5F36 8D45E8           lea eax,[ebp-$18]
004D5F39 E8EAFEFFFF       call TStringsEnumerator.MoveNext
004D5F3E 84C0             test al,al
004D5F40 75E9             jnz $004d5f2b

It would be a bit more if the enumerator was an object because the the compiler has to insert the cleanup code for the enumerator instance but it would not affect performance on this benchmark because we are only profiling the iteration and not the GetEnumerator and finalize code - that would be a different story.

Now look at the results:


BOOM! There you have it. The difference is close to 10% and it even changes depending on what loop executes first giving a bit of room for error of measurement. Applying the optimization to only read Count once also gains another 10ms. We can also remove the check on Result and always increase FIndex because when MoveNext returns false accessing Current is not permitted which gives us another few ms. Now imagine the compiler would properly inline GetCurrent we would certainly see equal or even better performance of the for-in loop in this benchmark.

Anyhow - the benefit of for-in loops over classic index based for-to loops is not only to avoid dreaded off by one errors (ever forget to -1?) but them being composable - a very important feature when developing code. You can easily add filters and transformations into the chain and then can iterate through all the items without using intermediate storage or writing nested loops - if you are using Spring4D collections you know what I am talking about, right?

So, what did we learn from this?

1. Measure and investigate code
2. Some code in the RTL is far from being ideal
3. The 32-bit compiler has much room for optimization

P.S. Hallvard did similar investigations more than a decade ago already - makes me a bit sad that the compiler is still not generating more performant code on for-in loops.

Friday, May 12, 2017

How to create an operator overload that only accepts nil

Since the introduction of Nullable types in Spring4D they had a problem: assigning null to them was cumbersome. Some time ago you could implicitly assign a Variant to them which made it possible to assign Null. However that caused some issue because of implicit Variant type conversion.

Imagine you had a Nullable<Integer> which should only allow Null or some Integer. But in fact you could assign a string to it - not good. Why was that the case? Well because it had an implicit operator for Variant to Nullable<Integer> and the compiler was too smart converting the string into a Variant to pass that to the operator overload. Result was an exception at runtime - very bad considering that Nullable<T> should be type safe at compile time.

Next idea was making an overload for Pointer to Nullable<T> but that made it possible to pass all kinds of reference types to a nullable - another source for bugs.

Before anyone asks, a Clear method was also never an option because the type should be immutable. Think of a property. If you would call Clear on that you would just clear the value returned from the getter.

For 1.2 I used an explicit type with no state and a static property Null so you can assign or compare with Nullable.Null (suggestion by Sir Rufo). An option but it always bugged me because it was not as crisp as just assigning nil like in Swift or null in C#.

Today it struck me. Use some reference type that you cannot assign anything else but nil to. How is that possible? Make it private so you cannot declare any variable of that type.

My first attempt was this:

  Nullable<T> = record
  strict private type
    Null = class end;
    class operator Implicit(const value: Null): Nullable<T>;

That kinda worked. It is impossible to declare a variable of Null and thus you cannot assign anything but nil to it. But wait. You can assign any untyped pointer to it (and declaring $TYPEDADDRESS in the unit the Nullable<T> type is declared in did not work). Well, are there any types that don't allow assigning some untyped pointer to? Yes, events or managed types like dynamic arrays or interfaces. I tried them all and if I did not miss anything it works with a private interface type - I was not able to assign any typed or untyped pointer, interface or object to it.

So this is what worked (it looks a bit different in Spring.pas because I did not put the type into the generic Nullable<T> type but somewhere else where it is not accessible except from the same unit)

  Nullable<T> = record
  strict private type
    Null = interface end;
    class operator Implicit(const value: Null): Nullable<T>;

Now (starting with Spring4D 1.2.1) you can finally simply assign nil to a Nullable<T> without losing type safety.

P.S. By default the Nullable<T> is still in compatibility mode to 1.1 which makes the implicit Variant conversion possible. Take a look into Spring.inc to disable it and make the Nullable<T> completely type safe.

Thursday, May 11, 2017

I wish I had dcc32 -dontmakemycodeslow

Quick, what is wrong with this code performance wise?

function TWhereIterator<T>.MoveNext: Boolean;
  current: T;
  Result := False;

  if fState = STATE_ENUMERATOR then
    fEnumerator := fSource.GetEnumerator;
    fState := STATE_RUNNING;

  if fState = STATE_RUNNING then
    while fEnumerator.MoveNext do
      current := fEnumerator.Current;
      if fPredicate(current) then
        fCurrent := current;
    fState := STATE_FINISHED;
    fEnumerator := nil;

Maybe you have read this article in the past (if not, feel free to read it first, I'll be waiting) and know what to look for.

There is no exception or string here but a function returning an interface. Unfortunately the compiler does not just pass the field to GetEnumerator (Remember: the result of a function where the return type is managed is actually being passed as hidden var parameter) but preserves space on the stack for a temporary variable and makes sure it is properly cleared before returning, every time the method is called. In a tight loop this causes more overhead for every call to MoveNext than the actual work being done in the STATE_RUNNING block.

Always be aware of code executed under some condition and possible variables created by the compiler. I had another case in the past where I was working with TValue in a big case statement (over TTypeKind) and I ended up with a dozen of temporary variables (for each case label) generated by the compiler, although only one of them was used each time.

I solved this by putting the STATE_ENUMERATOR block into a method. That plus moving around the state checking a bit caused this code to run almost twice as fast as before. This was the test code:

TEnumerable.Range(1, 100000000)
  function(const x: Integer): Boolean
    Result := x mod 3 = 0

Which now takes around 1 second on my machine - a for-to loop with the same condition takes around 250 ms. Not bad after all given that you hardly do that for some simple integers and a bit more complex filters. By the way it is just as fast - or slow ;) as the same code in C#.

As you might have noticed development of Spring4D 1.2.1 is ongoing and will contain some bugfixes, a few small new features and significant performance optimization for some collection use cases.

Thursday, February 23, 2017

Generics, modules and typeinfo

The good

When working in a loosely coupled software architecture you might be using different modules: packages, dynamic link libraries and typically a host executable that loads them.

If you are using a DI Container it is very easy to extend functionality by just putting that into one module and once its loaded it feeds the container all the nice things it contains and another module might consume these. That all works fine and well.

The bad

The fun begins as so often when generic types join the game. Let's say we have a generic interface that we are using and this is contained in a runtime package that two other modules require. Everything is fine and both modules can use that interface. However IList<Integer> in one of the modules is not exactly the same as IList<Integer> in the other modules because when you use a generic type the Delphi compiler compiles it for every unit you are using it in and during the linking phase all duplicates are removed. So far so good. But if you have two different modules both now contain IList<Integer> and with that bring their own typeinfo which was compiled into them. When a module is being loaded and uses runtime packages which typically includes the rtl.bpl also they get registered there. Now you have two (or more) of identical types in your application (given you compiled them from the same source state of course).

Now the DI container comes into play which identifies types by their typeinfo pointer - which as I explained is different across multiple modules.

You can test this yourself by creating two small projects, an executable and a DLL with the following code:

library MyLibrary;


procedure RegisterStuff;
    function: IList<Integer>
      Result := TCollections.CreateList<Integer>([1, 2, 3, 4, 5, 6, 7, 8, 9]);


program MyProgram;


  SysUtils, Windows,

  i: Integer;

    for i in GlobalContainer.Resolve<IList<Integer>> do
    on E: Exception do

If you execute MyProgram you will get a nice  EResolveException with message 'Cannot resolve type: IList<System.Integer>'. Now that we know what is going on this is no surprise. But how can we solve that?

The ugly

In our software I used an admittedly scary hack. If you know a bit about the DI container architecture you know that it is very extensible and that you can add so called sub dependency resolvers. These resolvers are being asked if they can resolve a requested type. There are a few builtin ones that are used to resolve dynamic arrays, lists or a TFunc of a registered type. But you can also add your own ones. I wrote a resolver that checks if the requested type could not be resolved by the container already and then looks through its known types to find one with the same full qualified name. Since we are using very explicit named units there is no chance that we accidentally have 2 different types that have the same full qualified name.

I will not post the code here to not scare you away but since others might also run into this problem (I know at least one person that was building a plugin system by using the DI container and also ran into that problem) this issue will be addressed during the development and refactoring for the DI container in version 1.3 of Spring4D.

Speaking of version 1.3 - the release of 1.2 is almost there! We are working hard to deliver it shortly after the next Delphi release which will happen ... soonish, I think. ;)

The release/1.2 branch has been around for a while - please take a look if you haven't already. It contains a lot of new features and bugfixes. I will tell you more about some of the amazing features next time which won't take as long as it took since the last blog post - promise. :)

Friday, June 26, 2015

Anonymous method overloading

A few weeks ago I learned about Knockout.js which is a very lightweight JavaScript MVVM library. It contains of some basic features which you can read about on their home page if you are interested.

I quickly was amazed how great their concept of observables works (I love design patterns on steroids). However JS is very different from Delphi code so my first version had them declared like this:
  IObservable<T> = interface
    function GetValue: T;
    procedure SetValue(const value: T);
    property Value: T read GetValue write SetValue;
So working with them looked like this:
  o: IObservable<Integer>;
  o.Value := 42;
That worked quite well but especially with more and more of these observables in my viewmodel that .Value all the time became annoying.

Because in JS functions are first class citizens you can treat them differently so in knockout if you call o() it returns the value of the observable and if you call o(42) is sets the value of the observable.

Could Delphi do the same? We know anonymous methods so we could make our IObservable<T> an anonymous method. But we can make it either a function or a procedure - not both. Yes, we can!

Anonymous methods are implemented as interface. So in fact TFunc<T> is the same as an interface with a method returning T. And that method is called Invoke. We can even inherit interfaces from an anonymous method type making them an anonymous method themselves. However the compiler prevents you from calling a method on them because in Delphi the parentheses are not required for a call. So what if we inherit from TFunc<T> and add an overload to Invoke?

We now have a type that looks like this:
  IObservable<T> = interface(TFunc<T>)
    procedure Invoke(const value: T); overload;
Now we can write code like this:
  o: IObservable<Integer>;
Now that might look a bit strange at first but once you understand the concept this is really amazing.
For more information on my KnockoutJS inspired prototype check out the Knockoff project on Bitbucket. It contains some basic examples and tests to show how the observable dependency tracking works and how they can be used to do binding on UI controls. Though keep in mind that it is only a research project so far - but showing much potential.

Please let me know what you think.

Thursday, May 7, 2015

Extending the Spring4D DI container - lifetime managers

Today we got the request to support TInterfacedPersistent in the DI container.

As you may know TInterfacedPersistence does not do reference counting like TInterfacedObject does - or to be more precise it delegates it to its possible owner. Usually the owner will be nil so it does not do reference counting through its _AddRef and _Release method and often is used for classes that should not do reference counting if you don't want to write your own class (or use Spring.TInterfaceBase for that purpose).

When working with DI your life will be much easier when using interfaces. But if the implementing class is of TInterfacedPersistence you might have a problem - in form of a memory leak. The container will create the instance and unless you have registered it as singleton (which means the container will only ever create one instance and return this whenever asking for it) not hold a reference to this instance. Because of the missing reference counting it will never destroyed if the interface reference goes out of scope.

"Then he needs to implement reference counting into that class" my coworker said, when I told him about that request. Good idea but there are a few problems with that. The _AddRef and _Release methods in TInterfacedPersistence are not virtual so you can only "override" them by implementing IInterface again. "Easy enough!" you might say. Yes, but that will only cause reference counting when you query IInterface from the instance but not any other interface that you implemented (and which you more likely will use than IInterface). So you had to re-implement all the other interfaces as well. This will not only add to the instance size (every implemented interface causes the size of each instance to grow by SizeOf(Pointer) - did you know that?) but might not be possible because some implemented methods could be private which would cause the compiler to complain about a missing implementation of the interface.

So, long story short. If you want to make a class that inherits from TInterfacedPersistent reference counted you have to take another approach - we are talking about a class that you cannot modify for whatever reason: provide an owner that does the reference counting and takes care of destroying it. Usually TInterfacedPersistent looks for the owner in its AfterConstruction method. So what we do is look if FOwnerInterface is not already set and then assign an owner to it - but how, it is private? Fortunately there is a trick using a class helper (also could have used RTTI because that has access to private fields by default) - here is the code:

  TInterfacedPersistentHelper = class helper for TInterfacedPersistent
  private type
    TOwner = class(TInterfacedObject)
      FOwnedInstance: TObject;
      procedure AfterConstruction; override;
      constructor Create(const instance: TInterfacedPersistent);
      destructor Destroy; override;
    procedure EnableRefCount;

procedure TInterfacedPersistentHelper.EnableRefCount;
  Assert(not Assigned(Self.FOwnerInterface));
  Self.FOwnerInterface := TOwner.Create(Self);

constructor TInterfacedPersistentHelper.TOwner.Create(
  const instance: TInterfacedPersistent);
  inherited Create;
  FOwnedInstance := instance;

destructor TInterfacedPersistentHelper.TOwner.Destroy;

procedure TInterfacedPersistentHelper.TOwner.AfterConstruction;
  // assigning to FOwnerInterface will increment this to 0
  FRefCount := -1;

Now how do we get that into our container? The easiest way would be to use the DelegateTo method in the registration:

container.RegisterType<IFoo, TFoo>.DelegateTo(
  function: TFoo
    Result := TFoo.Create;

But what if TFoo would require arguments in its constructor? We would need to Resolve them from the container and inject them. But don't we use the container to get rid of all the construction work?

So with a minor refactoring - isn't it great when things are easy to change without breaking the entire thing - I added support for custom lifetime managers - I assume I don't have to explain what they do. Let's write our own for handling our TInterfacedPersistent classes - we inherit from the TTransientLifetimeManager from Spring.Container.LifetimeManager:

  TInterfacedPersistentLifetimeManager = class(TTransientLifetimeManager)
    procedure DoAfterConstruction(const instance: TValue); override;

procedure TInterfacedPersistentLifetimeManager.DoAfterConstruction(
  const instance: TValue);

And now we register our type with it (the API is not yet final the method name might change to something more explicit):

container.RegisterType<IFoo, TFoo>.AsCustom<TInterfacedPersistentLifetimeManager>

Now that is nice already. But what if the container could figure out when a class inherits from TInterfacedPersistent and register this lifetime manager? No problem - we need to add an IBuilderInspector to the container for that purpose. What these do is inspect the registered type for certain aspects (there are built-in ones looking for things like what interfaces does a class implement and register them as services if not already explicitly specified or look for members with special attributes). Ours looks like this:

  TInterfacedPersistentBuilderInspector = class(TInterfacedObject, IBuilderInspector)
    procedure ProcessModel(const kernel: IKernel; const model: TComponentModel);

procedure TInterfacedPersistentBuilderInspector.ProcessModel(const kernel: IKernel;
  const model: TComponentModel);
  if model.ComponentType.IsInstance
    and model.ComponentType.AsInstance.MetaclassType.InheritsFrom(TInterfacedPersistent)
    and (model.LifetimeType = TLifetimeType.Transient) then
    model.LifetimeManager := TInterfacedPersistentLifetimeManager.Create(model);

And we need to attach that to the container before calling Build (I suggest doing that as the first thing when setting up the container):


But wait, there is more - the container supports so called extensions. These extensions can be attached to the container to add certain behaviors or features (like adding support for decorator detection or changing the way the container selects constructors to create instances). In our case it is really simple as it just adds one component to the container.

  TInterfacedPersistentLifetimeExtension = class(TContainerExtension)
    procedure Initialize; override;

procedure TInterfacedPersistentLifetimeExtension.Initialize;

And attaching the extension is also a one liner.


So in the end with one line we can add support for making TInterfacedPersistent classes reference counted for using them in our DI container powered application.

"I love it when a plan comes together" ;)

P.S. with the same technique you can add reference counting to TComponent, by attaching an IVCLComObject interface to it.

Saturday, February 21, 2015

Type less - how to use GExperts macro templates

With all these articles and presentations about the PPL you might have seen this often. Calling TThread.Queue (or TThread.Synchronize if you are doing it wrong ;)) to execute your VCL or FMX related code in the main thread.

Are you still typing this code wasting precious time? Well then this article is for you!

I see this so often in presentations by people that should know better given their many years of Delphi experience. And to be honest - I am guilty of that too. I could spend way less time writing stupid code (with stupid code I usually refer to boilerplate code like this).

Disclaimer: I tried using live templates also but man that sucked and invoking the macro always killed the selection in the code editor.

Open the GExperts macro templates (default shortcut is Shift+Alt+T) and click on configuration.

After you click OK you enter the following code into the editor (with a trailing line break):


So it looks like this:

The %SELECTION% tells the macro to insert the selected text here and the pipe tells it to set the cursor to this position after.

Click on OK now and select some source you want to invoke in the main thread.
Press the shortcut for macro templates, type queue and press enter, boom, done.

More time for awesome code or creating more of these templates!
(or to watch funny clips on the internet ...) ;)