Monday, September 7, 2020

Open array parameters and subranges

Open array parameters in Delphi are one of the unique language constructs in Delphi. While their syntax resembles that of a dynamic array they are a different thing. In fact under the hood they consist of two parameters, the first being the pointer to the first item and the second being the highest index.

You can pass static and dynamic arrays to an open array parameter as well as directly specify values to be passed using the square brackets. This sums up what they are but the documentation has some more details about them. There is also a detailed article about them.

Open array parameters are a great way to work with sub ranges without additionally passing start and end index or start index and length which avoids any possible defects by wrong offset calculation.

If you want to take a subrange of such an array you can use the Slice function - which however has two little deficiencies. It does not work on dynamic arrays and what this article is about: it can only return subranges that start at the beginning.

Imagine the following code - for the sake of this article we take a standard recursive mergesort implementation and try to partially implement it by using an open array.

procedure MergeSort(var values: array of Integer);
begin
  if Length(values) <= 1 then
    Exit;

  MergeSort(<left half of values>);

  MergeSort(<right half of values>);

  // actual merge left as an exercise to the reader
end;

The left half is easy as you can just use Slice:

  MergeSort(Slice(values, Length(values) div 2);

But the right side must not start at the beginning and gives us a problem. The question is not new and has already been asked on stackoverflow and shows a way how to trick the compiler into generating the correct code - keep in mind we don't want to copy parts of the array but implement an inplace merge sort here - the sort itself does not matter but just serves to demonstrate getting a subrange.

We aim for getting the best code being generated while keeping as much readability as possible - so any fancy solutions using anonymous methods are off limits. The only solution to the problem is to trick Slice in thinking it is dealing with a static array (which the questioner tried to avoid if possible). Because we need it later let's declare a type for this:

type
  TOpenArray<T> = array[0..0] of T;

As this is a generic type it can be used for any open array later - for now we are just sorting Integer. With its help we can complete the trick and outwit Slice:

  len := Length(values);
  mid := len div 2;

  MergeSort(Slice(TOpenArray<Integer>(values[mid]), len - mid));

Apart from introducing variables for the length and the middle of the array we are now taking the start of the right half (if the length is uneven the right side will be one element longer) and hardcasting this to our one-sized static array of Integer and passing the length of the right half. If you compile and run this code with some data it will eventually raise an ERangeError because the compiler assumes that this array has only one element and we want to pass more than that.

So we have to surround that code with directives to temporarily disable range checking if that is enabled - I usually write the code as follows:

  len := Length(values);
  mid := len div 2;

  {$IFOPT R+}{$R-}{$DEFINE RANGECHECKS_ON}{$ENDIF}
  MergeSort(Slice(TOpenArray<Integer>(values[mid]), len - mid));
  {$IFDEF RANGECHECKS_ON}{$R+}{$UNDEF RANGECHECKS_ON}{$ENDIF}

If you are using something such as jedi.inc you already get the RANGECHECKS_ON define and can check for that and don't have to undef - anyhow the goal is to not get this exception - of course you have to make sure that the length you are passing here is correct!

So far so good we have our Mergesort implementation and could implement the interesting part, the merging simply with an array that starts at 0 and has Length(values) elements - no tedious offset calculation. How you do that though is out of the scope for this article.

Now let's make it generic!

class procedure TArray.MergeSort<T>(var values: array of T);
var
  len, mid: Integer;
begin
  len := Length(values);
  if len <= 1 then
    Exit;

  mid := len div 2;

  MergeSort<T>(Slice(values, mid));

  {$IFOPT R+}{$R-}{$DEFINE RANGECHECKS_ON}{$ENDIF}
  MergeSort<T>(Slice(TOpenArray<T>(values[mid]), len - mid));
  {$IFDEF RANGECHECKS_ON}{$R+}{$ENDIF}

  // actual merge left as an exercise to the reader
end;

When we try to compile this code we will get an error in line 14: E2089 Invalid typecast - ugh... generics. Obviously for whatever reason the compiler here does not allow us to hardcast that one element of type T (values[mid]) into a one sized static array of type T (is that a bug given the types have an exact match? Should I report this? Let me know). Anyhow we can solve this - and that might be an indicator that the compile error is bogus:

  MergeSort<T>(Slice(TOpenArray<T>((@values[mid])^), len - mid));

Simply using the address and dereferencing satisfies the compiler to compile - and generate the correct code. Speaking of which - let's take a look at it - I am using optimization and turned off and range and overflow checking to just get the essence:

MergeSortDemo.dpr.44: begin
0041CA34 53               push ebx
0041CA35 56               push esi
0041CA36 51               push ecx
0041CA37 890424           mov [esp],eax
MergeSortDemo.dpr.45: len := Length(values);
0041CA3A 8D5A01           lea ebx,[edx+$01]
MergeSortDemo.dpr.46: if len <= 1 then
0041CA3D 83FB01           cmp ebx,$01
0041CA40 7E24             jle $0041ca66
MergeSortDemo.dpr.49: mid := len div 2;
0041CA42 8BF3             mov esi,ebx
0041CA44 D1FE             sar esi,1
0041CA46 7903             jns $0041ca4b
0041CA48 83D600           adc esi,$00
MergeSortDemo.dpr.54: MergeSort<T>(Slice(values, mid));
0041CA4B 8BD6             mov edx,esi
0041CA4D 4A               dec edx
0041CA4E 8B0424           mov eax,[esp]
0041CA51 E8DEFFFFFF       call TArray.MergeSort<System.Integer>
MergeSortDemo.dpr.59: MergeSort<T>(Slice(TOpenArray<T>((@values[mid])^), len - mid));
0041CA56 8BD3             mov edx,ebx
0041CA58 2BD6             sub edx,esi
0041CA5A 4A               dec edx
0041CA5B 8B0424           mov eax,[esp]
0041CA5E 8D04B0           lea eax,[eax+esi*4]
0041CA61 E8CEFFFFFF       call TArray.MergeSort<System.Integer>
MergeSortDemo.dpr.63: end;
0041CA66 5A               pop edx
0041CA67 5E               pop esi
0041CA68 5B               pop ebx
0041CA69 C3               ret 

Not much to complain about here - the compiler generated just the right code - sure it could have used a register to store eax in instead of using the stack and thus avoid one mov but that is what it usually does in generics. But for the important use of Slice it properly calculates the offset in the array and passes that to MergeSort.

I stated in the introduction that open array parameters are a unique language feature in Delphi - however many other languages have better support for subranges of arrays or strings - and in fact I only came to the solution being presented here because I was looking how to improve the implementation of introsort in Spring4D when I saw how .Net Core utilizes its Span<T>.

Would it be nice to have Slice work without that workaround and have a more powerful way to specify ranges on arrays? Yes and I might file another issue on QP about it - but in the meantime using open array parameters is a nice way to work with subranges of arrays without unnecessary offset calculation.

Update: I previously omitted the generic type parameter on the MergeSort call in the generic implementation which seems to work properly in Delphi 10.4.1 but causes some endless loop in the compiler in earlier versions. Updated the code to not use type inference but explicitly specify the generic type parameter.

Thursday, September 3, 2020

TestInsight 1.1.9.0 released

A new version of TestInsight is available for Delphi 10.1 and higher including the just released Delphi 10.4.1 - older Delphi versions can still use version 1.1.5.0.

Download for 1.1.9.0 (10.1 or higher) or 1.1.5.0 (XE to 10.0)


It contains mostly bugfixes (some had already been fixed in 1.1.8.0):
  • div by zero exception when trying to discover tests on DUnitX
  • truncated duration column for larger numbers: the format of the duration has been improved and you can resize the column now
  • an AV could occur in certain situations when closing the IDE with a docked TestInsight window
  • clipboard shortcuts did not work in the filter edit
  • vertical scrollbar did not have the correct size
But also some nice improvements:
  • run test at the cursor from the interface section of the unit now works (not for individual testcase attributes yet though, it executes all for that particular method) - you have to be in the line of the method declaration
  • navigating from a test result to its implementation now works if the method is implemented by a parent class

Monday, November 19, 2018

Spring4D news and announcement


Today I would like to inform you about the recent development and some future plans of the library.

As you might have seen we have been busy in various branches. The 1.2.2 branch contains many bugfixes and small improvements and will also contain Delphi 10.3 support. It will be released once Delphi 10.3 does.

The next version after that will be 1.3 which will contain a large refactoring of the collections to make them faster and smaller in size. There are also many new collection types - more about them in a future post. Originally this release should also contain a refactoring of the DI container but as that is another huge task I felt that holding back any changes even longer was not a good idea. So the next version after 1.3 will contain that. The release of 1.3 is scheduled for early 2019.


I am very happy to announce Spring4D conf which will be on april 17th and 18th in Bergamo, Italy. You will get two days packed with information about Spring4D and meet the authors and experts to learn more about the countless ways to improve your software - from the basic building blocks to advanced scenarios using powerful features like interception and dependency injection.

So please head over to the conference page and order your tickets today!

Tuesday, July 17, 2018

Loop wars - measuring performance

Today on Google+ there was some off topic discussion going on and the claim was being made that for-in loops are slower than for-to loops and some benchmark was being posted to back that argument.

Because I like to measure and investigate things I looked into that. I was running the following benchmark. Yes, it is a micro benchmark - let's not argue about the usefulness of this in isolation.

Keep in mind that testing code in the dpr main influences results because all variables are then global instead of local variables - that is why I put such code into some routine.

procedure Main;
var
  xStrings: TStrings;
  i: Integer;
  s: string;
  sw: TStopwatch;
begin
  xStrings := TStringList.Create;
  for i := 1 to 10000000 do
    xStrings.Add(i.ToString);

  sw := TStopwatch.StartNew;
  for i := 0 to xStrings.Count - 1 do
    s := xStrings[i];
  Writeln(sw.ElapsedMilliseconds);

  sw := TStopwatch.StartNew;
  for s in xStrings do;
  Writeln(sw.ElapsedMilliseconds);

  Readln;
end;

Running this on my machine here gave me following average results across multiple runs (release config):

210
470

During the for-in loop I am not assigning s to some other variable which was done in the originally posted code because I already have the value for each iteration, doing that again would result in one more string assignment in the for-in loop - be fair when comparing both.

Looks like one loop is more than twice as fast as the other - let's investigate why that is.

The TStringsEnumerator being used here is a class which comes with some overhead for its object allocation but only once for the GetEnumerator call. It might be of significance when measuring many loop iterations but in this case we are not. Anyway, we are going change it, copy the code from Classes.pas and change it to a record and use a helper for TStrings to easily get our enumerator being used. This is the added code:

type
  TStringsEnumerator = record
  private
    FIndex: Integer;
    FStrings: TStrings;
  public
    function GetCurrent: string; inline;
    function MoveNext: Boolean;
    property Current: string read GetCurrent;
  end;

  TStringsHelper = class helper for TStrings
    function GetEnumerator: TStringsEnumerator;
  end;

{ TStringsHelper }

function TStringsHelper.GetEnumerator: TStringsEnumerator;
begin
  Result.FIndex := -1;
  Result.FStrings := Self;
end;

{ TStringsEnumerator }

function TStringsEnumerator.GetCurrent: string;
begin
  Result := FStrings[FIndex];
end;

function TStringsEnumerator.MoveNext: Boolean;
begin
  Result := FIndex < FStrings.Count - 1;
  if Result then
    Inc(FIndex);
end;

Looking at the results again - nothing significant changed but that was expected.

The generated assembly of the for-to loop:

Loops.dpr.60: for i := 0 to xStrings.Count - 1 do
004D5EB8 8B45F0           mov eax,[ebp-$10]
004D5EBB 8B10             mov edx,[eax]
004D5EBD FF5214           call dword ptr [edx+$14]
004D5EC0 8BD8             mov ebx,eax
004D5EC2 4B               dec ebx
004D5EC3 85DB             test ebx,ebx
004D5EC5 7C1C             jl $004d5ee3
004D5EC7 43               inc ebx
004D5EC8 C745EC00000000   mov [ebp-$14],$00000000
Loops.dpr.61: s := xStrings[i];
004D5ECF 8D4DFC           lea ecx,[ebp-$04]
004D5ED2 8B55EC           mov edx,[ebp-$14]
004D5ED5 8B45F0           mov eax,[ebp-$10]
004D5ED8 8B30             mov esi,[eax]
004D5EDA FF560C           call dword ptr [esi+$0c]
004D5EDD FF45EC           inc dword ptr [ebp-$14]
Loops.dpr.60: for i := 0 to xStrings.Count - 1 do
004D5EE0 4B               dec ebx
004D5EE1 75EC             jnz $004d5ecf

You can see that there are in fact 2 loop variables being used, one counting down to zero from xStrings.Count (stored in ebx), one counting up being used for accessing the items (stored on the stack at [ebp-$14]). The Count property is only read once. Anyhow changing the code in the enumerator to only read Count once and store that internally will not change things drastically.

The assembly of the for-in loop - oh wow:

Loops.dpr.67: for s in xStrings do;
004D5F11 8D55C4           lea edx,[ebp-$3c]
004D5F14 8B45F0           mov eax,[ebp-$10]
004D5F17 E8E8FEFFFF       call TStringsHelper.GetEnumerator
004D5F1C EB44             jmp $004d5f62
004D5F1E 33C0             xor eax,eax
004D5F20 55               push ebp
004D5F21 685B5F4D00       push $004d5f5b
004D5F26 64FF30           push dword ptr fs:[eax]
004D5F29 648920           mov fs:[eax],esp
004D5F2C 8D4DF4           lea ecx,[ebp-$0c]
004D5F2F 8B55C4           mov edx,[ebp-$3c]
004D5F32 8B45CC           mov eax,[ebp-$34]
004D5F35 8B18             mov ebx,[eax]
004D5F37 FF530C           call dword ptr [ebx+$0c]
004D5F3A 8D45FC           lea eax,[ebp-$04]
004D5F3D 8B55F4           mov edx,[ebp-$0c]
004D5F40 E89F4EF3FF       call @UStrLAsg
004D5F45 33C0             xor eax,eax
004D5F47 5A               pop edx
004D5F48 59               pop ecx
004D5F49 59               pop ecx
004D5F4A 648910           mov fs:[eax],edx
004D5F4D 68625F4D00       push $004d5f62
004D5F52 8D45F4           lea eax,[ebp-$0c]
004D5F55 E8624AF3FF       call @UStrClr
004D5F5A C3               ret 
004D5F5B E97840F3FF       jmp @HandleFinally
004D5F60 EBF0             jmp $004d5f52
004D5F62 8D45C4           lea eax,[ebp-$3c]
004D5F65 E8B6FEFFFF       call TStringsEnumerator.MoveNext
004D5F6A 84C0             test al,al
004D5F6C 75B0             jnz $004d5f1e

What the hell... well, I smell what is going on here... I know that the compiler generates terrible code when you are using inline with managed function results and I saw GetCurrent being marked as inline - we will change that. The generated assembly - nice and short!

Loops.dpr.66: for s in xStrings do;
004D5F1E 8D55E8           lea edx,[ebp-$18]
004D5F21 8B45F4           mov eax,[ebp-$0c]
004D5F24 E8DBFEFFFF       call TStringsHelper.GetEnumerator
004D5F29 EB0B             jmp $004d5f36
004D5F2B 8D55FC           lea edx,[ebp-$04]
004D5F2E 8D45E8           lea eax,[ebp-$18]
004D5F31 E8DAFEFFFF       call TStringsEnumerator.GetCurrent
004D5F36 8D45E8           lea eax,[ebp-$18]
004D5F39 E8EAFEFFFF       call TStringsEnumerator.MoveNext
004D5F3E 84C0             test al,al
004D5F40 75E9             jnz $004d5f2b

It would be a bit more if the enumerator was an object because the the compiler has to insert the cleanup code for the enumerator instance but it would not affect performance on this benchmark because we are only profiling the iteration and not the GetEnumerator and finalize code - that would be a different story.

Now look at the results:

210
240

BOOM! There you have it. The difference is close to 10% and it even changes depending on what loop executes first giving a bit of room for error of measurement. Applying the optimization to only read Count once also gains another 10ms. We can also remove the check on Result and always increase FIndex because when MoveNext returns false accessing Current is not permitted which gives us another few ms. Now imagine the compiler would properly inline GetCurrent we would certainly see equal or even better performance of the for-in loop in this benchmark.

Anyhow - the benefit of for-in loops over classic index based for-to loops is not only to avoid dreaded off by one errors (ever forget to -1?) but them being composable - a very important feature when developing code. You can easily add filters and transformations into the chain and then can iterate through all the items without using intermediate storage or writing nested loops - if you are using Spring4D collections you know what I am talking about, right?

So, what did we learn from this?

1. Measure and investigate code
2. Some code in the RTL is far from being ideal
3. The 32-bit compiler has much room for optimization

P.S. Hallvard did similar investigations more than a decade ago already - makes me a bit sad that the compiler is still not generating more performant code on for-in loops.

Update (16.06.2019): It looks like that in recent versions  the inline got removed from the GetCurrent function bringing the TStringsEnumerator in range of 10% to the for-to loop.

Friday, May 12, 2017

How to create an operator overload that only accepts nil

Since the introduction of Nullable types in Spring4D they had a problem: assigning null to them was cumbersome. Some time ago you could implicitly assign a Variant to them which made it possible to assign Null. However that caused some issue because of implicit Variant type conversion.

Imagine you had a Nullable<Integer> which should only allow Null or some Integer. But in fact you could assign a string to it - not good. Why was that the case? Well because it had an implicit operator for Variant to Nullable<Integer> and the compiler was too smart converting the string into a Variant to pass that to the operator overload. Result was an exception at runtime - very bad considering that Nullable<T> should be type safe at compile time.

Next idea was making an overload for Pointer to Nullable<T> but that made it possible to pass all kinds of reference types to a nullable - another source for bugs.

Before anyone asks, a Clear method was also never an option because the type should be immutable. Think of a property. If you would call Clear on that you would just clear the value returned from the getter.

For 1.2 I used an explicit type with no state and a static property Null so you can assign or compare with Nullable.Null (suggestion by Sir Rufo). An option but it always bugged me because it was not as crisp as just assigning nil like in Swift or null in C#.

Today it struck me. Use some reference type that you cannot assign anything else but nil to. How is that possible? Make it private so you cannot declare any variable of that type.

My first attempt was this:

type
  Nullable<T> = record
  strict private type
    Null = class end;
  public
    class operator Implicit(const value: Null): Nullable<T>;
  end;

That kinda worked. It is impossible to declare a variable of Null and thus you cannot assign anything but nil to it. But wait. You can assign any untyped pointer to it (and declaring $TYPEDADDRESS in the unit the Nullable<T> type is declared in did not work). Well, are there any types that don't allow assigning some untyped pointer to? Yes, events or managed types like dynamic arrays or interfaces. I tried them all and if I did not miss anything it works with a private interface type - I was not able to assign any typed or untyped pointer, interface or object to it.

So this is what worked (it looks a bit different in Spring.pas because I did not put the type into the generic Nullable<T> type but somewhere else where it is not accessible except from the same unit)

type
  Nullable<T> = record
  strict private type
    Null = interface end;
  public
    class operator Implicit(const value: Null): Nullable<T>;
  end;

Now (starting with Spring4D 1.2.1) you can finally simply assign nil to a Nullable<T> without losing type safety.

P.S. By default the Nullable<T> is still in compatibility mode to 1.1 which makes the implicit Variant conversion possible. Take a look into Spring.inc to disable it and make the Nullable<T> completely type safe.

Thursday, May 11, 2017

I wish I had dcc32 -dontmakemycodeslow

Quick, what is wrong with this code performance wise?

function TWhereIterator<T>.MoveNext: Boolean;
var
  current: T;
begin
  Result := False;

  if fState = STATE_ENUMERATOR then
  begin
    fEnumerator := fSource.GetEnumerator;
    fState := STATE_RUNNING;
  end;

  if fState = STATE_RUNNING then
  begin
    while fEnumerator.MoveNext do
    begin
      current := fEnumerator.Current;
      if fPredicate(current) then
      begin
        fCurrent := current;
        Exit(True);
      end;
    end;
    fState := STATE_FINISHED;
    fEnumerator := nil;
  end;
end;

Maybe you have read this article in the past (if not, feel free to read it first, I'll be waiting) and know what to look for.

There is no exception or string here but a function returning an interface. Unfortunately the compiler does not just pass the field to GetEnumerator (Remember: the result of a function where the return type is managed is actually being passed as hidden var parameter) but preserves space on the stack for a temporary variable and makes sure it is properly cleared before returning, every time the method is called. In a tight loop this causes more overhead for every call to MoveNext than the actual work being done in the STATE_RUNNING block.

Always be aware of code executed under some condition and possible variables created by the compiler. I had another case in the past where I was working with TValue in a big case statement (over TTypeKind) and I ended up with a dozen of temporary variables (for each case label) generated by the compiler, although only one of them was used each time.

I solved this by putting the STATE_ENUMERATOR block into a method. That plus moving around the state checking a bit caused this code to run almost twice as fast as before. This was the test code:

TEnumerable.Range(1, 100000000)
  .Where(
  function(const x: Integer): Boolean
  begin
    Result := x mod 3 = 0
  end)
  .Count;

Which now takes around 1 second on my machine - a for-to loop with the same condition takes around 250 ms. Not bad after all given that you hardly do that for some simple integers and a bit more complex filters. By the way it is just as fast - or slow ;) as the same code in C#.

As you might have noticed development of Spring4D 1.2.1 is ongoing and will contain some bugfixes, a few small new features and significant performance optimization for some collection use cases.

Thursday, February 23, 2017

Generics, modules and typeinfo

The good

When working in a loosely coupled software architecture you might be using different modules: packages, dynamic link libraries and typically a host executable that loads them.

If you are using a DI Container it is very easy to extend functionality by just putting that into one module and once its loaded it feeds the container all the nice things it contains and another module might consume these. That all works fine and well.

The bad

The fun begins as so often when generic types join the game. Let's say we have a generic interface that we are using and this is contained in a runtime package that two other modules require. Everything is fine and both modules can use that interface. However IList<Integer> in one of the modules is not exactly the same as IList<Integer> in the other modules because when you use a generic type the Delphi compiler compiles it for every unit you are using it in and during the linking phase all duplicates are removed. So far so good. But if you have two different modules both now contain IList<Integer> and with that bring their own typeinfo which was compiled into them. When a module is being loaded and uses runtime packages which typically includes the rtl.bpl also they get registered there. Now you have two (or more) of identical types in your application (given you compiled them from the same source state of course).

Now the DI container comes into play which identifies types by their typeinfo pointer - which as I explained is different across multiple modules.

You can test this yourself by creating two small projects, an executable and a DLL with the following code:

library MyLibrary;

uses
  Spring.Container,
  Spring.Collections;

procedure RegisterStuff;
begin
  GlobalContainer.RegisterType<IList<Integer>>.DelegateTo(
    function: IList<Integer>
    begin
      Result := TCollections.CreateList<Integer>([1, 2, 3, 4, 5, 6, 7, 8, 9]);
    end).AsSingleton;
  GlobalContainer.Build;
end;

begin
  RegisterStuff;
end.


program MyProgram;

{$APPTYPE CONSOLE}

uses
  SysUtils, Windows,
  Spring.Container,
  Spring.Collections;

var
  i: Integer;
begin
  LoadLibrary('MyLibrary.dll');

  try
    for i in GlobalContainer.Resolve<IList<Integer>> do
      Writeln(i);
  except
    on E: Exception do
      Writeln(E.Message);
  end;
  Readln;
end.

If you execute MyProgram you will get a nice  EResolveException with message 'Cannot resolve type: IList<System.Integer>'. Now that we know what is going on this is no surprise. But how can we solve that?

The ugly

In our software I used an admittedly scary hack. If you know a bit about the DI container architecture you know that it is very extensible and that you can add so called sub dependency resolvers. These resolvers are being asked if they can resolve a requested type. There are a few builtin ones that are used to resolve dynamic arrays, lists or a TFunc of a registered type. But you can also add your own ones. I wrote a resolver that checks if the requested type could not be resolved by the container already and then looks through its known types to find one with the same full qualified name. Since we are using very explicit named units there is no chance that we accidentally have 2 different types that have the same full qualified name.

I will not post the code here to not scare you away but since others might also run into this problem (I know at least one person that was building a plugin system by using the DI container and also ran into that problem) this issue will be addressed during the development and refactoring for the DI container in version 1.3 of Spring4D.

Speaking of version 1.3 - the release of 1.2 is almost there! We are working hard to deliver it shortly after the next Delphi release which will happen ... soonish, I think. ;)

The release/1.2 branch has been around for a while - please take a look if you haven't already. It contains a lot of new features and bugfixes. I will tell you more about some of the amazing features next time which won't take as long as it took since the last blog post - promise. :)