Monday, September 7, 2020

Open array parameters and subranges

Open array parameters in Delphi are one of the unique language constructs in Delphi. While their syntax resembles that of a dynamic array they are a different thing. In fact under the hood they consist of two parameters, the first being the pointer to the first item and the second being the highest index.

You can pass static and dynamic arrays to an open array parameter as well as directly specify values to be passed using the square brackets. This sums up what they are but the documentation has some more details about them. There is also a detailed article about them.

Open array parameters are a great way to work with sub ranges without additionally passing start and end index or start index and length which avoids any possible defects by wrong offset calculation.

If you want to take a subrange of such an array you can use the Slice function - which however has two little deficiencies. It does not work on dynamic arrays and what this article is about: it can only return subranges that start at the beginning.

Imagine the following code - for the sake of this article we take a standard recursive mergesort implementation and try to partially implement it by using an open array.

procedure MergeSort(var values: array of Integer);
begin
  if Length(values) <= 1 then
    Exit;

  MergeSort(<left half of values>);

  MergeSort(<right half of values>);

  // actual merge left as an exercise to the reader
end;

The left half is easy as you can just use Slice:

  MergeSort(Slice(values, Length(values) div 2);

But the right side must not start at the beginning and gives us a problem. The question is not new and has already been asked on stackoverflow and shows a way how to trick the compiler into generating the correct code - keep in mind we don't want to copy parts of the array but implement an inplace merge sort here - the sort itself does not matter but just serves to demonstrate getting a subrange.

We aim for getting the best code being generated while keeping as much readability as possible - so any fancy solutions using anonymous methods are off limits. The only solution to the problem is to trick Slice in thinking it is dealing with a static array (which the questioner tried to avoid if possible). Because we need it later let's declare a type for this:

type
  TOpenArray<T> = array[0..0] of T;

As this is a generic type it can be used for any open array later - for now we are just sorting Integer. With its help we can complete the trick and outwit Slice:

  len := Length(values);
  mid := len div 2;

  MergeSort(Slice(TOpenArray<Integer>(values[mid]), len - mid));

Apart from introducing variables for the length and the middle of the array we are now taking the start of the right half (if the length is uneven the right side will be one element longer) and hardcasting this to our one-sized static array of Integer and passing the length of the right half. If you compile and run this code with some data it will eventually raise an ERangeError because the compiler assumes that this array has only one element and we want to pass more than that.

So we have to surround that code with directives to temporarily disable range checking if that is enabled - I usually write the code as follows:

  len := Length(values);
  mid := len div 2;

  {$IFOPT R+}{$R-}{$DEFINE RANGECHECKS_ON}{$ENDIF}
  MergeSort(Slice(TOpenArray<Integer>(values[mid]), len - mid));
  {$IFDEF RANGECHECKS_ON}{$R+}{$UNDEF RANGECHECKS_ON}{$ENDIF}

If you are using something such as jedi.inc you already get the RANGECHECKS_ON define and can check for that and don't have to undef - anyhow the goal is to not get this exception - of course you have to make sure that the length you are passing here is correct!

So far so good we have our Mergesort implementation and could implement the interesting part, the merging simply with an array that starts at 0 and has Length(values) elements - no tedious offset calculation. How you do that though is out of the scope for this article.

Now let's make it generic!

class procedure TArray.MergeSort<T>(var values: array of T);
var
  len, mid: Integer;
begin
  len := Length(values);
  if len <= 1 then
    Exit;

  mid := len div 2;

  MergeSort<T>(Slice(values, mid));

  {$IFOPT R+}{$R-}{$DEFINE RANGECHECKS_ON}{$ENDIF}
  MergeSort<T>(Slice(TOpenArray<T>(values[mid]), len - mid));
  {$IFDEF RANGECHECKS_ON}{$R+}{$ENDIF}

  // actual merge left as an exercise to the reader
end;

When we try to compile this code we will get an error in line 14: E2089 Invalid typecast - ugh... generics. Obviously for whatever reason the compiler here does not allow us to hardcast that one element of type T (values[mid]) into a one sized static array of type T (is that a bug given the types have an exact match? Should I report this? Let me know). Anyhow we can solve this - and that might be an indicator that the compile error is bogus:

  MergeSort<T>(Slice(TOpenArray<T>((@values[mid])^), len - mid));

Simply using the address and dereferencing satisfies the compiler to compile - and generate the correct code. Speaking of which - let's take a look at it - I am using optimization and turned off and range and overflow checking to just get the essence:

MergeSortDemo.dpr.44: begin
0041CA34 53               push ebx
0041CA35 56               push esi
0041CA36 51               push ecx
0041CA37 890424           mov [esp],eax
MergeSortDemo.dpr.45: len := Length(values);
0041CA3A 8D5A01           lea ebx,[edx+$01]
MergeSortDemo.dpr.46: if len <= 1 then
0041CA3D 83FB01           cmp ebx,$01
0041CA40 7E24             jle $0041ca66
MergeSortDemo.dpr.49: mid := len div 2;
0041CA42 8BF3             mov esi,ebx
0041CA44 D1FE             sar esi,1
0041CA46 7903             jns $0041ca4b
0041CA48 83D600           adc esi,$00
MergeSortDemo.dpr.54: MergeSort<T>(Slice(values, mid));
0041CA4B 8BD6             mov edx,esi
0041CA4D 4A               dec edx
0041CA4E 8B0424           mov eax,[esp]
0041CA51 E8DEFFFFFF       call TArray.MergeSort<System.Integer>
MergeSortDemo.dpr.59: MergeSort<T>(Slice(TOpenArray<T>((@values[mid])^), len - mid));
0041CA56 8BD3             mov edx,ebx
0041CA58 2BD6             sub edx,esi
0041CA5A 4A               dec edx
0041CA5B 8B0424           mov eax,[esp]
0041CA5E 8D04B0           lea eax,[eax+esi*4]
0041CA61 E8CEFFFFFF       call TArray.MergeSort<System.Integer>
MergeSortDemo.dpr.63: end;
0041CA66 5A               pop edx
0041CA67 5E               pop esi
0041CA68 5B               pop ebx
0041CA69 C3               ret 

Not much to complain about here - the compiler generated just the right code - sure it could have used a register to store eax in instead of using the stack and thus avoid one mov but that is what it usually does in generics. But for the important use of Slice it properly calculates the offset in the array and passes that to MergeSort.

I stated in the introduction that open array parameters are a unique language feature in Delphi - however many other languages have better support for subranges of arrays or strings - and in fact I only came to the solution being presented here because I was looking how to improve the implementation of introsort in Spring4D when I saw how .Net Core utilizes its Span<T>.

Would it be nice to have Slice work without that workaround and have a more powerful way to specify ranges on arrays? Yes and I might file another issue on QP about it - but in the meantime using open array parameters is a nice way to work with subranges of arrays without unnecessary offset calculation.

Update: I previously omitted the generic type parameter on the MergeSort call in the generic implementation which seems to work properly in Delphi 10.4.1 but causes some endless loop in the compiler in earlier versions. Updated the code to not use type inference but explicitly specify the generic type parameter.

2 comments: