Thursday, May 11, 2017

I wish I had dcc32 -dontmakemycodeslow

Quick, what is wrong with this code performance wise?

function TWhereIterator<T>.MoveNext: Boolean;
var
  current: T;
begin
  Result := False;

  if fState = STATE_ENUMERATOR then
  begin
    fEnumerator := fSource.GetEnumerator;
    fState := STATE_RUNNING;
  end;

  if fState = STATE_RUNNING then
  begin
    while fEnumerator.MoveNext do
    begin
      current := fEnumerator.Current;
      if fPredicate(current) then
      begin
        fCurrent := current;
        Exit(True);
      end;
    end;
    fState := STATE_FINISHED;
    fEnumerator := nil;
  end;
end;

Maybe you have read this article in the past (if not, feel free to read it first, I'll be waiting) and know what to look for.

There is no exception or string here but a function returning an interface. Unfortunately the compiler does not just pass the field to GetEnumerator (Remember: the result of a function where the return type is managed is actually being passed as hidden var parameter) but preserves space on the stack for a temporary variable and makes sure it is properly cleared before returning, every time the method is called. In a tight loop this causes more overhead for every call to MoveNext than the actual work being done in the STATE_RUNNING block.

Always be aware of code executed under some condition and possible variables created by the compiler. I had another case in the past where I was working with TValue in a big case statement (over TTypeKind) and I ended up with a dozen of temporary variables (for each case label) generated by the compiler, although only one of them was used each time.

I solved this by putting the STATE_ENUMERATOR block into a method. That plus moving around the state checking a bit caused this code to run almost twice as fast as before. This was the test code:

TEnumerable.Range(1, 100000000)
  .Where(
  function(const x: Integer): Boolean
  begin
    Result := x mod 3 = 0
  end)
  .Count;

Which now takes around 1 second on my machine - a for-to loop with the same condition takes around 250 ms. Not bad after all given that you hardly do that for some simple integers and a bit more complex filters. By the way it is just as fast - or slow ;) as the same code in C#.

As you might have noticed development of Spring4D 1.2.1 is ongoing and will contain some bugfixes, a few small new features and significant performance optimization for some collection use cases.