Saturday, April 27, 2013

Using ARC with non iOS compiler - possible?

Have you ever had two lists sharing the same objects, possibly with one or both having the OwnsObjects property set to True? Especially when using interfaced lists like in Spring4D or DSharp you want to make use of reference counting and automatic memory management. But what if some instances are shared between different lists or moved from one list to the other? If the first list has OwnsObjects set to True, you cannot use the Delete or Remove method and then add the object to the other list (or vice versa), because the list destroys the instance when it gets removed. That is why there is the Extract method, which removes the instance but does not destroy it. This can get complicated and quickly lead to memory leaks or exceptions.

Using interfaced objects for simple data storage might not be a very good idea. It requires you to write interfaces with the properties just for that purpose, and if you use some kind of RTTI-based binding you cannot do that, as there is no RTTI for interface properties (which are just syntactic sugar anyway).

So what would we give for some easy memory management in these cases? How about applying the reference counting concept we know from interfaces and other managed types to objects?

Of course we could write some TRefCountObject, inherit our data classes from that base class and handle these in the Notify method inside the list when items get added or removed. But that would be too easy and not magic at all. ;) And more seriously, changing the base type is not always possible for several reasons.
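For comparison, the base class approach could look roughly like the following sketch. The unit name and the members of TRefCountObject are made up for illustration, and InterlockedIncrement/InterlockedDecrement are assumed to come from the Windows unit.

```delphi
unit RefCountObject; // hypothetical unit name

interface

type
  // Hypothetical base class; data classes would have to inherit from it.
  TRefCountObject = class
  private
    FRefCount: Integer;
  public
    function AddRef: Integer;
    function Release: Integer;
    property RefCount: Integer read FRefCount;
  end;

implementation

uses
  Windows; // InterlockedIncrement, InterlockedDecrement

function TRefCountObject.AddRef: Integer;
begin
  // called by a list when the object gets added
  Result := InterlockedIncrement(FRefCount);
end;

function TRefCountObject.Release: Integer;
begin
  // called by a list when the object gets removed;
  // the last reference going away destroys the instance
  Result := InterlockedDecrement(FRefCount);
  if Result = 0 then
    Destroy;
end;

end.
```

This only works for classes whose ancestor you control, which is exactly the limitation the rest of this post tries to get around.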

So what do we need? Basically just a new field inside our object to keep track of the reference count. Keep in mind we cannot do a full ARC implementation because that would include assignments and parameter passing which would need compiler support. We just want it for when objects are put into lists.

The method that is responsible for allocating the memory of new instances is TObject.NewInstance. So we need to replace that:

procedure InitializeARC;
var
  Buffer: array[0..4] of Byte;
begin
  Buffer[0] := $E9; // opcode of the x86 near jump (JMP rel32)
  // redirect TObject.NewInstance
  PInteger(@Buffer[1])^ := PByte(@NewInstance) - (PByte(@TObject.NewInstance) + 5);
  WriteMemory(@TObject.NewInstance, @Buffer, 5);
end;

What this code does is place a jump instruction at the very beginning of the TObject.NewInstance method that redirects it to our NewInstance routine which looks like this:

function NewInstance(Self: TClass): TObject;
begin
  // get additional memory for the RefCount field
  GetMem(Pointer(Result), Self.InstanceSize + SizeOf(Integer));
  Result := InitInstance(Self, Result);
end;

It does basically the same as the original except that it allocates 4 more bytes for our RefCount field and then calls our version of InitInstance (which is responsible for initializing the object):

function InitInstance(Self: TClass; Instance: Pointer): TObject;
const
  Buffer: Pointer = @BeforeDestruction;
begin
  Result := Self.InitInstance(Instance);

  // initialize the RefCount field
  GetRefCountFieldAddress(Instance)^ := 0;

  // replace TObject.BeforeDestruction
  if PPointer(NativeInt(Self) + vmtBeforeDestruction)^ = @TObject.BeforeDestruction then
    WriteMemory(PPointer(NativeInt(Self) + vmtBeforeDestruction), @Buffer, SizeOf(Pointer));
end;

Since TObject.InitInstance just zeroes the memory the RTL knows about (obtained by calling InstanceSize), we need to initialize our field ourselves. It sits in the last 4 bytes of the allocated block, directly after the regular instance data:

function GetRefCountFieldAddress(Instance: TObject): PInteger; inline;
begin
  // the RefCount field was added last
  Result := PInteger(NativeInt(Instance) + Instance.InstanceSize);
end;

Along with the reference counting we want to make sure that the instance does not get destroyed while it is still managed by the RefCount (because it sits in some list). That is why the BeforeDestruction method gets replaced. Why not detour it like NewInstance? The implementation in TObject is empty, so there are not 5 bytes available that we could overwrite to jump to our implementation. But as it is virtual, we can replace it in the class's VMT. Like its implementation in TInterfacedObject, it raises an error when the RefCount is not 0.

procedure BeforeDestruction(Self: TObject);
begin
  if GetRefCount(Self) <> 0 then
    System.Error(reInvalidPtr);
end;

Implementing the actual AddRef and Release routines is pretty easy as well:

function __ObjAddRef(Instance: TObject): Integer;
begin
  Result := InterlockedIncrement(GetRefCountFieldAddress(Instance)^);
end;

function __ObjRelease(Instance: TObject): Integer;
begin
  Result := InterlockedDecrement(GetRefCountFieldAddress(Instance)^);
  if Result = 0 then
    Instance.Destroy;
end;
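The listings above call two helpers that are not shown in this post: WriteMemory and GetRefCount. The following is only a sketch of what they might look like, assuming a Windows target and the VirtualProtect and FlushInstructionCache API functions from the Windows unit. The parameter names and the exact signatures are guesses, and the code in the repository may differ.

```delphi
uses
  Windows;

// Assumed helper: copies Size bytes from Source to Target, temporarily
// lifting the page protection because code and VMTs are normally read-only.
procedure WriteMemory(Target, Source: Pointer; Size: NativeUInt);
var
  OldProtect: DWORD;
begin
  if VirtualProtect(Target, Size, PAGE_EXECUTE_READWRITE, OldProtect) then
  try
    Move(Source^, Target^, Size);
    // make sure the CPU does not keep executing stale instructions
    FlushInstructionCache(GetCurrentProcess, Target, Size);
  finally
    // restore the original protection
    VirtualProtect(Target, Size, OldProtect, OldProtect);
  end;
end;

// Assumed helper: reads the hidden counter that follows the instance data.
function GetRefCount(Instance: TObject): Integer;
begin
  Result := GetRefCountFieldAddress(Instance)^;
end;
```

In the actual unit GetRefCount has to be declared after GetRefCountFieldAddress and before BeforeDestruction, and WriteMemory before InitInstance and InitializeARC.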

The most important thing: you need to add the unit that contains this code as the very first unit in your project (or right after ShareMem) so the NewInstance method gets patched as early as possible.

Time to test if it does what it should:

implementation

{$R *.dfm}

uses
  DSharp.Collections,
  DSharp.Core.ARC;

type
  TList<T: class> = class(DSharp.Collections.TList<T>)
  protected
    procedure Notify(const Value: T; const Action: TCollectionChangedAction); override;
  end;

procedure TList<T>.Notify(const Value: T;
  const Action: TCollectionChangedAction);
begin
  case Action of
    caAdd: __ObjAddRef(Value);
    caRemove: __ObjRelease(Value);
  end;
end;

procedure TForm1.Button1Click(Sender: TObject);
var
  list1, list2: IList<TObject>;
begin
  list1 := TList<TObject>.Create;
  list2 := TList<TObject>.Create;

  list1.Add(TObject.Create);
  list1.Add(TObject.Create);
  list2.AddRange(list1);
  list1.Delete(1);
end;

initialization
  ReportMemoryLeaksOnShutdown := True;

end.

When we click the button, both objects get added to both lists. An object is destroyed when it is removed from the last list that still contains it, which also happens when that list itself gets destroyed.

So far this is more of a proof of concept, but I think it can make some code easier and less complicated, especially when working a lot with lists and moving objects around without knowing which list owns them in the end.

You can find that code in the svn repository and as always your feedback is welcome.

Tuesday, April 23, 2013

Why no extension methods in Delphi?

I have been wondering about this for a long time now. Why does Delphi not have this great language feature? Sure, we have helpers, but they are not the same. First, they are still officially considered a feature you should not use when designing new code. They do open up some interesting possibilities (some might call them hacks), but that is a topic for another day. Second, they don't work on interfaces or generic types. And that, interestingly, is what I want to talk about today.

But first - if you don't know it already - I suggest you read what an extension method is - really, I could not explain it any better. Oh, and to all readers who want to jump on the "stop the dotnetification of Delphi - keep these new language features away" bandwagon - this post is not for you, sorry.

Ever had a class or a set of classes you wanted to add some functionality to? Sure, there are ways to do so, like the decorator pattern. But did you see the problem there if you have a type you cannot inherit from, because either you cannot modify the code or it's not a class but an interface? Well, then create a new interface and add that functionality there, someone might say. How, if you cannot extend the given type? Use the adapter or bridge pattern? You can see where this is going. You might end up having to change existing code or introduce lots of code to apply your additional functionality.

The most prominent example of extension methods (and not surprisingly the reason they were introduced in C# 3.0) are the extension methods for IEnumerable<T>. If you want to use foreach (or the for..in loop in Delphi), all you have to implement is the GetEnumerator method (which is actually the only method that IEnumerable<T> has). So if you ever need to implement that in some of your classes, you implement just one method and get access to almost any query operation you can imagine - not saying they all make sense in every context, but you get the idea.

Extension methods are great. You don't clutter your class with things that don't belong there directly but apply to an aspect of your class (in our case being enumerable). They follow good principles like the dependency inversion principle. The way you are using them is more natural and makes more sense than having static methods (or routines) where you pass in the instance you want to call the method on as first parameter.

Even without the fancy LINQ syntax it is without question much more readable to write

for c in customers.Where(HasBillsToPay).OrderBy<string>(GetCompanyName) do
  Writeln(c.CompanyName);

instead of

for c in EnumerableHelper.OrderBy<string>(
  EnumerableHelper.Where(customers, HasBillsToPay), GetCompanyName) do
  Writeln(c.CompanyName);

And that statement has only two chained calls - imagine how that grows in length if you have a more complex query with grouping or something else. Also, in the chained form the calls appear in the order in which the query gets processed, which makes it easier to read and to write.

But - you remember - no helpers for interfaces and generics! Well, we can implement these methods in our base TEnumerable<T> class and/or put them on our IEnumerable<T> interface, no? Yes, we can. And everything would be fine if there weren't one tiny detail - how generics are implemented in Delphi and how the compiler handles them: it generates a type for every specialization. That means the same code gets compiled for every possible T in your application: for TCustomer, TOrder, TCategory and so on. Even with only a small set of methods implemented (and possibly classes for more complex operations like GroupBy), this means you get hundreds of KB added for each TList<T> you will ever use - even if you never touch these methods. That is because the linker cannot remove any method of an interface, even if it is never called.

So how do we work around that problem (which is what I have been doing in the Spring4D refactoring branch lately)? Let's take another look at how extension methods are defined. Nothing prevents us from creating that syntax in Delphi, so a Where method could look like this:

type
  Enumerable = record
    class function Where<TSource>(source: IEnumerable<TSource>;
      predicate: TPredicate<TSource>): IEnumerable<TSource>; static;
  end;

Simple, isn't it? But how to call it? We need a little trick here, let's see:

type
  Enumerable<T> = record
  private
    fThis: IEnumerable<T>;
  public
    function GetEnumerator: IEnumerator<T>;

    function Where(predicate: TPredicate<T>): Enumerable<T>;

    class operator Implicit(const value: IEnumerable<T>): Enumerable<T>;
  end;

As you can see, we use a record type that wraps the interface we want to extend and add the method there. We can implement the "extension methods" there or forward the call to our extension method type if we want to keep them separate.

We now have a nice way to write our query just like above, provided customers is of type Enumerable<T>. If it is an IEnumerable<T> instead, we can assign or cast it to the record type (the implicit operator takes care of that). Also notice that the result of the Where method is of the record type. That way we can chain the calls easily. And because we implemented GetEnumerator, we can use it in a for..in loop just like any IEnumerable<T>.
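To make the wrapping more concrete, here is a sketch of the two simplest members of the record plus a small usage example. It assumes the Enumerable<T> record declared above. TCustomer, its CompanyName property, HasBillsToPay and PrintCustomersWithBills are placeholders for illustration, and the implementation of Where itself is left out.

```delphi
function Enumerable<T>.GetEnumerator: IEnumerator<T>;
begin
  // forward to the wrapped interface so the record works in for..in loops
  Result := fThis.GetEnumerator;
end;

class operator Enumerable<T>.Implicit(
  const value: IEnumerable<T>): Enumerable<T>;
begin
  // only the interface reference is stored, the collection is not copied
  Result.fThis := value;
end;

// Hypothetical usage: customers arrives as a plain interface reference.
procedure PrintCustomersWithBills(const customers: IEnumerable<TCustomer>);
var
  query: Enumerable<TCustomer>;
  c: TCustomer;
begin
  query := customers; // the Implicit operator does the wrapping
  for c in query.Where(HasBillsToPay) do
    Writeln(c.CompanyName);
end;
```

The explicit local variable is there for clarity. A cast like Enumerable<TCustomer>(customers) should go through the same operator.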

What's also nice about the record type is that the linker can now be smart and remove any method that we never call and save us dozens of megabytes in our binary (not kidding).

So our life could be so much easier if we had extension methods (or call them helpers) for interfaces and generic types. But as long as we don't have them, we have to find some clever workarounds.

If you are a Spring4D user, check out the changes in the refactoring branch and let me know what you think.