Michael Nicht - WikiCommons

This blog post is about a pesky leak I had troubles figuring out, and what was behind it.

Contents

The problem

It was discovered that the new native transports in kbmMW leaked kbmMW Socket instances upon each close and opening of a socket.

The base class TkbmMWAbstractSocket descends from TInterfacedObject, and implements IkbmMWSocket and is thus reference counted.

The kbmMW native client transport contained a private field FSocket:IkbmMWSocket which means that whenever I assign an instance of a descendant of TkbmMWAbstractSocket (in this case on Windows a TkbmMWIOCPSocket) to it, it will automatically be reference counted, and whenever I assign nil to FSocket, its reference count should be decreased, and at that point be freed, since no other parts of the code ought to reference it at that point in time.

However it did not get freed, and at closer look, it did have a reference count greater than 1 at the time of nil‘ing the FSocket field.

Ok that should be straight forward… I just need to figure out where something is increasing the reference count. Unfortunately it became a bit more difficult to track than I hoped for.

The search

I decided to use my older license of AQTime (great tool, miserable and expensive license and unfortunately not being developed on for many years now) to track the allocation and deallocation, but I did not really succeed in finding the problem.

However kbmMW also have its own memory debugging tool, TkbmMWDebugMemory (see this blogpost), which I made because being tired of the lack of development for the rather expensive yearly maintenance license for AQTime.

kbmMW’s memory debugging tool can also track allocation, reallocation and deallocation of memory (both GetMem/FreeMem and Global, Local and Heap allocations/deallocations) and when it is freed, and can make stack dumps of where memory was allocated in the application etc. making it easier to figure out why a leak exists.

And it told me the same I already knew… that I was leaking instances of TkbmMWIOCPSocket. But I knew that the problem had to do with reference counting, and the tool did not support tracking reference counting.

It does now!

So now I can follow all the places where references was added, until the instance is completely freed when all references are removed. It gave me a bit more input to where to search for the problem leading to:

The reason

Obviously, something was adding references to the reference counted instance. And it turned out to be within the IOCP sockets thread pool that is allocated for IOCP use.

Each thread had a private field, FSocket:IkbmMWSocket, which were set to the socket instance that the thread is belonging to. Not so unusual. And the private field was not declared [weak].

Further there were code in the TkbmMWIOCPSocket‘s BeforeDestruction method (that is called upon the destruction of the TkbmMWIOCPSocket instance), that were supposed to stop those threads, and nil their references to the socket instance.

However we have got a catch 22 in this case, because BeforeDestruction is not going to be called, until the reference count goes down to 0, and the reference count will not go down to zero because BeforeDestruction is not called, as it was in there the threads were being terminated.

Ok… solution is to put [weak] before the FSocket:IkbmMWSocket declaration.in the thread class.

But then run… and …. crash!

The threads were not allowed to start up, because somehow the thread field FSocket suddenly became nil, although it had just been assigned shortly before when the thread instance was created, which were right after the socket instance itself was created.

More accurately, the threads were created in the TkbmMWAbstractSocket.AfterConstruction method, which is a method that is called after an instance of an object has been created using its constructor, but before the construction returns to the calling code.

So we have code like this (pseudocode):

TkbmMWSomeThread = class(TThread)
private
   [weak]
   FSocket:IkbmMWSocket;
public
   constructor Create(ASocket:IkbmMWSocket,...);
   destructor Destroy; override;
end;

TkbmMWAbstractSocket = class(TInterfacedObject,IkbmMWSocket)
protected
   procedure AllocateThreads;
   procedure DeallocateThreads;
public
   procedure AfterConstruction; override;
   procedure BeforeDestruction; override;
...
end;

...

constructor TkbmMWSomeThread.Create(ASocket:IkbmMWSocket,...);
begin
     inherited Create(true);
     FSocket:=ASocket;
end;

...

procedure TkbmMWAbstractSocket.AfterConstruction;
begin
   inherited AfterConstruction;
   AllocateThreads;
end;

procedure TkbmMWAbstractSocket.BeforeDestruction;
begin
   DeallocateThreads;
   inherited BeforeDestruction;
end;

So we have AllocateThreads which will do something like this:

procedure TkbmMWAbstractSocket.AllocateThreads;
begin
   FThread:=TkbmMWSomeThread.Create(self,...);
   FThread.Start;
end;

procedure TkbmMWAbstractSocket.DeallocateThreads;
begin
     FThread.Stop;
     FThread.Free;
end;

All looking pretty good and valid… except it is crashing.. .badly, when FThread.Start is executed. Obviously the thread is using the FSocket instance in its execution loop, alas it has become NIL at that time!

That caused a bit of headache, and were probably the reason why I had not put [weak] on the FSocket instance in the first place, where everything was then working… but unfortunately leaking!

Which leads to:

The solution

The solution is to change the AfterConstruction method to look like this:

procedure TkbmMWAbstractSocket.AfterConstruction;
begin
   AllocateThreads;
   inherited AfterConstruction;
end;

In other words move the inherited part to the end of the method.

Usually it is good practice to call inherited methods first, then do your own stuff afterwards, unless there is a very good reason not to do it in that order, which there are in this situation.

The reason is that, because it is an interfaced object instance (descending from TInterfacedObject), the construction of the object instance will automatically set its reference count to 1 very early on, to avoid it being destructed prematurely.

However since some variable will be set to the newly created instance (the field FSocket in TkbmMWIOCPSocket), the reference will be automatically increased since it is an interface copy situation. Assigning an interface to a variable will always (unless you cast it to a pointer type first), be an interface copy, which will increment the reference count.

So that would leave the reference count to 2 after having created the instance and assigned it to the variable. That is obviously wrong, since it should only be 1, as there is only one variable referencing the object instance’s interface.

Delphi solves that by putting a reference decrement (but without destruction requirement testing) in the AfterConstruction method of TInterfacedObject.

So by calling AfterConstruction BEFORE calling AllocateThreads with the FSocket argument, will result in the reference count of the reference counted object pointed to in FSocket to be 0.

It will obviously be incremented (due to interface copy) when being passed in as an argument to the AllocateThreads method, but the moment the AllocateThreads is exited, the reference count of FSocket will again be decremented (the regular way) and be tested for if it has reached 0, after which the IkbmMWSocket instance will be destroyed, resulting in a major crash.

So the outcome of this, is be very careful when doing something relating to reference counted objects within their AfterConstruction method. Make sure to call inherited as late as possible in the overridden variant of it.

Loading

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.