Getting the Object Behind the Interface — By Reading Machine Code

How kbmMW’s memory leak debugger identifies which object is behind a leaked interface reference — by reverse-engineering the compiler’s own assembly instructions at runtime.

Contents

1 The Problem: “Something Is Leaking, but What?”
2 Why Is This Hard?
3 The Compiler Already Knows the Offset
4 How It Works, Step by Step
5 Two Instruction Sizes
6 The Bigger Picture: How the Leak Debugger Uses This
7 Why This Technique for This Job?
8 The Safety Net
9 Platform Limitation: x86 / x64 Only
10 An Analogy
11 Summary

The Problem: “Something Is Leaking, but What?”

kbmMW includes a comprehensive memory debugging system. The kbmMWDebugMemory unit hooks into Delphi’s memory manager and tracks every allocation — objects, strings, raw data blocks, even Windows API allocations like VirtualAlloc and HeapAlloc. It also hooks _AddRef and _Release on IInterface itself to track interface reference counts. When your application shuts down (or whenever you ask), it produces a detailed report of everything still alive — your leaks.

For objects, the report is straightforward. The debugger checks if a pointer looks like a valid TObject (by verifying the VMT self-pointer), and if so, calls .ClassName. A typical leak report for objects looks like this:

42) Object TkbmMWScheduledEvent Addr:006A3F20 Size:256
43) Object TStringList Addr:006B1280 Size:64

Immediately useful. You know what leaked and can search your code for it.

But interface references are a different story. The debugger has hooked _AddRef and _Release, so it knows a certain interface pointer still has outstanding references when it shouldn’t. But the leak report would say:

44) Interface(UNKNOWN) Addr:006C8040 RefCount:1

Not very helpful. You know something leaked through an interface, but you have no idea what class is behind it. In a large framework with hundreds of interfaces, “UNKNOWN” turns debugging into a needle-in-a-haystack exercise.

What the debugger wants to report is:

44) Interface(TkbmMWHTTPServerTransport) Addr:006C8040 RefCount:1

Now you know exactly where to look. The question is: how do you get from a raw interface pointer to a class name?

Why Is This Hard?

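To understand why this isn’t a simple typecast, we need to look at how Delphi lays out an object that implements interfaces. The instance starts with its VMT pointer at offset +0 and holds its fields after that, plus one hidden slot per implemented interface that stores a pointer to that interface’s vtable.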

An interface pointer doesn’t point to the start of the object. It points to a slot in the middle. To call .ClassName, we need to find offset +0.

Suppose the interface pointer the leak debugger captured points to offset +24 inside the object, the slot where that particular interface’s vtable pointer lives. The TObject (and its .ClassName) lives at offset +0. The gap between them is different for every class, because it depends on how many fields and other interfaces were declared before this one.

Casting the pointer directly to TObject would interpret random field data as a VMT pointer — crash or garbage. The debugger needs to know the exact offset. And that’s where the machine code trick comes in.
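The gap is easy to observe directly. The following is a minimal sketch, assuming a recent Delphi compiler; IGreeter and TGreeter are hypothetical names invented for this illustration and are not part of kbmMW. It prints the object pointer, the interface pointer, and the distance between them. The exact distance depends on the compiler and target platform, so no particular value is claimed here.

```pascal
program InterfaceOffsetDemo;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  // Hypothetical interface and class, used only for this illustration.
  IGreeter = interface
    ['{6F1C2D4E-3A5B-4C7D-8E9F-0A1B2C3D4E5F}']
    procedure Greet;
  end;

  TGreeter = class(TInterfacedObject, IGreeter)
  private
    FName: string;
  public
    constructor Create(const AName: string);
    procedure Greet;
  end;

constructor TGreeter.Create(const AName: string);
begin
  inherited Create;
  FName := AName;
end;

procedure TGreeter.Greet;
begin
  Writeln('Hello from ', FName);
end;

var
  Obj: TGreeter;
  Intf: IGreeter;
begin
  Obj := TGreeter.Create('demo');
  Intf := Obj; // from here on, reference counting owns the object
  Intf.Greet;

  // The two pointers refer to the same instance but are not equal.
  Writeln(Format('Object pointer   : %p', [Pointer(Obj)]));
  Writeln(Format('Interface pointer: %p', [Pointer(Intf)]));
  Writeln('Distance in bytes: ',
    NativeInt(Pointer(Intf)) - NativeInt(Pointer(Obj)));
end.
```

Adding more fields or interfaces to TGreeter and running it again shows the distance changing, which is why no fixed constant can be used.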

The Compiler Already Knows the Offset

The Delphi compiler already solves this exact problem for every interface method call. It just doesn’t expose the answer in a way your code can access.

When you call a method through an interface, the caller passes the interface pointer as the Self parameter. But the actual method expects Self to be the object pointer. The compiler bridges this gap by generating a thunk: a tiny stub of machine code that adjusts Self by the correct offset, then jumps to the real method.

The thunk adjusts Self by a known offset before jumping to the real method. That offset IS the distance from interface pointer to object start.

That adjustment value in the thunk (-24 for an interface slot at offset +24) is exactly the information the debugger needs. kbmMWGetImplementingObject works by reading the machine code bytes of the thunk directly from memory and extracting that value.

How It Works, Step by Step

Given a raw interface pointer, the function navigates through several layers of indirection to reach the thunk and decode it.

Four pointer dereferences and one pattern-match. The function reads raw CPU instruction bytes from the thunk and extracts the self-adjustment value.
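The pointer hops can be tried out in isolation. This is a rough sketch, not the kbmMW source: it assumes a Delphi x86 or x64 target, uses TInterfacedObject as a convenient test subject, and only dumps the first bytes of the QueryInterface thunk so they can be compared with the patterns described in this section. The variable names are chosen for this illustration.

```pascal
program ThunkPeek;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  Intf: IInterface;
  VTable, Thunk: Pointer;
  Code: array[0..6] of Byte;
  I: Integer;
begin
  Intf := TInterfacedObject.Create;

  // Hop 1: the interface pointer addresses a slot inside the object;
  // that slot holds the address of the interface's method table.
  VTable := PPointer(Pointer(Intf))^;

  // Hop 2: the first entry of the method table is QueryInterface,
  // which is reached through a compiler-generated thunk.
  Thunk := PPointer(VTable)^;

  // Hop 3: copy the first bytes of the thunk so they can be inspected.
  Move(Thunk^, Code, SizeOf(Code));

  for I := Low(Code) to High(Code) do
    Write(IntToHex(Code[I], 2), ' ');
  Writeln;
end.
```

No output is quoted here because the bytes depend on the compiler version and target; on x64 the expectation from the text is a sequence starting with 48 83 C1 or 48 81 C1.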

The function declares a packed record that maps directly onto the byte layout of the machine code instruction. It reads the first bytes of the QueryInterface thunk and checks for two known instruction encodings. On x64, it looks for $48 $83 $C1 (the ADD RCX, signed_byte form for small offsets) or $48 $81 $C1 (ADD RCX, signed_longint for larger ones). On 32-bit x86, the patterns are different because Self is passed on the stack rather than in a register.

Two Instruction Sizes

Why two patterns? The CPU has a compact form of ADD that encodes small offsets (–128 to +127) in one byte, and a longer form for full 32-bit integers. Since most objects implement only a few interfaces, the offset is usually small and the compiler picks the compact form. But the function handles both.

The opcode bytes identify the instruction. The offset bytes that follow them are what we extract.
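Decoding the two x64 forms can be sketched on plain byte arrays, without touching real code memory. TryDecodeSelfAdjust is a hypothetical helper written for this illustration (kbmMW maps a packed record onto the bytes instead), and the sample byte sequences are hand-assembled rather than taken from a real thunk.

```pascal
program DecodeSelfAdjust;
{$APPTYPE CONSOLE}

// Hypothetical helper: inspects the first bytes of a thunk and, if they
// match one of the two x64 "ADD RCX, imm" encodings, returns the immediate.
function TryDecodeSelfAdjust(const Code: array of Byte;
  out Offset: Integer): Boolean;
begin
  Result := False;
  Offset := 0;
  // Both forms share the REX.W prefix ($48) and the ModRM byte ($C1 = RCX).
  if (Length(Code) < 4) or (Code[0] <> $48) or (Code[2] <> $C1) then
    Exit;
  case Code[1] of
    $83: // ADD RCX, imm8: one sign-extended offset byte
      begin
        Offset := ShortInt(Code[3]);
        Result := True;
      end;
    $81: // ADD RCX, imm32: four little-endian offset bytes
      if Length(Code) >= 7 then
      begin
        Move(Code[3], Offset, SizeOf(Offset));
        Result := True;
      end;
  end;
end;

var
  Offset: Integer;
begin
  // $E8 is -24 as a signed byte.
  if TryDecodeSelfAdjust([$48, $83, $C1, $E8], Offset) then
    Writeln('short form: ', Offset);   // prints "short form: -24"

  // $FFFFFF00 is -256 as a signed 32-bit value.
  if TryDecodeSelfAdjust([$48, $81, $C1, $00, $FF, $FF, $FF], Offset) then
    Writeln('long form : ', Offset);   // prints "long form : -256"

  // Anything else is rejected rather than guessed at.
  if not TryDecodeSelfAdjust([$90, $90, $90, $90], Offset) then
    Writeln('no match');
end.
```

Adding the decoded (negative) offset to the interface pointer yields the object pointer, which is all the real function has left to do once the pattern matches.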

The Bigger Picture: How the Leak Debugger Uses This

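The lifecycle of a leaked interface reference through the debugging system is short. While the application runs, the _AddRef and _Release hooks track reference counts. When the report is produced, the debugger finds interface pointers that still have outstanding references, and only then resolves each one to an object and a class name.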

The vtable hack is only used in the final reporting step — turning an opaque interface address into a human-readable class name.

The usage pattern is identical at all three call sites in the code (log output, TStrings output, and TStream output): call kbmMWGetImplementingObject, validate the result with IsObject, and if valid, grab the class name. If the result fails validation — maybe the object was already freed — the report prints “UNKNOWN” instead.
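The resolve, validate, fall back pattern can be sketched as follows. The real signatures of kbmMWGetImplementingObject and IsObject are not shown in this article, so the sketch passes them in as hypothetical callbacks (TResolveFunc and TValidateFunc) and uses trivial stand-ins to exercise both outcomes. FormatLeak and the stand-in functions are names invented for this illustration.

```pascal
program LeakReportLine;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  // Hypothetical callback shapes; the real kbmMW signatures may differ.
  TResolveFunc = function(IntfPtr: Pointer): TObject;
  TValidateFunc = function(Candidate: TObject): Boolean;

// Builds one report line, falling back to UNKNOWN when the pointer
// cannot be resolved or the result does not pass validation.
function FormatLeak(Index: Integer; IntfPtr: Pointer; RefCount: Integer;
  Resolve: TResolveFunc; Validate: TValidateFunc): string;
var
  Candidate: TObject;
  ClassLabel: string;
begin
  ClassLabel := 'UNKNOWN';
  Candidate := Resolve(IntfPtr);
  if (Candidate <> nil) and Validate(Candidate) then
    ClassLabel := Candidate.ClassName;
  Result := Format('%d) Interface(%s) Addr:%p RefCount:%d',
    [Index, ClassLabel, IntfPtr, RefCount]);
end;

var
  Demo: TObject;

// Stand-in resolver that always "finds" the demo object.
function ResolveToDemo(IntfPtr: Pointer): TObject;
begin
  Result := Demo;
end;

// Stand-in resolver that simulates an unrecognized thunk.
function ResolveFails(IntfPtr: Pointer): TObject;
begin
  Result := nil;
end;

// Stand-in validator; a real one would inspect the memory.
function AcceptAll(Candidate: TObject): Boolean;
begin
  Result := True;
end;

begin
  Demo := TObject.Create;
  try
    Writeln(FormatLeak(44, Pointer(Demo), 1, ResolveToDemo, AcceptAll));
    Writeln(FormatLeak(45, Pointer(Demo), 1, ResolveFails, AcceptAll));
  finally
    Demo.Free;
  end;
end.
```

The first line reports Interface(TObject) and the second Interface(UNKNOWN); the address column depends on the run.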

Why This Technique for This Job?

Why not use a cleaner approach? Several alternatives exist, and each has problems that make it unsuitable for a memory leak debugger specifically:

Adding a GetObject method to every interface — architecturally clean, but a leak debugger can’t require every interface in the application to implement a special method. The point is to detect leaks in any code, including third-party libraries.

Using RTTI — RTTI lookups involve string comparisons and memory allocation. In a leak debugger running at shutdown — potentially while the memory manager is half torn down — triggering new allocations is risky and could interfere with the very leaks you’re detecting.

The Delphi as operator — requires the interface to support it, and internally goes through a method call that may not be safe on a leaked, potentially partially destroyed object.

Storing the class name at _AddRef time — the debugger could resolve the object when _AddRef is called. But _AddRef fires millions of times during normal execution. Adding class-resolution overhead to every reference count change would tank performance. By deferring resolution to the single report pass at shutdown, runtime cost is effectively zero.

A surgical tool, not a Swiss Army knife. This technique exists in kbmMW for exactly one purpose: producing useful leak reports for interface references. It’s not used anywhere else in the framework. It’s a diagnostic tool that only runs during the shutdown report — never in production hot paths.

The Safety Net

A leak debugger probes memory in an uncertain state by definition. Some leaked objects might be partially destroyed. Some pointers might be stale. The function defends against this in three layers:

First, the pattern-matching case statement has an else branch: if the bytes don’t match either ADD instruction form, it returns nil.

Second, the entire function is wrapped in try...except. If reading the thunk bytes triggers an access violation, the exception is caught and the function returns nil.

Third, the caller runs the result through IsObject, which uses VirtualQuery to check the memory is readable, then verifies the VMT self-pointer — a characteristic Delphi signature where the VMT contains a pointer back to itself at a known offset. Only if all checks pass does the report use .ClassName.
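The third layer can be approximated in a few lines. This is a simplified sketch, assuming Delphi on Windows; IsReadable and LooksLikeObject are hypothetical names, and the real IsObject in kbmMW may check more than this. It relies on the vmtSelfPtr constant from the System unit, which gives the offset inside a VMT where the pointer back to the VMT is stored.

```pascal
program VmtSelfCheck;
{$APPTYPE CONSOLE}

uses
  Winapi.Windows, System.SysUtils;

// Hypothetical helper: True if the page containing P is committed and
// not marked no-access or guard. Simplified: only that one page is checked.
function IsReadable(P: Pointer): Boolean;
var
  MBI: TMemoryBasicInformation;
begin
  Result := (P <> nil) and
    (VirtualQuery(P, MBI, SizeOf(MBI)) <> 0) and
    (MBI.State = MEM_COMMIT) and
    ((MBI.Protect and (PAGE_NOACCESS or PAGE_GUARD)) = 0);
end;

// Hypothetical helper: True if P appears to point at a live Delphi object,
// judged by the VMT self-pointer signature.
function LooksLikeObject(P: Pointer): Boolean;
var
  VMT, SelfSlot: Pointer;
begin
  Result := False;
  if not IsReadable(P) then
    Exit;
  VMT := PPointer(P)^;                       // first field of any object
  SelfSlot := Pointer(NativeInt(VMT) + vmtSelfPtr);
  if not IsReadable(SelfSlot) then
    Exit;
  Result := PPointer(SelfSlot)^ = VMT;       // VMT must point back to itself
end;

var
  Obj: TObject;
  Junk: array[0..3] of NativeUInt;
begin
  Obj := TObject.Create;
  try
    FillChar(Junk, SizeOf(Junk), 0);
    Writeln('real object: ', LooksLikeObject(Obj));
    Writeln('zeroed data: ', LooksLikeObject(@Junk));
  finally
    Obj.Free;
  end;
end.
```

Checking readability before each dereference keeps the probe from raising access violations in the common failure cases, which matters when the report runs late in shutdown.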

Platform Limitation: x86 / x64 Only

The function is conditionally compiled with {$IF DEFINED(CPUX32) or DEFINED(CPUX64)}. It only exists on Intel/AMD, because the trick depends on recognizing the exact byte patterns the Delphi compiler emits for thunks on x86/x64. On ARM, the thunks use different instructions entirely. On those platforms, leaked interfaces are reported as “UNKNOWN” — less informative, but still functional. The trade-off is acceptable since the debug memory system is a development-time tool, and Delphi development primarily happens on Windows.

An Analogy

Imagine you’re a building inspector reviewing apartments with overdue rent (the leaks). For most apartments, the building directory tells you the tenant’s name (regular objects — just call .ClassName). But some are listed under a company name — “Suite 24, Some Interface LLC” (interface references — no class name).

However, you know that every suite door has a tiny plaque installed by the building contractor (the thunk) that says “this suite is 24 meters from the main entrance.” So you walk to the door, read the plaque, walk 24 meters toward the entrance, find the building’s main registry, and look up the actual person behind the company.

That’s what the debugger does — reads the “contractor’s plaque” (thunk machine code) to navigate from the interface back to the object.

Summary

kbmMWGetImplementingObject exists for a single, specific purpose: making the kbmMW memory leak debugger’s reports actionable when interface references are involved. Instead of “UNKNOWN”, it reports the exact class name of the object behind the leak.

It achieves this by reading the compiler-generated thunk code for QueryInterface, extracting the self-adjustment offset embedded in the machine code, and applying it to navigate from the interface pointer back to the TObject. The technique is fast (a few pointer dereferences), safe (multiple validation layers), and surgical (used only during leak reporting, never in production code).

It’s the kind of code most developers will never need to write — but if you’ve ever stared at a leak report full of anonymous interface addresses and wished it would just tell you the class name, you’ll appreciate what it does.

Should you copy this technique? Probably not, unless you’re building diagnostic tooling with similar constraints (no allocation, no RTTI, no cooperation from the interface). It’s coupled to the Delphi compiler’s x86/x64 thunk format and would need updating if that format changes. The try/except safety net means it fails gracefully, but code that reads machine code should be treated with the respect it deserves.

The kbmMW framework is developed by Kim Bo Madsen at Components4Developers.

Reverse-Engineering Delphi for Effective Debugging

By kimbomadsen
