Runtime Code Patching - Not for the Faint of Heart

I have been involved in several conversations recently that have revolved around the joys of runtime code patching. I am always shocked to hear people say that they are okay with the idea of patching code at runtime. Moreover, it shocks me that they think it is easy to get right! I do think that code patching can have its place in a system if it is implemented correctly and if its intent and semantics are fully disclosed to any potential users.

So what exactly is code patching? There are tons of examples of it on the web (mostly hacker sites! :D ) and many books (again, mostly hacker stuff) that describe it in detail; just "Live Search" for "runtime code patching". Basically, patching is the process of replacing a set of op codes in memory with a different set of op codes, at runtime. This seems simple enough, but the devil is in the details. Let's dig in!

Consider these op codes from a simple and mostly useless function:

   01001175 8bff            mov     edi,edi
   01001177 53              push    ebx
   01001178 50              push    eax
   01001179 53              push    ebx
   0100117a 51              push    ecx
   0100117b 52              push    edx
   0100117c 5a              pop     edx
   0100117d 59              pop     ecx
   0100117e 5b              pop     ebx
   0100117f 58              pop     eax
   01001180 5b              pop     ebx
   01001181 c3              ret

This code is kind of silly, but it has nice properties that expose the problems with patching; more on that in a bit. It is also made up of very commonly generated op codes. The format of the assembly listing is:

 [address] [op codes] [mnemonics for op codes]

 

The typical reason for installing a patch is to either bypass or modify the behavior of an existing function. Therefore the canonical patch is “op codes for a jmp to an absolute address”, written over the beginning of an existing routine like so:

 

   01001175 ea871100011b00  jmp     001b:01001187
   0100117c 5a              pop     edx
   0100117d 59              pop     ecx
   0100117e 5b              pop     ebx
   0100117f 58              pop     eax
   01001180 5b              pop     ebx
   01001181 c3              ret

Here we replaced the first few instructions of our function with an absolute jump to some other code, presumably code that we wrote and loaded into the system in some way.
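
To make the mechanics concrete, here is a minimal user-mode sketch of installing this kind of patch in the current process. It is a sketch under assumptions, not how any particular patcher does it: it assumes x86, it uses a 5-byte near relative jmp (op code E9) instead of the 7-byte far jmp shown in the listing, and the WriteJmpPatch name and its parameters are made up for illustration. Note that it only shows the byte write itself; it does nothing about the thread races we are about to discuss.

   #include <windows.h>
   #include <string.h>

   /* Sketch only: overwrite the first 5 bytes of 'target' with a near
      relative jmp (E9 rel32) to 'hook'. Assumes x86; names are hypothetical. */
   BOOL WriteJmpPatch(void *target, void *hook)
   {
       unsigned char patch[5];
       DWORD oldProtect;

       /* E9 <rel32>: rel32 is measured from the end of the 5-byte instruction. */
       patch[0] = 0xE9;
       *(LONG *)&patch[1] = (LONG)((ULONG_PTR)hook - ((ULONG_PTR)target + 5));

       /* Make the code page writable, copy the new op codes in, then restore
          the original protection. */
       if (!VirtualProtect(target, sizeof(patch), PAGE_EXECUTE_READWRITE, &oldProtect))
           return FALSE;

       memcpy(target, patch, sizeof(patch));
       VirtualProtect(target, sizeof(patch), oldProtect, &oldProtect);

       /* Flush the instruction cache for the range we just modified. */
       FlushInstructionCache(GetCurrentProcess(), target, sizeof(patch));
       return TRUE;
   }

Even this simple version takes three distinct steps (change protection, write, flush), and none of them address the problem that comes next.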

 

So now that we know what a patch is, what is the problem? This seems like it should work, so what exactly do we need to worry about? Well, the most obvious thing we have to worry about is another thread running the code that we are patching while we are patching it. If we modify code out from under a thread that is executing it, that thread will crash, or at the very least do very strange things. We need to make sure that any thread running the code we are patching executes either all of the old op codes or just the new one, but never a combination of the two. You can imagine a scenario like this:

 

1. Thread T1 executes the first instruction of the old op codes at address 0x01001175

2. Thread T2 overwrites the old op codes with the new "jump" op code

3. Thread T1 executes the op code at 0x01001177, which now falls in the middle of the jump instruction that T2 wrote over the old op codes

 

This would be really bad! :)

 

So what to do? The next logical progression in analyzing the issue goes like this:

 

“Well, the problem is that we have a race with other threads running the code we are patching, so I’ll just make sure that no other threads are in that code when I do the patch!”.

 

 Perfect! Well, not quite. We can corral all of the other running processors (using DPCs, IPIs, etc.) and make sure that none of the running threads are in the code we are going to patch, but believe it or not, this still isn't enough to make our patch solid! To clarify, let's restate the problem with our patching approach: we need to synchronize with every thread that could be executing the code we are trying to patch. The corralling methodology, however, only synchronizes us with the threads currently running on other processors. What about the threads that aren't running? Is there any synchronization required there? Unfortunately, yes. We can have threads that aren't currently running but that have the address of the old op codes, where our new instruction now resides, saved away for future use. This can occur if a thread was context swapped while executing the code we want to patch (when a thread is context swapped, its current instruction pointer is saved away so that it can be restored when the thread runs again). Exceptions, interrupts, and the like can also cause the instruction pointer to be saved away.
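
For a concrete (and simplified) picture of what that synchronization has to look at, here is a rough user-mode approximation of the corral, assuming the code being patched lives in our own 32-bit process; a kernel-mode patcher would use the DPC/IPI machinery mentioned above instead. It suspends every other thread in the process and then inspects each thread's saved instruction pointer, which is exactly the state this paragraph is warning about. The SafeToPatch name and its parameters are made up for illustration.

   #include <windows.h>
   #include <tlhelp32.h>

   /* Sketch only: returns TRUE if no other thread in this 32-bit process has a
      saved instruction pointer inside [patchStart, patchStart + patchLen). */
   BOOL SafeToPatch(ULONG_PTR patchStart, SIZE_T patchLen)
   {
       HANDLE snap = CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, 0);
       THREADENTRY32 te = { sizeof(te) };
       BOOL safe = TRUE;

       if (snap == INVALID_HANDLE_VALUE)
           return FALSE;

       if (Thread32First(snap, &te)) {
           do {
               HANDLE t;
               CONTEXT ctx;

               if (te.th32OwnerProcessID != GetCurrentProcessId() ||
                   te.th32ThreadID == GetCurrentThreadId())
                   continue;

               t = OpenThread(THREAD_SUSPEND_RESUME | THREAD_GET_CONTEXT,
                              FALSE, te.th32ThreadID);
               if (t == NULL)
                   continue;

               /* Park the thread, then look at where it will resume. */
               SuspendThread(t);

               ctx.ContextFlags = CONTEXT_CONTROL;
               if (GetThreadContext(t, &ctx) &&
                   ctx.Eip >= patchStart && ctx.Eip < patchStart + patchLen) {
                   /* The saved EIP points into the bytes we want to overwrite. */
                   safe = FALSE;
               }

               /* A real patcher would keep the threads suspended until the patch
                  is written and the caches are flushed; we resume immediately
                  just to keep the sketch short. */
               ResumeThread(t);
               CloseHandle(t);
           } while (Thread32Next(snap, &te));
       }

       CloseHandle(snap);
       return safe;
   }

Even then, the suspended threads' saved instruction pointers are only one of the places an old address can hide (the paragraph above mentions exceptions and interrupts as well), which is why this approach is so hard to get fully right.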

 

So what is the moral of the story here? Don't patch code? Well, that may be a little extreme, but the moral is at least to never patch multiple instructions. Having worked with Windows Online Crash Analysis (OCA) data for a long time now, I can tell you that I have seen many failed attempts at getting this right that have ended in a BSOD. :)

 

In fact, if you look at post-Server 2003 Windows binaries, you will notice that the generated code follows this format almost exclusively:

 

7731a321 90              nop
7731a322 90              nop
7731a323 90              nop
7731a324 90              nop
7731a325 90              nop

7731a326 8bff            mov     edi,edi
7731a328 55              push    ebp
7731a329 8bec            mov     ebp,esp
… <more op codes here>

This type of code generation allows a patch to be installed safely at runtime. The 2 bytes at the start of the function (mov edi,edi) are enough space to hold the op codes for a "short relative jump", which can be crafted to jump back to the 5 bytes of nop op codes sitting just before the function. Those nop bytes would have previously been overwritten with a "jmp ADDRESS" op code pointing at the patch code. This was a conscious change that was made to allow servicing of existing binaries on long-running machines, without having to reboot to replace them.

The same rules apply, though: the other processors must be corralled to make sure no thread is active in the code we want to patch at the time the patch is applied. Again, the reason this methodology works is that the op code boundaries match. In other words, a thread will execute either the "mov edi,edi" or the "short relative jump". If it executes the former, the routine runs normally through its code. If it executes the latter, it jumps to the patch installed over the nop op codes. No problem here. There are issues with the instruction caches on processors as well, but those can be managed within the corralling mechanism.

One last point: we don't have to worry about the "normal path" ever running the patch that was written over the nops, because there is no path to that code except via the "short relative jump" that we wrote over the first instruction. Nice huh?!
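
As a rough sketch of how that two-step patch might be applied from inside a 32-bit process, the code below first writes the long jump to the hook into the nop padding, where nothing can execute it yet, and only then swaps the 2-byte "mov edi,edi" for the 2-byte short jump with a single aligned 16-bit store, so a thread entering the function sees either the old instruction or the new one, never half of each. This assumes the function entry is 2-byte aligned (as it is in the listing above); the HotPatch name and the hook address are hypothetical, and the corralling and cache management described above are still required and not shown here.

   #include <windows.h>

   /* Sketch only: hot patch a function laid out as 5 nops followed by
      "mov edi,edi" at 'functionEntry', redirecting it to 'hook'. Assumes x86
      and a 2-byte-aligned entry point; names are hypothetical. */
   BOOL HotPatch(void *functionEntry, void *hook)
   {
       unsigned char *entry = (unsigned char *)functionEntry;
       unsigned char *pad   = entry - 5;    /* the 5 nop bytes before the entry */
       DWORD oldProtect;

       if (!VirtualProtect(pad, 7, PAGE_EXECUTE_READWRITE, &oldProtect))
           return FALSE;

       /* Step 1: write "jmp hook" (E9 rel32) over the nops. Nothing jumps here
          yet, so this write is harmless even while other threads are running. */
       pad[0] = 0xE9;
       *(LONG *)&pad[1] = (LONG)((ULONG_PTR)hook - (ULONG_PTR)entry);

       /* Step 2: replace "mov edi,edi" (8B FF) with "jmp $-5" (EB F9) in one
          aligned 16-bit store, so the entry is always one whole instruction. */
       *(volatile USHORT *)entry = 0xF9EB;  /* little-endian: EB, then F9 */

       VirtualProtect(pad, 7, oldProtect, &oldProtect);
       FlushInstructionCache(GetCurrentProcess(), pad, 7);
       return TRUE;
   }

The design point is the one made above: every write either lands in bytes that no thread can reach yet, or replaces exactly one instruction with exactly one instruction of the same size.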

 

Well, I hope this has been somewhat interesting or useful in some way. Please let me know if there are any other things that would be interesting to talk about.