Porting a 500K+ LOC application to a new platform is an exhaustive process.
No amount of typedefs and #defines will save you from the inevitable crashes and memory corruptions that will occur when you first run your code.
Sure, good practices will save you a ton of time and QA effort in the end, but there are certain subtleties that even the best among us will miss when writing version 1.0.
I fixed a crash in some production code yesterday that reared its ugly head when compiled to target a 64-bit platform.
The bug itself resides in a relatively simple function which rotates or flips an image, in a section of code that calculates a per-row pointer increment when flipping an image vertically.
The offending code is shown below, see if you can spot the problem:
typedef unsigned int some_type size_t bytespp = 3; some_type width = 637; ptrdiff_t incr = -1 * width * bytespp;
If you don’t see it right away we can clear things up a bit by removing the typedefs. Note that these declarations apply only to the x86 version (i.e., when the crash does not occur). The x64 version is further down.
unsigned int bytespp = 3; unsigned int width = 637; int incr = -1 * width * bytespp;
Now it should be more clear. We’re multiplying an unsigned type by -1 to get a negative pointer increment.
While definitely not the best way to go about things, it works on a 32-bit platform and never caused any problems.
The result is -1911, as one would expect.
However, moving into 64-bit land produces a different result entirely. The variable incr takes on a huge, positive value all of a sudden and we (luckily!) crash with a segfault.
So I fixed the bug as shown below and scratched one more x64 pain in the ass off of my list.
// yes yes, truncation may occur due to the // cast to a signed __int64 at the end, but // in practice we will never deal with images // that have a width >= 2^63. ptrdiff_t incr = -(ptrdiff_t)(m_width * bytesPerPixel);
That being done, I was interested to know exactly why the code worked in a 32-bit build and was completely borked under x64.
The answer is probably obvious to anyone who has a good understanding of type promotions and signed to unsigned conversions in C, but a deeper look proves to be an interesting exercise for anyone who may not see the answer right away.
For reference, here is the un-typedef’d version of the code when compiling for x64:
unsigned int width = 637; unsigned __int64 bytespp = 3; __int64 incr = -1 * width * bytespp;
Now, in order to really understand why this all works out in an x86 build we’ll want to take a peak at the disassembly.
The x86 version
This assembly is pretty straightforward, but we’ll go trough it step-by-step to make sure everyone’s on the same page.
unsigned int bytespp = 3; 004D11C0 mov dword ptr [ebp-18h],3 unsigned int width = 637; 004D11C7 mov dword ptr [ebp-24h],27Dh int incr = -1 * width * bytespp; 004D11CE mov eax,dword ptr [ebp-24h] 004D11D1 imul eax,eax,0FFFFFFFFh 004D11D4 imul eax,dword ptr [ebp-18h] 004D11D8 mov dword ptr [ebp-30h],eax
004D11C0 mov dword ptr [ebp-18h],3
Move a value of 3 into the stack location for bytespp.
004D11C7 mov dword ptr [ebp-24h],27Dh
Move a value of 0x27D (637 decimal) into the stack location for width.
004D11CE mov eax,dword ptr [ebp-24h]
Move the value of width into eax.
004D11D1 imul eax,eax,0FFFFFFFFh
Multiply eax by -1 (0xFFFFFFFF in Two’s Compliment) and store the result in eax.
The mathematical result of this operation is 0x27CFFFFFD83, but as eax is a 32-bit register, it is truncated to 32-bits, i.e., 0xFFFFFD83. Obviously not logically correct at this point, but we have more to do.
004D11D4 imul eax,dword ptr [ebp-18h]
Now multiply eax (which is holding the result of the first multiplication) by the value of bytespp, which is 3, and store the result (0x2FFFFF889) in eax.
This results in 12884899977 decimal. However, as we noted earlier, the result of multiplying two 32-bit integers is itself a 32-bit integer, so when all is said and done the result is truncated to 0xFFFFF889, or -1911 decimal, which is correct.
This code has been merrily chugging along for years. In a 32-bit environment it just so happens to all work out at the end and you can’t really blame the original developer for writing a bug that is relatively subtle.
Now let’s take a look at the x64 version and see exactly where it bit me in the ass sometime around 11a.m. yesterday morning.
The x64 version
unsigned __int64 bytespp = 3; 000000014000111A mov qword ptr [rsp+30h],3 unsigned int width = 637; 0000000140001123 mov dword ptr [rsp+38h],27Dh __int64 result = -1 * width * bytespp; 000000014000112B mov eax,dword ptr [rsp+38h] 000000014000112F imul eax,eax,0FFFFFFFFh 0000000140001134 imul rax,qword ptr [rsp+30h] 000000014000113A mov qword ptr [rsp+40h],rax
000000014000111A mov qword ptr [rsp+30h],3
Move 3 into the stack location for bytespp.
0000000140001123 mov dword ptr [rsp+38h],27Dh
Move 0x27D into the stack location for width.
000000014000112B mov eax,dword ptr [rsp+38h]
Move the value of width into eax.
000000014000112F imul eax,eax,0FFFFFFFFh
Multiply eax by -1 and store the result into eax.
It’s important to note here that eax is comprised of the lower 32-bits of the 64-bit register rax. because the operand types are int and unsigned int the result is a signed int. This means that no sign extension takes place because no type promotion occurs. The result is stored into eax.
Any significant bits past the 32-bit boundary are lopped off. Thus, the 64-bit register rax looks like this; 0x00000000FFFFFD83.
0000000140001134 imul rax,qword ptr [rsp+30h]
Multiply rax by the value of bytespp. This is the important bit!
In this instance one of the operands is an unsigned 64 bit value (bytespp is of type size_t, which is an unsigned __int64). This means that the result is a 64-bit value, so the entire rax register is multiplied by the value of bytespp and the result stored into rax.
No sign extension takes place here due to the two operand types. You may expect that it would due to the fact that the first multiplication resulted in a signed int, but look closely at the disassembly. Our first operand is rax, a 64-bit register, and the second operand is bytespp, an unsigned __int64.
Due to the order of the three-operand multiplication we missed the boat for a sign extension. If only the order of the operands were swapped and the expression was:
__int64 incr = -1 * bytespp * width
it would have “just worked”. Because the first multiply instruction would have been -1 * bytespp a promotion to __int64 would have taken place, i.e., the result would have been sign extended and placed into the rax register. The next multiplication (rax * width) would have resulted in the correct result, but obviously you don’t want to rely on order of evaluation in an associative expression.
Anyhow, the multiplication results in this:
0x00000000FFFFFD83 * 3 = 0x2FFFFF889
Note that this is mathematically equivalent to the result of final multiplication in the 32-bit version, but in this case, no truncation occurs. As such, a value of 12884899977 is calculated as the pointer increment and, after swapping the first scan line of the image, we increment the pointer off into obscurity and (luckily!) crash with a segfault.
If the error weren’t so large the root cause may have taken longer to find. I love it when my code crashes early, I really do.
This bug is not the fault of the original developer. Ok, it is, but I happen to know this person and he is very good at what he does. Really, if I’m half the engineer that he is in twenty years I’ll be more than content.
The fact is that it is hard to write code that will just work perfectly when ported from X-bit platform to Y-bit platform. No amount of typedefs will save you. Seriously, you will screw up somewhere, and that’s ok; QA professionals need jobs too, and God knows we pretty much suck at testing our own code.
On a side note, this code was not made any easier to understand by one of the typedefs being used. Two of them were necessary; namely, ptrdiff_t and size_t. These types change depending on your target environment and, well, that’s what typedefs are for.
The last typedef was not necessary and only served to confuse things. Using a typedef for the width and height of an image in this case was uncalled for. The type that the typedef maps to never changes. Don’t use typedefs solely to give your types different names if you never intend on mapping them to a new type. Seriously, I don’t mind typing unsigned int, especially when it makes my code clearer and easier to maintain.
I subscribe to the Linus Torvalds view on typedefs. They are not meant to be paraded about like a Weimaraner aiming for best in show. They serve a purpose, don’t abuse them.