D4D4
A co-worker of mine was looking at some disassembled ARM code the
other day, and discovered something weird: lots of d4d4 instructions,
scattered about. LLVM’s objdump says this is a relative branch to
-0x58. The weird part is that they were always unreachable.
Experiments
Here’s an example in a minimal reproducer I wrote:
00020100 <one>:
20100: 4770 bx lr
20102: d4d4 bmi 0x200ae <__dso_handle+0x100ae> @ imm = #-0x58
That bx lr right before the d4d4 branches to the link register. In
other words, it returns. Here’s the C code that goes with this
function:
#include "mod.h"

static void one(void)
{
    return;
}

int main(void)
{
    void *fn;

    fn = one;
    use_ptr(fn);
    return 0;
}
The use_ptr function is declared in mod.h (defined in mod.c), and
what it does with the pointer is not important. You can see that
there’s a function called one, and that function just returns, which
is why bx lr is its only instruction. But why is there an extra d4d4
after it in the disassembled object code? My first thought was that
it was there for alignment. Thumb instructions are 16 bits, after
all, and maybe functions need to be 32-bit aligned. It’s weird that it
would use a branch to a real relative address instead of a nop or
something that would cause a fault, but let’s try expanding the
experiment.
code:
static void one(void)
{
    return;
}

static void two(void)
{
    return;
}

int main(void)
{
    void *fn;

    fn = one;
    use_ptr(fn);
    fn = two;
    use_ptr(fn);
    return 0;
}
And the disassembly:
000200f4 <main>:
200f4: b580 push {r7, lr}
200f6: 466f mov r7, sp
200f8: 4803 ldr r0, [pc, #0xc] @ 0x20108 <main+0x14>
200fa: f000 f80b bl 0x20114 <use_ptr> @ imm = #0x16
200fe: 4803 ldr r0, [pc, #0xc] @ 0x2010c <main+0x18>
20100: f000 f808 bl 0x20114 <use_ptr> @ imm = #0x10
20104: 2000 movs r0, #0x0
20106: bd80 pop {r7, pc}
20108: 11 01 02 00 .word 0x00020111
2010c: 13 01 02 00 .word 0x00020113
00020110 <one>:
20110: 4770 bx lr
00020112 <two>:
20112: 4770 bx lr
00020114 <use_ptr>:
20114: 4770 bx lr
Not only does the compiler not feel the need to align functions to
32-bit boundaries, but adding a second single-instruction function
actually got rid of the d4d4 entirely. And note how use_ptr (I made it
just return also) doesn’t have a d4d4 after it. Curious.
What happens if I go to 3 functions?
The disassembly:
00020124 <one>:
20124: 4770 bx lr
00020126 <two>:
20126: 4770 bx lr
00020128 <three>:
20128: 4770 bx lr
2012a: d4d4 bmi 0x200d6 <__dso_handle+0x100d6> @ imm = #-0x58
0002012c <use_ptr>:
2012c: 4770 bx lr
It’s back! But now only once. It seems like it aligns the end of
main.o so that mod.o can start on a 32-bit boundary. Ok, maybe this
makes sense. So let’s take a look at the compiler’s output directly.
Here’s main.o:
00000028 <one>:
28: 4770 bx lr
0000002a <two>:
2a: 4770 bx lr
0000002c <three>:
2c: 4770 bx lr
Oh, the compiler didn’t put that in at all, it must have been the
linker! So lld is rounding the end of an object file up to 32-bit
alignment using this d4d4 instruction. If that’s true, then linking
mod.o before main.o ought to move where the d4d4 is.
000200e4 <use_ptr>:
200e4: 4770 bx lr
200e6: d4d4 bmi 0x20092 <__dso_handle+0x10092> @ imm = #-0x58
000200e8 <main>:
200e8: b5d0 push {r4, r6, r7, lr}
200ea: af02 add r7, sp, #0x8
200ec: 4804 ldr r0, [pc, #0x10] @ 0x20100 <main+0x18>
200ee: 4c05 ldr r4, [pc, #0x14] @ 0x20104 <main+0x1c>
200f0: 47a0 blx r4
200f2: 4805 ldr r0, [pc, #0x14] @ 0x20108 <main+0x20>
200f4: 47a0 blx r4
200f6: 4805 ldr r0, [pc, #0x14] @ 0x2010c <main+0x24>
200f8: 47a0 blx r4
200fa: 2000 movs r0, #0x0
200fc: bdd0 pop {r4, r6, r7, pc}
200fe: bf00 nop
20100: 11 01 02 00 .word 0x00020111
20104: e5 00 02 00 .word 0x000200e5
20108: 13 01 02 00 .word 0x00020113
2010c: 15 01 02 00 .word 0x00020115
00020110 <one>:
20110: 4770 bx lr
00020112 <two>:
20112: 4770 bx lr
00020114 <three>:
20114: 4770 bx lr
It did! The extra instruction got moved up to align the beginning of the object code that contains main!
One more check: if I use the GNU linker, does it do the same thing?
00008000 <use_ptr>:
8000: 4770 bx lr
8002: 0000 movs r0, r0
No! GNU ld (2.44) inserts zeroes to align files. It’s not nop, though
it may as well be. ARMv7-M actually has a nop instruction: bf00 or
f3af8000 depending on encoding. You can see it in main, inserted by
the compiler between the end of the function and its constants.
Conclusion
So now we know. LLD is inserting the weird d4d4 instructions, and
it’s doing it to align across object file boundaries. Why did they
pick such a weird constant, though? GNU ld went with zeroes, which
seems benign.
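To make the padding behavior concrete, here is a minimal sketch of what the experiments suggest the linker is doing: round the end of one object's code up to a 4-byte boundary and fill the gap with a filler byte. The names align_up and pad_section are made up for illustration and this is not LLD's code; the 4-byte alignment and the 0xd4 filler are simply what the disassembly above shows.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round offset up to the next multiple of align (align must be a power of two). */
static size_t align_up(size_t offset, size_t align)
{
    return (offset + align - 1) & ~(align - 1);
}

/*
 * Fill the gap between the end of one input section and the aligned start
 * of the next with a repeated filler byte, and return the aligned offset.
 * Hypothetical helper: it illustrates the idea, it is not taken from LLD.
 */
static size_t pad_section(uint8_t *image, size_t end, size_t align, uint8_t filler)
{
    size_t next = align_up(end, align);

    memset(image + end, filler, next - end);
    return next;
}
```

With one or three two-byte functions, the code ends on an odd half-word, so two 0xd4 bytes get written and show up as a single d4d4. With two functions the end is already aligned and nothing is written, which matches the run where the d4d4 vanished.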
Research
A little bit of checking out the code later, and we find this in ARM.cpp:
trapInstr = {0xd4, 0xd4, 0xd4, 0xd4};
That was actually way easier than I was expecting. The git blame path meanders a little before getting to this commit where Rui Ueyama explains:
Add trap instructions for ARM and MIPS.
This patch fills holes in executable sections with 0xd4 (ARM) or
0xef (MIPS). These trap instructions were suggested by Theo de Raadt.
llvm-svn: 306322
This appears to have been precipitated by this message, also from Rui
Ueyama, to the llvm-bugs mailing list. In it, the question is asked
whether LLD should use a trap instruction for ARM/AArch64 like x86 and
x86-64’s 0xCC. I didn’t find any replies, though.
I couldn’t find any messages on the mailing list about this from Theo
de Raadt, so I guess we just have to live with Ueyama’s testimony that
he thought 0xd4 would be a good byte to repeat as a trap instruction.
But a trap instruction is supposed to halt the processor, so what’s
with the disassembler saying it’s a branch?
RTFM
Let’s take a look at the ARMv7-M Architecture Reference Manual, the
ARM. First of all, it says that we’re using the Thumb instruction set,
and most instructions are 16 bits. Any instructions that begin with
0b11101, 0b11110, or 0b11111 are the beginnings of 32-bit
instructions, but all the rest are 16 bits. Since d4 is 0b11010100, we
can safely assume that the instruction decoder will always treat it
like a 16-bit instruction.
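That length rule is small enough to write down. Here’s a sketch in C of the check described above; the function name thumb_is_32bit is my own, not something from the manual.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * A Thumb half-word starts a 32-bit instruction only if its top five bits
 * are 0b11101, 0b11110, or 0b11111. Everything else is a complete 16-bit
 * instruction.
 */
static bool thumb_is_32bit(uint16_t halfword)
{
    unsigned top5 = halfword >> 11;

    return top5 == 0x1d || top5 == 0x1e || top5 == 0x1f;
}
```

For 0xd4d4 the top five bits are 0b11010, so the check fails and the half-word stands alone. The f000 that starts each bl in main passes the check, as does the f3af that starts the wide nop.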
Next up, we have table A5-1, showing how 16-bit instructions are
encoded. The first 6 bits are the opcode, followed by 10 bits of other
stuff. As we established earlier, the bits we’re looking for are
0b110101. That matches conditional branch and supervisor call’s
0b1101xx. Well, so far it’s still looking like a branch...
Page A5-134 leads us to the statement that the encoding here is 0b1101
followed by a 4 bit opcode. 0xd is the 0b1101 and the next 4 bits are
the number 4: 0b0100. This doesn’t match UDF, which would have seemed
like a reasonable choice for this purpose. Instead, any opcode not
matching 111x is a conditional branch, explained under B on page
A7-207, according to table A5-8. This page tells us that the B
instruction has several encodings, but only one that begins with a
0xd: T1.
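As a sketch of that table lookup, the classification comes down to one nibble. The enum and function names below are invented for illustration; the rule itself is the one from table A5-8, where opcode 0b1110 is UDF, 0b1111 is SVC, and everything else is a conditional branch.

```c
#include <stdint.h>

enum d_group_kind {
    KIND_OTHER,        /* top nibble is not 0b1101 */
    KIND_COND_BRANCH,  /* B<cond>, encoding T1 */
    KIND_UDF,          /* permanently undefined */
    KIND_SVC           /* supervisor call */
};

/* Classify a 16-bit Thumb instruction whose top nibble is 0b1101. */
static enum d_group_kind classify_d_group(uint16_t hw)
{
    unsigned opcode = (hw >> 8) & 0xf;

    if ((hw >> 12) != 0xd)
        return KIND_OTHER;
    if (opcode == 0xe)
        return KIND_UDF;
    if (opcode == 0xf)
        return KIND_SVC;
    return KIND_COND_BRANCH;
}
```

Feeding it 0xd4d4 gives KIND_COND_BRANCH. A half-word in the 0xdeXX range would come back as KIND_UDF instead.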

With 0xd4d4, cond in this table is 0x4 or 0b0100. That doesn’t match
UDF (again) or SVC. InITBlock() just checks whether we’re inside the
block of up to 4 instructions that follows an IT. IT is a weird
instruction, but not important for our purposes. What’s important is
that imm32 is built from those least significant 8 bits. Sign extended
on their own, they are 0xffffffd4, or -44. That’s -0x2c in hex, not
the -0x58 from objdump. However, the immediate counts half-words, not
bytes: the manual appends a zero bit before sign-extending. So the
offset is -88, or -0x58.
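Here’s that arithmetic as a small C sketch, with function names of my own choosing. One thing the text above doesn’t spell out, so treat it as my addition: in Thumb, the PC value used by a branch is the address of the instruction plus 4. With that rule, the computed targets line up with the addresses objdump printed.

```c
#include <stdint.h>

/* Byte offset of a B<cond> T1 branch: the low 8 bits, sign extended, times 2. */
static int32_t b_t1_offset(uint16_t hw)
{
    int32_t imm8 = hw & 0xff;

    if (imm8 & 0x80)
        imm8 -= 0x100;  /* sign-extend the 8-bit field */
    return imm8 * 2;    /* the immediate counts half-words */
}

/*
 * Branch target for an instruction located at addr. The +4 is the Thumb
 * rule that PC reads as the current instruction address plus 4.
 */
static uint32_t b_t1_target(uint32_t addr, uint16_t hw)
{
    return addr + 4 + (uint32_t)b_t1_offset(hw);
}
```

For the d4d4 at 0x20102 this gives 0x20102 + 4 - 0x58 = 0x200ae, the same target objdump shows in the first listing.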
The other field, cond, is 0b0100, which means some bits in the
condition flags have to be set in order for the branch to be taken.
Which bits don’t particularly matter, since this code is supposed to
be unreachable, but for completeness, 0b0100 is MI (minus): APSR.N ==
'1'. That matches the bmi mnemonic in objdump’s output.
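For reference, here’s a sketch of how the cond field maps to the mnemonic suffixes a disassembler prints, which is where the bmi in the listings comes from. The table and function names are mine. The mapping is the standard ARM condition code table, and I’m assuming the usual APSR layout with N in bit 31.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Condition suffixes for cond values 0b0000 through 0b1101. */
static const char *const cond_names[14] = {
    "eq", "ne", "cs", "cc", "mi", "pl", "vs", "vc",
    "hi", "ls", "ge", "lt", "gt", "le",
};

/* Suffix for a B<cond> T1 half-word, or NULL for the UDF/SVC slots. */
static const char *b_t1_cond_name(uint16_t hw)
{
    unsigned cond = (hw >> 8) & 0xf;

    return cond < 14 ? cond_names[cond] : NULL;
}

#define APSR_N (1u << 31)  /* negative flag */

/* MI ("minus") passes when the N flag is set. */
static bool mi_passes(uint32_t apsr)
{
    return (apsr & APSR_N) != 0;
}
```

So if execution ever did reach a d4d4, whether the branch is taken depends on whatever happened to set or clear N last.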
More Conclusions
So it looks to me like objdump is behaving correctly, and Theo’s
suggestion was a bad one, at least for Thumb. Instead of a trap, it’s
a relative jump. In the tiny program I made above, it jumps right out
of .text, but in a more substantial program, it will usually land
somewhere in code. It should be unreachable, but the whole point of a
trap instruction is to halt the processor, not send it off on a random
walk through your codebase.
This feels like a bug in LLD to me. The linker’s inserting something called “trapInstr” but it’s nothing like a trap. In fact, it’s a (conditional) jump to a constant relative address.
So after a fun night of spelunking through source code, commit logs, and manuals, I feel like I’ve learned something. I guess maybe it’s time to file a bug report.