%%% vim:ai:tw=72:
\chapter{Program Loading and Dynamic Linking}
\section{Program Loading}
Program loading is the process of mapping file segments to virtual
memory segments. For efficient mapping, executable and shared object
files must have segments whose file offsets and virtual addresses are
congruent modulo the page size.
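As an illustration only, the congruence requirement can be written as a
small C predicate. The function name and its parameters are hypothetical
and not part of this ABI; \code{p_offset} and \code{p_vaddr} stand for
the program header fields of the same names.
\begin{footnotesize}
\begin{verbatim}
#include <stdbool.h>
#include <stdint.h>

/* True when a segment's file offset and virtual address are congruent
 * modulo page_size.  page_size must be non-zero. */
bool segment_is_congruent(uint64_t p_offset, uint64_t p_vaddr,
                          uint64_t page_size)
{
    return (p_offset % page_size) == (p_vaddr % page_size);
}
\end{verbatim}
\end{footnotesize}
A segment for which this predicate is false cannot be mapped directly
from the file at page granularity.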
To save space, the file page holding the last page of the text segment
may also contain the first page of the data segment. The last data
page may contain file information not relevant to the running process.
Logically, the system enforces the memory permissions as if each
segment were complete and separate; segments' addresses are adjusted
to ensure each logical page in the address space has a single set of
permissions. In the case described above, the region of the file holding the
end of text and the beginning of data will be mapped twice: at one
virtual address for text and at a different virtual address for data.
The end of the data segment requires special handling for
uninitialized data, which the system defines to begin with zero
values. Thus if a file's last data page includes information not in
the logical memory page, the extraneous data must be set to zero, not
the unknown contents of the executable file. ``Impurities'' in the
other three pages are not logically part of the process image; whether
the system expunges them is unspecified.
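A minimal sketch of the zero-fill rule for the last data page, assuming
the page is already mapped writable. The name \code{zero_page_tail} and
its parameters are hypothetical; \code{valid} is the number of bytes in
the page that belong to the segment's file image.
\begin{footnotesize}
\begin{verbatim}
#include <stddef.h>
#include <string.h>

/* Clear the bytes of the last data page that follow the initialized
 * data, so that uninitialized data starts out as zero rather than as
 * unrelated file contents. */
void zero_page_tail(unsigned char *page, size_t valid, size_t page_size)
{
    if (valid < page_size)
        memset(page + valid, 0, page_size - valid);
}
\end{verbatim}
\end{footnotesize}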
One aspect of segment loading differs between executable files and
shared objects. Executable file segments typically contain absolute
code (see section~\ref{sec_coding_examples} ``Coding Examples'').
For the process to
execute correctly, the segments must reside at the virtual addresses
used to build the executable file. Thus the system uses the {\tt p_vaddr}
values unchanged as virtual addresses.
On the other hand, shared object segments typically contain
position-independent code. This lets a segment's virtual address
change from one process to another, without invalidating execution
behavior. Though the system chooses virtual addresses for individual
processes, it maintains the segments' relative positions. Because
position-independent code uses relative addressing between segments,
the difference between virtual addresses in memory must match the
difference between virtual addresses in the file.
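The constant relative positioning can be sketched in C as follows. This
is an illustrative model, not a loader implementation: \code{base} is a
hypothetical per-process load address chosen by the system, and the
function names are assumptions.
\begin{footnotesize}
\begin{verbatim}
#include <stdint.h>

/* Run-time address of a location whose virtual address in the file is
 * vaddr, in a shared object loaded at base. */
uint64_t runtime_address(uint64_t base, uint64_t vaddr)
{
    return base + vaddr;
}

/* Distance between two locations of the same object at run time.
 * The result does not depend on base. */
int64_t segment_distance(uint64_t base, uint64_t vaddr_a,
                         uint64_t vaddr_b)
{
    return (int64_t)(runtime_address(base, vaddr_b) -
                     runtime_address(base, vaddr_a));
}
\end{verbatim}
\end{footnotesize}
Because the same \code{base} is added to every segment of one object,
the distance seen by position-independent code equals the distance
recorded in the file, whatever base a given process uses.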
%%SUN The following table
%%shows possible shared object virtual address assignments for several
%%processes, illustrating constant relative positioning. The table also
%%illustrates the base address computations.
\subsection{Program header}
The following \xARCH program header types are defined:
\begin{table}[H]
\Hrule
\caption{Program Header Types}
\begin{center}
\begin{tabular}[t]{l|l}
\multicolumn{1}{c}{Name} & \multicolumn{1}{c}{Value} \\
\hline
\texttt{PT_GNU_EH_FRAME} & \texttt{0x6474e550} \\
\texttt{PT_SUNW_EH_FRAME} & \texttt{0x6474e550} \\
\texttt{PT_SUNW_UNWIND} & \texttt{0x6464e550}
\end{tabular}
\end{center}
\Hrule
\end{table}
\begin{description}
\item[PT_GNU_EH_FRAME, PT_SUNW_EH_FRAME and PT_SUNW_UNWIND]
The segment contains the stack unwind tables.
See Section~\ref{sec_eh_frame} of this document.
\footnote{
  The values for these program headers have been placed in the
  {\tt PT_LOOS} to {\tt PT_HIOS} (OS-specific) range in order to adapt
  to the existing GNU implementation. New operating systems wanting to
  agree on these program headers should also add them to
  their OS-specific range.
}
\end{description}
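As a hedged illustration, a consumer could locate the unwind table
segment by scanning the program headers for the value listed above. The
structure below is a local mirror of the ELF64 program header layout,
and \code{find_eh_frame_segment} is a hypothetical helper, not an
interface defined by this ABI.
\begin{footnotesize}
\begin{verbatim}
#include <stddef.h>
#include <stdint.h>

#define PT_GNU_EH_FRAME_VALUE 0x6474e550u  /* value from the table */

/* Local mirror of the ELF64 program header layout. */
struct phdr64 {
    uint32_t p_type, p_flags;
    uint64_t p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_align;
};

/* Index of the first unwind table segment, or -1 if there is none. */
int find_eh_frame_segment(const struct phdr64 *ph, size_t count)
{
    for (size_t i = 0; i < count; i++)
        if (ph[i].p_type == PT_GNU_EH_FRAME_VALUE)
            return (int)i;
    return -1;
}
\end{verbatim}
\end{footnotesize}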
\section{Dynamic Linking}
\subsubsection{Dynamic Section}
Dynamic section entries give information to the dynamic linker. Some
of this information is processor-specific, including the interpretation
of some entries in the dynamic structure.
\subsubsection{Global Offset Table (GOT)}
\label{got}
Position-independent code cannot, in general, contain absolute virtual
addresses. Global offset tables hold absolute addresses in private
data, thus making the addresses available without compromising the
position-independence and shareability of a program's text. A program
references its global offset table using position-independent
addressing and extracts absolute values, thus redirecting
position-independent references to absolute locations.
If a program requires direct access to the absolute address of a
symbol, that symbol will have a global offset table entry. Because
the executable file and shared objects have separate global offset
tables, a symbol's address may appear in several tables. The dynamic
linker processes all the global offset table relocations before giving
control to any code in the process image, thus ensuring the absolute
addresses are available during execution.
The table's first entry (number zero) is reserved to hold the address
of the dynamic structure, referenced with the symbol
\code{\_DYNAMIC}. This allows a
program, such as the dynamic linker, to find its own dynamic structure
without having yet processed its relocation entries. This is
especially important for the dynamic linker, because it must
initialize itself without relying on other programs to relocate its
memory image. On the \xARCH architecture, entries one and two in the
global offset table are also reserved.
The \textindex{global offset table} contains 64-bit addresses.
For the large models the GOT is allowed to be up to 16EB in size.
\begin{figure}[H]
\Hrule
\caption{Global Offset Table}
\begin{center}
\fbox{\code{extern Elf64_Addr _GLOBAL_OFFSET_TABLE_ [];}}
\end{center}
\Hrule
\end{figure}
The symbol \code{\_GLOBAL\_OFFSET\_TABLE\_} may reside in the
middle of the {\tt .got} section, allowing both negative and
non-negative offsets into the array of addresses.
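The effect of placing the symbol in the middle of the section can be
modelled with an ordinary C array. Everything in this sketch is a toy
stand-in: \code{got_section} is not a real {\tt .got} section, and
\code{got_anchor} only plays the role of
\code{\_GLOBAL\_OFFSET\_TABLE\_}.
\begin{footnotesize}
\begin{verbatim}
#include <stdint.h>

/* Toy stand-in for a .got section with six 64-bit slots. */
uint64_t got_section[6] = { 10, 11, 12, 13, 14, 15 };

/* Hypothetical anchor placed at slot 2 of the section. */
uint64_t *const got_anchor = &got_section[2];

/* Read the slot at a signed index relative to the anchor. */
uint64_t got_read(int index)
{
    return got_anchor[index];
}
\end{verbatim}
\end{footnotesize}
Here indices $-2$ to $3$ are valid, so slots on both sides of the anchor
can be reached with signed offsets.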
\subsubsection{Function Addresses}
\label{function_addresses}
References to the address of a function from an executable
file and the shared objects associated with it might not resolve to
the same value. References from within shared objects will normally be
resolved by the dynamic linker to the virtual address of the function
itself. References from within the executable file to a function
defined in a shared object will normally be resolved by the link
editor to the address of the procedure linkage table entry for that
function within the executable file.
To allow comparisons of function addresses to work as expected, if an
executable file references a function defined in a shared object, the
link editor will place the address of the procedure linkage table
entry for that function in its associated symbol table entry. This
will result in symbol table entries with section index of {\tt SHN_UNDEF}
but a type of {\tt STT_FUNC} and a non-zero {\tt st_value}. A reference to the
address of a function from within a shared library will be satisfied
by such a definition in the executable.
Some relocations are associated with procedure linkage table
entries. These entries are used for direct function calls rather than
for references to function addresses. These relocations do not use the special
symbol value described above. Otherwise a very tight endless loop
would be created.
\subsubsection{Procedure Linkage Table}
\label{plt}
%This is ia32 enhanced to 64 bits without an explicit GOT
% register.
% This has been copied from the i386 ABI.
\index{procedure linkage table|(}
Much as the global offset table redirects position-independent address
calculations to absolute locations, the procedure linkage table
redirects position-independent function calls to absolute locations.
The link editor cannot resolve execution transfers (such as function
calls) from one executable or shared object to another. Consequently,
the link editor arranges to have the program transfer control to
entries in the procedure linkage table. On the \xARCH architecture,
procedure linkage tables reside in shared text, but they use addresses
in the private global offset table. The dynamic linker determines the
destinations' absolute addresses and modifies the global offset
table's memory image accordingly. The dynamic linker thus can
redirect the entries without compromising the position-independence
and shareability of the program's text. Executable files and shared
object files have separate procedure linkage tables. Unlike
\intelabi, this ABI uses the same procedure linkage table layout for both
programs and shared objects (see figure~\ref{small_med_plt}).
\begin{figure}[H]
\Hrule
\caption{Procedure Linkage Table (small and medium models)}
\label{small_med_plt}
\begin{footnotesize}
\begin{verbatim}
.PLT0: pushq GOT+8(%rip) # GOT[1]
jmp *GOT+16(%rip) # GOT[2]
nopl 0x0(%rax)
.PLT1: jmp *name1@GOTPCREL(%rip) # 16 bytes from .PLT0
pushq $index1
jmp .PLT0
.PLT2: jmp *name2@GOTPCREL(%rip) # 16 bytes from .PLT1
pushq $index2
jmp .PLT0
.PLT3: ...
\end{verbatim}%$
\end{footnotesize}
\Hrule
\end{figure}
Following the steps below, the dynamic linker and the program
``cooperate'' to resolve symbolic references through the procedure
linkage table and the global offset table.
\begin{enumerate}
\item When first creating the memory image of the program, the dynamic
linker sets the second and the third entries in the global offset
  table to special values. The steps below explain more about these
values.
\item Each shared object file in the process image has its own
procedure linkage table, and control transfers to a procedure
linkage table entry only from within the same object file.
\item For illustration, assume the program calls \code{name1}, which
transfers control to the label \code{.PLT1}.
\item The first instruction jumps to the address in the global offset
table entry for \code{name1}. Initially the global offset table
holds the address of the following \code{pushq} instruction, not the
real address of \code{name1}.
\item Now the program pushes a relocation index (\textit{index}) on
the stack. The relocation index is a 32-bit, non-negative index into
the relocation table addressed by the \codeindex{DT_JMPREL} dynamic
section entry. The designated relocation entry will have type
\codeindexwo{R_X86_64_JUMP_SLOT}, and its offset will specify the
global offset table entry used in the previous \code{jmp}
instruction. The relocation entry contains a symbol table index
that will reference the appropriate symbol, \code{name1} in the
example.
% Note that index is sign-extended. Since the GOT will only have
% 2^29 entries (see chapter 4), this is no problem.
\item After pushing the relocation index, the program then jumps to
\code{.PLT0}, the first entry in the procedure linkage table. The
\code{pushq} instruction places the value of the second global
offset table entry (GOT+8) on the stack, thus giving the dynamic
linker one word of identifying information. The program then jumps
to the address in the third global offset table entry (GOT+16),
which transfers control to the dynamic linker.
\item When the dynamic linker receives control, it unwinds the stack,
looks at the designated relocation entry, finds the symbol's value,
stores the ``real'' address for \code{name1} in its global offset
table entry, and transfers control to the desired destination.
\item Subsequent executions of the procedure linkage table entry will
transfer directly to \code{name1}, without calling the dynamic
linker a second time. That is, the \code{jmp} instruction at
\code{.PLT1} will transfer to \code{name1}, instead of ``falling
through'' to the \code{pushq} instruction.
\end{enumerate}
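The steps above can be modelled in portable C with a function pointer
standing in for the global offset table entry. This is a simplified
sketch under stated assumptions: all names are hypothetical, the
relocation index and the stack handling are omitted, and a real dynamic
linker looks the symbol up instead of hard-coding its address.
\begin{footnotesize}
\begin{verbatim}
typedef int (*func_t)(int);

int resolver_calls = 0;      /* how often the "dynamic linker" ran */

/* Stands in for the real definition of name1 in a shared object. */
int name1_real(int x) { return x + 1; }

int name1_lazy_stub(int x);  /* plays the pushq / jmp .PLT0 path */

/* One GOT slot; initially it leads back into the PLT entry. */
func_t got_name1 = name1_lazy_stub;

/* Plays the dynamic linker: store the real address in the GOT slot,
 * then transfer control to the destination. */
int name1_lazy_stub(int x)
{
    resolver_calls++;
    got_name1 = name1_real;
    return got_name1(x);
}

/* Plays the first instruction of .PLT1: an indirect jump via the GOT. */
int call_name1(int x) { return got_name1(x); }
\end{verbatim}
\end{footnotesize}
In this model the first call to \code{call_name1} goes through the stub
and patches the slot; later calls reach \code{name1_real} directly.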
The \code{LD_BIND_NOW} environment variable can change the dynamic
linking behavior. If its value is non-null, the dynamic linker
evaluates procedure linkage table entries before transferring control
to the program. That is, the dynamic linker processes relocation
entries of type \codeindex{R_X86_64_JUMP_SLOT}
during process initialization. Otherwise, the dynamic linker
evaluates procedure linkage table entries lazily, delaying symbol
resolution and relocation until the first execution of a table entry.
\index{procedure linkage table|)}
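A minimal sketch of the decision between immediate and lazy binding.
The helper \code{bind_now_requested} is hypothetical, and treating an
empty string like an unset variable is an assumption about how
``non-null'' is read; \code{value} is the result of looking up
\code{LD_BIND_NOW} in the environment.
\begin{footnotesize}
\begin{verbatim}
#include <stdbool.h>
#include <stddef.h>

/* True when procedure linkage table entries should be evaluated
 * during process initialization instead of lazily. */
bool bind_now_requested(const char *value)
{
    return value != NULL && value[0] != '\0';
}
\end{verbatim}
\end{footnotesize}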
Relocation entries of type \codeindex{R_X86_64_TLSDESC} may also be
subject to lazy relocation, using a single entry in the procedure
linkage table and in the global offset table, at locations given by
\texttt{DT_TLSDESC_PLT} and \texttt{DT_TLSDESC_GOT}, respectively, as
described in ``Thread-Local Storage Descriptors for IA32 and
AMD64/EM64T''\footnote{This document is currently available via
\url{http://www.fsfla.org/~lxoliva/writeups/TLS/RFC-TLSDESC-x86.txt}}.
For self-containment, \texttt{DT_TLSDESC_GOT} specifies a GOT entry in
which the dynamic loader should store the address of its internal TLS
Descriptor resolver function, whereas \texttt{DT_TLSDESC_PLT}
specifies the address of a PLT entry to be used as the TLS descriptor
resolver function for lazy resolution from within this module. The
PLT entry must push the linkmap of the module onto the stack and
tail-call the internal TLS Descriptor resolver function.
\subsubsection{Large Models}
In the small and medium code models the size of both the PLT and the GOT
is limited by the maximum 32-bit displacement size.
Consequently, the base of the PLT and the
top of the GOT can be at most 2GB apart.
Therefore, in order to support the available addressing space of 16EB,
it is necessary to extend both the PLT and the GOT. Moreover, the PLT
needs to support the GOT being over 2GB away and the GOT can be over
2GB in size.\footnote{If it is determined that the base of the PLT is
within 2GB of the top of the GOT, it is also allowed to use the same
PLT layout for a large code model object as that of the small and
medium code models.}
The PLT is extended as shown in figure~\ref{final_large_plt}
with the assumption that the GOT address is
in \code{\%r15}\footnote{See Function Prologue.}.
\begin{figure}[H]
\Hrule
\caption{Final Large Code Model PLT} \label{final_large_plt}
\begin{footnotesize}
\begin{verbatim}
.PLT0: pushq 8(%r15) # GOT[1]
jmpq *16(%r15) # GOT[2]
nopl 0x0(%rax,%rax,1)
.PLT1: movabs $name1@GOT,%r11 # 16 bytes from .PLT0
jmp *(%r11,%r15)
.PLT1a: pushq $index1 # "call" dynamic linker
jmp .PLT0
.PLT2: ... # 21 bytes from .PLT1
.PLTx: movabs $namex@GOT,%r11 # 102261125th entry
jmp *(%r11,%r15)
.PLTxa: pushq $indexx
pushq 8(%r15) # repeat .PLT0 code
jmpq *16(%r15)
.PLTy: ... # 27 bytes from .PLTx
\end{verbatim}%$
\end{footnotesize}
\end{figure}
This way, for the first 102261125 entries, each PLT entry besides
\code{.PLT0} uses only 21 bytes. After that, each PLT entry repeats the
code of \code{.PLT0} instead of jumping to it, so every such entry is 27
bytes long. Note that all alignment considerations are dropped in order
to keep the PLT size down.
Each extended PLT entry is thus 5 to 11 bytes larger than the small and
medium code model PLT entries.
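The entry count can be reproduced with a short calculation. The
derivation is an assumption, since the text does not state it: the
sketch supposes that the limit comes from the signed 32-bit displacement
of the \code{jmp .PLT0} instruction that ends each short entry, so that
the end of the last short entry must lie within $2^{31}$ bytes of
\code{.PLT0}. The function name is hypothetical.
\begin{footnotesize}
\begin{verbatim}
#include <stdint.h>

/* Largest n such that plt0_size + n * entry_size <= 2^31, that is,
 * the end of the n-th short entry is still within reach of .PLT0. */
uint64_t max_short_plt_entries(uint64_t plt0_size, uint64_t entry_size)
{
    const uint64_t reach = UINT64_C(1) << 31;
    return (reach - plt0_size) / entry_size;
}
\end{verbatim}
\end{footnotesize}
With the 16-byte \code{.PLT0} and the 21-byte entries given above,
\code{max_short_plt_entries(16, 21)} evaluates to 102261125.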
The functionality of entry .PLT0 remains unchanged from the small and
medium code models.
Note that the symbol index is still limited to 32 bits, which would
allow for up to 4G global and external functions.
Typically, UNIX compilers support two types of PLT, generally through
the options \code{-fpic} and \code{-fPIC}. When building position-independent
objects using the large code model, only \code{-fPIC} is allowed. Using the
option \code{-fpic} with the large code model remains reserved for future
use.
\subsection{Program Interpreter}
The valid \textindex{program interpreter} for programs conforming to the
\xARCH ABI is listed in figure~\ref{interp}, which also contains the
\textindex{program interpreter} used by Linux.
\begin{figure}
\caption{\xARCH Program Interpreter}
\label{interp}
\begin{center}
\begin{tabular}[t]{l|l|l}
\multicolumn{1}{c}{Data Model} & \multicolumn{1}{c}{Path} &
\multicolumn{1}{c}{Linux Path} \\
\hline
LP64 & \path{/lib/ld64.so.1} & \path{/lib64/ld-linux-x86-64.so.2} \\
\hline
ILP32 & \path{/lib/ldx32.so.1} & \path{/libx32/ld-linux-x32.so.2} \\
\end{tabular}
\end{center}
\end{figure}
\subsection{Initialization and Termination Functions}
% copied from ia64
The implementation is responsible for executing the initialization
functions specified by \codeindexwo{DT_INIT}, \codeindexwo{DT_INIT_ARRAY},
and \codeindex{DT_PREINIT_ARRAY} entries in the executable file and
shared object files for a process, and the termination (or
finalization) functions specified by \codeindex{DT_FINI} and
\codeindexwo{DT_FINI_ARRAY}, as specified by the \textit{System V ABI}.
The user program plays no further part in executing the initialization
and termination functions specified by these dynamic tags.
\section {Program Property}
The following processor-specific program property types
\footnote{See {\bf Linux Extensions to gABI} at
\url{https://github.com/hjl-tools/linux-abi}} are defined:
\begin{table}[H]
\Hrule
\caption{Program Property Type}
\begin{center}
\begin{tabular}[t]{l|l}
\multicolumn{1}{c}{Name} & \multicolumn{1}{c}{Value} \\
\hline
\texttt{GNU_PROPERTY_X86_ISA_1_USED} & \texttt{0xc0000000} \\
\texttt{GNU_PROPERTY_X86_ISA_1_NEEDED} & \texttt{0xc0000001} \\
\texttt{GNU_PROPERTY_X86_FEATURE_1_AND} & \texttt{0xc0000002} \\
\end{tabular}
\end{center}
\Hrule
\end{table}
\begin{description}
\item[GNU_PROPERTY_X86_ISA_1_USED] Its \code{pr_data} field contains
a 4-byte integer. The x86 instruction sets indicated by the
  corresponding bits are used in the program. Their support in the
  hardware is optional. A bit in the output \code{pr_data} field is
  set if it is set in any relocatable input \code{pr_data} field.
\item[GNU_PROPERTY_X86_ISA_1_NEEDED] Its \code{pr_data} field contains
a 4-byte integer. The x86 instruction sets indicated by the
  corresponding bits are used in the program and they must be supported
  by the hardware. A bit in the output \code{pr_data} field is
  set if it is set in any relocatable input \code{pr_data} field.
\item[GNU_PROPERTY_X86_FEATURE_1_AND] Its \code{pr_data} field contains
a 4-byte integer. A bit in the output \code{pr_data} field is set
only if it is set in all relocatable input \code{pr_data} fields.
\end{description}
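The two combination rules above can be sketched as follows. The function
names are hypothetical, each array element stands for the 4-byte
\code{pr_data} value of one relocatable input, and inputs that lack the
property altogether are not modelled.
\begin{footnotesize}
\begin{verbatim}
#include <stddef.h>
#include <stdint.h>

/* Rule for ..._ISA_1_USED and ..._ISA_1_NEEDED: a bit is set in the
 * output if it is set in any input. */
uint32_t merge_any(const uint32_t *inputs, size_t count)
{
    uint32_t out = 0;
    for (size_t i = 0; i < count; i++)
        out |= inputs[i];
    return out;
}

/* Rule for ..._FEATURE_1_AND: a bit is set in the output only if it
 * is set in all inputs.  Returning 0 for an empty list is a choice
 * made for this sketch. */
uint32_t merge_all(const uint32_t *inputs, size_t count)
{
    if (count == 0)
        return 0;
    uint32_t out = inputs[0];
    for (size_t i = 1; i < count; i++)
        out &= inputs[i];
    return out;
}
\end{verbatim}
\end{footnotesize}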
\begin{table}[H]
\Hrule
\caption{Bit Flags For X86 Instruction Sets}
\begin{center}
\begin{tabular}[t]{l|l}
\multicolumn{1}{c}{Name} & \multicolumn{1}{c}{Value} \\
\hline
\texttt{GNU_PROPERTY_X86_ISA_1_486} & \texttt{1U << 0} \\
\texttt{GNU_PROPERTY_X86_ISA_1_586} & \texttt{1U << 1} \\
\texttt{GNU_PROPERTY_X86_ISA_1_686} & \texttt{1U << 2} \\
\texttt{GNU_PROPERTY_X86_ISA_1_SSE} & \texttt{1U << 3} \\
\texttt{GNU_PROPERTY_X86_ISA_1_SSE2} & \texttt{1U << 4} \\
\texttt{GNU_PROPERTY_X86_ISA_1_SSE3} & \texttt{1U << 5} \\
\texttt{GNU_PROPERTY_X86_ISA_1_SSSE3} & \texttt{1U << 6} \\
\texttt{GNU_PROPERTY_X86_ISA_1_SSE4_1} & \texttt{1U << 7} \\
\texttt{GNU_PROPERTY_X86_ISA_1_SSE4_2} & \texttt{1U << 8} \\
\texttt{GNU_PROPERTY_X86_ISA_1_AVX} & \texttt{1U << 9} \\
\texttt{GNU_PROPERTY_X86_ISA_1_AVX2} & \texttt{1U << 10} \\
\texttt{GNU_PROPERTY_X86_ISA_1_AVX512F} & \texttt{1U << 11} \\
\texttt{GNU_PROPERTY_X86_ISA_1_AVX512CD} & \texttt{1U << 12} \\
\texttt{GNU_PROPERTY_X86_ISA_1_AVX512ER} & \texttt{1U << 13} \\
\texttt{GNU_PROPERTY_X86_ISA_1_AVX512PF} & \texttt{1U << 14} \\
\texttt{GNU_PROPERTY_X86_ISA_1_AVX512VL} & \texttt{1U << 15} \\
\texttt{GNU_PROPERTY_X86_ISA_1_AVX512DQ} & \texttt{1U << 16} \\
\texttt{GNU_PROPERTY_X86_ISA_1_AVX512BW} & \texttt{1U << 17} \\
\end{tabular}
\end{center}
\Hrule
\end{table}
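As a hedged example of how these bit flags might be consumed, the check
below tests whether every instruction set marked as needed is present in
a mask of supported instruction sets. The shortened macro names and the
function are hypothetical; how the \code{supported} mask is obtained is
outside the scope of this sketch.
\begin{footnotesize}
\begin{verbatim}
#include <stdbool.h>
#include <stdint.h>

#define ISA_1_SSE2 (1U << 4)   /* values from the table above */
#define ISA_1_AVX  (1U << 9)
#define ISA_1_AVX2 (1U << 10)

/* True when no needed bit is missing from the supported mask. */
bool isa_needed_satisfied(uint32_t needed, uint32_t supported)
{
    return (needed & ~supported) == 0;
}
\end{verbatim}
\end{footnotesize}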
The following bits are defined for \code{GNU_PROPERTY_X86_FEATURE_1_AND}:
\begin{table}[H]
\Hrule
\caption{GNU_PROPERTY_X86_FEATURE_1_AND Bit Flags}
\begin{center}
\begin{tabular}[t]{l|l}
\multicolumn{1}{c}{Name} & \multicolumn{1}{c}{Value} \\
\hline
\texttt{GNU_PROPERTY_X86_FEATURE_1_IBT} & \texttt{1U << 0} \\
\texttt{GNU_PROPERTY_X86_FEATURE_1_SHSTK} & \texttt{1U << 1} \\
\end{tabular}
\end{center}
\Hrule
\end{table}
\begin{description}
\item[GNU_PROPERTY_X86_FEATURE_1_IBT] This indicates that all executable
  sections are compatible with IBT (see Section~\ref{ibt}): an
  \code{endbr64} instruction starts each valid target where an indirect
  branch instruction can land.
\item[GNU_PROPERTY_X86_FEATURE_1_SHSTK] This indicates that all
  executable sections are compatible with SHSTK (see Section~\ref{shstk}):
  the return address popped from the shadow stack always matches the
  return address popped from the normal stack.
\end{description}
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "abi"
%%% End: