The cornerstone of compiling and linking on Linux, the ELF file: peeling back its layers and exploring it byte by byte



First meeting

Hello everyone, I am the ELF file. My full name is Executable and Linkable Format.

Friends who regularly develop on Linux surely know me, and those who need to understand compiling and linking have probably studied me thoroughly.

To get to know more friends, today is my open day. I will peel myself open layer by layer, like an onion, so that more of you can get to know me. Everyone is welcome to come and watch.

In the past, I have seen friends who study me glance at the summary information in my header, take a quick look at the layout of the sections, and then consider themselves familiar with me.

Strictly speaking, that is far from enough; it does not get to the bottom of things.

When you then face the detailed process of compiling and linking, you will still be confused.

Today I will dissect myself at byte granularity, openly and honestly, holding nothing back and telling you everything I know, so that all of you watching can feast your eyes.

Once you understand this material, everything you learn later will be much clearer: the low-level process of compiling and linking, and how an executable program is loaded from disk into memory, all the way up to the execution of the main function.

In other words, mastering the structure and content of ELF files is the basis for understanding compilation, linking, and program execution.

Don't you humans have a saying: sharpening the axe does not delay the chopping of wood!

Okay, let's get started now!


The file is simple; it is people who are complicated

As a file, I must follow a certain format; I am no exception.

From a macro point of view, I can be divided into four parts: the ELF header, the program header table, the sections, and the section header table.

If you don't understand these concepts yet, that's okay. I will explain them one by one below.

On a Linux system, the ELF format is mainly used for 3 types of files: executable files, object files, and shared library files.

Since I can represent three types of files, there must be a place in the file that distinguishes between them.

As you may have guessed, a field in my header indicates whether the current ELF file is an executable file, an object file, or a shared library file.
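As a rough illustration of that header field, here is a minimal C sketch. It assumes a 32-bit little-endian file, where the type field e_type occupies bytes 16 and 17 of the header. The function name elf_file_kind is made up for this example; the values 1, 2 and 3 are those of ET_REL, ET_EXEC and ET_DYN in <elf.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Describe the file kind stored in e_type.
 * Assumes `ehdr` points at the first 52 bytes of a 32-bit,
 * little-endian ELF file (e_type is at bytes 16 and 17). */
const char *elf_file_kind(const unsigned char *ehdr)
{
    uint16_t e_type;

    assert(ehdr != NULL);
    e_type = (uint16_t)(ehdr[16] | (ehdr[17] << 8));

    switch (e_type) {
    case 1:  return "object file";          /* ET_REL  */
    case 2:  return "executable file";      /* ET_EXEC */
    case 3:  return "shared library file";  /* ET_DYN  */
    default: return "other";
    }
}
```

One caveat: some toolchains also mark position-independent executables as ET_DYN, so a value of 3 does not always mean a library.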

In addition, since I can represent 3 types of files, I am used on 3 different occasions, and a different fellow operates on me each time:

  1. Executable file: read from disk by the operating system's loader and loaded into memory for execution;

  2. Object file: read by the linker to generate an executable file or a shared library file;

  3. Shared library file: read by ld-linux.so during dynamic linking.

Take the linker and the loader for example. These two fellows have different personalities, and they see me differently.

When the linker looks at me, it sees only 3 parts: the ELF header, the sections, and the section header table.

When the loader looks at me, it sees 3 other parts: the ELF header, the program header table, and the segments.

By the way, from the loader's point of view, the middle part, Sections, goes by another name: Segments. The label changes but the substance stays the same.

You can think of it this way: a segment may contain one or more sections.

This is like the goods on a supermarket's shelves: mineral water, cola, beer, chocolate, beef jerky, potato chips.

From the stock clerk's point of view, they are 6 different products; from the supermarket manager's point of view, they belong to only 2 categories: beverages and snacks.

How about it? You now have an overall impression of me, right?

In fact, you only need to master the following 2 points:

  1. An ELF file consists of 4 parts in total;

  2. When the linker and the loader use me, each uses only the parts it is interested in.

I almost forgot to remind you: on Linux, a separate data structure describes each of the parts mentioned above.

I know some friends are impatient, so let me introduce these structures right away.

This is a first meeting, so just get acquainted; don't dig into the details yet.

The structure that describes the ELF header is Elf32_Ehdr.

The structure that describes an entry of the program header table is Elf32_Phdr.

The structure that describes an entry of the section header table is Elf32_Shdr.
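For orientation, the three structures can be sketched roughly as follows for the 32-bit case. The member names follow the ELF specification and <elf.h>. The fixed-width types from <stdint.h> are stand-ins for the Elf32_Half, Elf32_Word, Elf32_Addr and Elf32_Off typedefs that the real header uses, and the compile-time size checks are additions of this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define EI_NIDENT 16

/* ELF header: one per file, located at offset 0. */
typedef struct {
    unsigned char e_ident[EI_NIDENT]; /* magic number and identification */
    uint16_t e_type;      /* object file, executable, shared library */
    uint16_t e_machine;   /* target architecture */
    uint32_t e_version;   /* file format version */
    uint32_t e_entry;     /* virtual address of the entry point */
    uint32_t e_phoff;     /* file offset of the program header table */
    uint32_t e_shoff;     /* file offset of the section header table */
    uint32_t e_flags;     /* processor-specific flags */
    uint16_t e_ehsize;    /* size of this header in bytes */
    uint16_t e_phentsize; /* size of one program header table entry */
    uint16_t e_phnum;     /* number of program header table entries */
    uint16_t e_shentsize; /* size of one section header table entry */
    uint16_t e_shnum;     /* number of section header table entries */
    uint16_t e_shstrndx;  /* index of the section-name string table */
} Elf32_Ehdr;

/* One entry of the program header table: describes a segment. */
typedef struct {
    uint32_t p_type;   /* segment type */
    uint32_t p_offset; /* file offset of the segment */
    uint32_t p_vaddr;  /* virtual address in memory */
    uint32_t p_paddr;  /* physical address */
    uint32_t p_filesz; /* bytes occupied in the file */
    uint32_t p_memsz;  /* bytes occupied in memory */
    uint32_t p_flags;  /* permission flags */
    uint32_t p_align;  /* alignment */
} Elf32_Phdr;

/* One entry of the section header table: describes a section. */
typedef struct {
    uint32_t sh_name;      /* offset of the name in the string table */
    uint32_t sh_type;      /* section type */
    uint32_t sh_flags;     /* section attributes */
    uint32_t sh_addr;      /* virtual address when loaded */
    uint32_t sh_offset;    /* file offset of the section */
    uint32_t sh_size;      /* section size in bytes */
    uint32_t sh_link;      /* link to another section */
    uint32_t sh_info;      /* additional information */
    uint32_t sh_addralign; /* alignment */
    uint32_t sh_entsize;   /* entry size, if the section holds a table */
} Elf32_Shdr;

/* The sizes that show up later as 52, 32 and 40 bytes. */
static_assert(sizeof(Elf32_Ehdr) == 52, "ELF header is 52 bytes");
static_assert(sizeof(Elf32_Phdr) == 32, "program header entry is 32 bytes");
static_assert(sizeof(Elf32_Shdr) == 40, "section header entry is 40 bytes");
```

In real code on Linux it is simpler to include <elf.h> than to redefine these types.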


ELF header

The header acts like a general manager: it holds the overall information about the whole ELF file, such as:

  1. This is an ELF file;

  2. Some basic information: version, file type, machine type;

  3. Where in the file the program header table starts;

  4. Where in the file the section header table starts.

Are you a bit puzzled? The header does not seem to say where the Sections (from the linker's point of view) or the Segments (from the loader's point of view) are.

For convenience, I will refer to both Sections and Segments simply as Sections!

Here is how it works. An ELF file contains many sections, and the details of those sections are described in the program header table or the section header table.

Take the section header table as an example:

If an ELF file contains 4 sections in total, .text, .rodata, .data, and .bss, then the section header table contains 4 entries that describe those 4 sections (strictly speaking there are more than 4 entries, because there are also some auxiliary sections).

As I said at the beginning, I want to show myself to you at byte granularity!

So that this is not just empty talk, I will use a concrete code example; only then can you see the real bytes.

The program is fairly simple:

    // mymath.c
    int my_add(int a, int b)
    {
        return a + b;
    }

    // main.c
    #include <stdio.h>

    extern int my_add(int a, int b);

    int main()
    {
        int i = 1;
        int j = 2;
        int k = my_add(i, j);
        printf("k = %d\n", k);
        return 0;
    }

From the description so far, we know that the dynamic library libmymath.so, the object file main.o, and the executable main are all ELF files, just of different types.

Let's dissect the executable file main!

First we use the command readelf -h main to look at the ELF header information of the main file.

This tool, readelf, is a good thing! Be sure to make good use of it.

What this picture shows is everything the ELF header describes. It corresponds one-to-one with the member variables of the structure Elf32_Ehdr!

Did you notice line 15 of the figure: Size of this header: 52 (bytes).

In other words, the ELF header occupies 52 bytes in total. So let me show you the raw bytes of these first 52 bytes.

This time I use the command od -Ax -t x1 -N 52 main to read the bytes of main. A brief explanation of the options:

-Ax: display addresses in hexadecimal. With -Ad, addresses are displayed in decimal;

-t x1: display the content in hexadecimal (x), one byte at a time (1);

-N 52: read only 52 bytes.

The contents of these 52 bytes can be explained by matching them against the fields of the structure above.

Look at the first 16 bytes first.

The first member of the structure is unsigned char e_ident[EI_NIDENT];. EI_NIDENT is 16, so this member represents the first 16 bytes of the ELF header. Their meaning is as follows:

0-15 bytes
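The identification bytes can be decoded with a few comparisons. The sketch below is an illustration under the usual ELF conventions (magic bytes 0x7f 'E' 'L' 'F', class in byte 4, data encoding in byte 5, version in byte 6). The struct ident_info and the function parse_ident are names invented here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Decoded view of the 16 identification bytes. */
struct ident_info {
    bool is_elf;        /* bytes 0-3 are 0x7f 'E' 'L' 'F' */
    int  bits;          /* 32 or 64 from byte 4, 0 if unrecognized */
    bool little_endian; /* true only when byte 5 is 1 (2 is big-endian) */
    int  version;       /* byte 6 */
};

struct ident_info parse_ident(const unsigned char ident[16])
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };
    struct ident_info info;

    assert(ident != NULL);
    info.is_elf = memcmp(ident, magic, sizeof magic) == 0;
    info.bits = ident[4] == 1 ? 32 : ident[4] == 2 ? 64 : 0;
    info.little_endian = ident[5] == 1;
    info.version = ident[6];
    return info;
}
```

For a 32-bit little-endian file such as the main example, this would report is_elf as true, bits as 32 and little_endian as true.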

How is it? I have exposed myself this thoroughly and confessed everything to you. Is that enough to show my sincerity?!


For the sake of authority, here is the explanation of this part from the official documentation:

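As for big-endian and little-endian formats: the value shown in this main file is 1, which stands for little-endian. What does that mean? In little-endian format, the low-order byte of a multi-byte value is stored first, at the lowest address. Look at the picture below:

Then look at the big-endian format, where the high-order byte is stored first: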
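The difference between the two byte orders can be made concrete with two small helpers. This is only a sketch; the function names read_u32_le and read_u32_be are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Little-endian: the first byte in the file is the low-order byte. */
uint32_t read_u32_le(const unsigned char *p)
{
    assert(p != NULL);
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Big-endian: the first byte in the file is the high-order byte. */
uint32_t read_u32_be(const unsigned char *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```

Given the four bytes F8 17 00 00, read_u32_le yields 0x000017F8 (6136 in decimal), while read_u32_be would yield 0xF8170000. Because main is little-endian, every multi-byte field in this article is read the first way.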

Okay, let's continue with the remaining 36 bytes (52 - 16 = 36) and lay out their meaning in the same way:

16-31 bytes:

32-47 bytes:

48-51 bytes:

There is no need to explain each field in words. As the drinking toast goes, it is all in the wine~~ oh no, it is all in the pictures!
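To tie the remaining bytes back to the structure members, the following sketch pulls selected fields out of a 52-byte header buffer. It assumes a 32-bit little-endian file like main. The byte positions follow the Elf32_Ehdr layout; ehdr_fields, le16, le32 and parse_ehdr32 are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Selected Elf32_Ehdr members, decoded from raw bytes. */
struct ehdr_fields {
    uint16_t e_type;
    uint32_t e_entry;
    uint32_t e_phoff;
    uint32_t e_shoff;
    uint16_t e_ehsize;
    uint16_t e_phentsize;
    uint16_t e_phnum;
    uint16_t e_shentsize;
    uint16_t e_shnum;
    uint16_t e_shstrndx;
};

/* Little-endian helpers: the low-order byte is stored first. */
static uint16_t le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the fields that follow the 16 e_ident bytes. */
struct ehdr_fields parse_ehdr32(const unsigned char hdr[52])
{
    struct ehdr_fields f;

    assert(hdr != NULL);
    f.e_type      = le16(hdr + 16); /* bytes 16-17 */
    f.e_entry     = le32(hdr + 24); /* bytes 24-27 */
    f.e_phoff     = le32(hdr + 28); /* bytes 28-31 */
    f.e_shoff     = le32(hdr + 32); /* bytes 32-35 */
    f.e_ehsize    = le16(hdr + 40); /* bytes 40-41 */
    f.e_phentsize = le16(hdr + 42); /* bytes 42-43 */
    f.e_phnum     = le16(hdr + 44); /* bytes 44-45 */
    f.e_shentsize = le16(hdr + 46); /* bytes 46-47 */
    f.e_shnum     = le16(hdr + 48); /* bytes 48-49 */
    f.e_shstrndx  = le16(hdr + 50); /* bytes 50-51 */
    return f;
}
```

The skipped positions hold e_machine (bytes 18-19), e_version (bytes 20-23) and e_flags (bytes 36-39).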


The string table entry

An ELF file contains many strings, such as variable names, section names, and symbols added by the linker. These strings vary in length, so representing them with a fixed-size structure is clearly unrealistic.

So clever humans came up with an idea: gather these strings together and manage them as a separate section.

Elsewhere in the file, wherever a string is needed, a numeric index is written instead: the offset at which the string sits inside this shared storage area. Following that offset leads to the actual string.

For example, suppose all strings are stored in the following space:

Elsewhere in the program, to refer to the string "hello, world!", it is enough to write the number 13 in that place, meaning: this string starts at an offset of 13 bytes.
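The offset-based lookup described above can be sketched in a few lines of C. The function name strtab_get is hypothetical, and the bounds checks are an addition of this sketch rather than something the format itself requires.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the NUL-terminated string that starts at byte offset `off`
 * of a string table of `size` bytes. Returns NULL when the offset is
 * outside the table or no terminator is found before the table ends. */
const char *strtab_get(const char *tab, size_t size, size_t off)
{
    assert(tab != NULL);
    if (off >= size)
        return NULL;
    if (memchr(tab + off, '\0', size - off) == NULL)
        return NULL;
    return tab + off;
}
```

With a made-up table laid out as \0.text\0.data\0hello, world!\0 (27 bytes), strtab_get(tab, 27, 13) returns a pointer to "hello, world!", because 13 bytes precede it.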

Now let's go back to the string table in this main file.

The last 2 bytes of the ELF header are 0x1C 0x00, which correspond to the structure member e_shstrndx. This tells us that in this ELF file the string table is an ordinary section. This particular string table stores the section names used in the file (symbol names are kept in separate string tables such as .strtab).

Since it is a section, there must be an entry in the section header table that describes it. Which entry is it?

It is the entry given by 0x1C 0x00, that is, entry number 28.

Here we can also use the command readelf -S main to look at the information for all the sections in this ELF file:

Among them, section number 28 is the string table section:

As you can see, this section sits at offset 0x0016ed in the ELF file and is 0x00010a bytes long.

Next, starting from the ELF header, we will derive this information from the binary data.


Reading the contents of the string table section

Now I will demonstrate how to locate the string table section using the information in the ELF header, and then print its bytes for everyone in the audience.

To print the contents of the string table section, you must know the offset of this section within the ELF file.

That offset can only be obtained from the description in entry number 28 of the section header table.

To find the address of entry number 28, you must know where the section header table starts in the ELF file and how large each entry is.

Conveniently, the ELF header has already told us these last 2 pieces of information, so by working backwards we are sure to succeed.

Bytes 32 to 35 of the ELF header are F8 17 00 00 (note the byte order here: low-order byte first). They give the start offset of the section header table within the ELF file (e_shoff).

0x000017F8 = 6136, which means the section header table starts at byte 6136 of the ELF file.

Knowing the start address, let's work out the address of entry number 28.

Bytes 46 and 47 of the ELF header are 28 00, which means that each entry is 0x0028 = 40 bytes long.

Note that all counting here starts from 0, so entry number 28 starts at 6136 + 28 * 40 = 7256. In other words, the entry that describes the string table section is located at byte 7256 of the ELF file.
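The address arithmetic is the same for every entry of a table, so it can be captured in one small helper. This is a sketch; the name table_entry_offset is invented here, and the result is widened to 64 bits only to keep the multiplication from overflowing.

```c
#include <assert.h>
#include <stdint.h>

/* File offset of entry `index` (counted from 0) in a table that
 * starts at `table_off` and whose entries are `entry_size` bytes. */
uint64_t table_entry_offset(uint32_t table_off, uint16_t entry_size,
                            uint16_t index)
{
    assert(entry_size > 0);
    return (uint64_t)table_off + (uint64_t)entry_size * index;
}
```

With the values read from the header, table_entry_offset(6136, 40, 28) gives 7256, matching the calculation above.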

Now that we know the address of this entry, let's look at its binary content.

Run the command od -Ad -t x1 -j 7256 -N 40 main.

The option -j 7256 means skip the first 7256 bytes. That is, we start reading at byte 7256 of the ELF file main and read 40 bytes in total.
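The same skip-then-read behaviour of od -j and -N can be approximated in C with fseek and fread. This is a minimal sketch with an invented function name, read_bytes_at, and only basic error handling.

```c
#include <assert.h>
#include <stdio.h>

/* Read exactly `n` bytes starting at byte `offset` of `fp` into `buf`,
 * similar in spirit to `od -j offset -N n`.
 * Returns 0 on success and -1 if seeking or reading fails. */
int read_bytes_at(FILE *fp, long offset, unsigned char *buf, size_t n)
{
    assert(fp != NULL && buf != NULL);
    if (fseek(fp, offset, SEEK_SET) != 0)
        return -1;
    return fread(buf, 1, n, fp) == n ? 0 : -1;
}
```

For the example in this article, one would open main with fopen("main", "rb") and call read_bytes_at(fp, 7256, entry, 40) with a 40-byte buffer named entry.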

These 40 bytes correspond to the member variables of the Elf32_Shdr structure:

Here we mainly focus on the 4 fields marked in the figure above:

sh_name: I won't tell you yet; I will explain it in a moment;

sh_type: the type of this section; 3 means that this is a string table;

sh_offset: the offset of this section in the ELF file. 0x000016ed = 5869, which means the contents of the string table section start at byte 5869 of the ELF file;

sh_size: the length of this section. 0x0000010a = 266, which means the contents of the string table section are 266 bytes long in total.
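Decoding these fields from the 40 raw bytes might look like the sketch below. It assumes the 32-bit little-endian layout of Elf32_Shdr, where sh_name, sh_type, sh_addr, sh_offset and sh_size sit at bytes 0, 4, 12, 16 and 20 of the entry. The names shdr_fields, u32_at and parse_shdr32 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The section header members discussed in this article. */
struct shdr_fields {
    uint32_t sh_name;   /* offset of the name in the string table */
    uint32_t sh_type;   /* 3 for a string table, 1 for program bits */
    uint32_t sh_addr;   /* virtual address when loaded, 0 if not loaded */
    uint32_t sh_offset; /* file offset of the section contents */
    uint32_t sh_size;   /* length of the section contents in bytes */
};

/* Read a little-endian 32-bit value at byte position `pos`. */
static uint32_t u32_at(const unsigned char *entry, size_t pos)
{
    const unsigned char *p = entry + pos;
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

struct shdr_fields parse_shdr32(const unsigned char entry[40])
{
    struct shdr_fields f;

    assert(entry != NULL);
    f.sh_name   = u32_at(entry, 0);
    f.sh_type   = u32_at(entry, 4);
    f.sh_addr   = u32_at(entry, 12);
    f.sh_offset = u32_at(entry, 16);
    f.sh_size   = u32_at(entry, 20);
    return f;
}
```

Fed the 40 bytes at offset 7256, this should report sh_type 3, sh_offset 5869 and sh_size 266, the values listed above.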

Remember that the readelf tool just told us the string table section sits at offset 0x0016ed in the ELF file and is 0x00010a bytes long?

That matches our derivation exactly!

Now that we know the offset and length of the string table section in the ELF file, we can read out its bytes.

Run the command od -Ad -tc -j 5869 -N 266 main. These options should need no further explanation, right?!

Take a look: aren't these exactly the strings stored in this section?

I did not explain the sh_name field just now. It gives the name of the string table section itself; since it is a name, it must be a string.

But the string is not stored here directly. Instead an index is stored, with the value 0x00000011, which is 17 in decimal.

Now count inside the contents of the string table section: what is stored starting at byte 17?

Don't be lazy, count it. Do you see the string ".shstrtab" (\0 is the string terminator)?!

Well, if you have followed everything up to here, you have fully understood the string table, and I give you a hundred likes!!!


Reading the contents of the code section

From the picture below (command: readelf -S main):

you can see that the code section is entry number 14. Its load (virtual) address is 0x08048470, its offset in the ELF file is 0x000470, and its length is 0x0001b2 bytes.

Let's try to read its contents.

First, compute the address of this entry: 6136 + 14 * 40 = 6696.

Then read the entry with the command od -Ad -t x1 -j 6696 -N 40 main:

Again, we only care about the following 5 fields:

sh_name: this one should be clear by now. It is the offset of the code section's name within the string table section. 0x9B = 155, so the name of the code section is stored starting at byte 155 of the string table section. Go back and check whether it is the string ".text";

sh_type: the type of this section; 1 (SHT_PROGBITS) means its contents are defined by the program, here machine code;

sh_addr: the virtual address at which this section is loaded, 0x08048470, which is the same as the value of the e_entry field in the ELF header;

sh_offset: the offset of this section in the ELF file. 0x00000470 = 1136, which means the contents of this section start at byte 1136 of the ELF file;

sh_size: the length of this section. 0x000001b2 = 434, which means the code section is 434 bytes long in total.

The result of this analysis matches the output of readelf -S main exactly!

PS: When looking for the string in the string table section, don't tell me you really counted from 0 to 155! It can be computed: the string table starts at 5869 (decimal); adding 155 gives 6024, so the name of the code section, ".text", starts at byte 6024.

With the information above, we can read the bytes of the code section. The command od -Ad -t x1 -j 1136 -N 434 main will do.

The content is just a dense mass of raw bytes, so I won't post it.


Program header

At the beginning of the article I explained that I am a general-purpose file structure, and that the linker and the loader see me differently.

To get a more intuitive feel for the program header, I will again first use the readelf tool to look at all the segment information in the main file.

Run the command readelf -l main to get the following picture:

The information displayed is already very clear:

  1. This is an executable program;

  2. The entry address is 0x8048470;

  3. There are 9 program headers in total, starting at offset 52 of the ELF file;

The layout is shown in the figure below:

I also told you at the beginning that sections and segments are essentially the same thing: a segment is made up of one or more sections.

As you can see from the figure above, the segment described by program header number 2 is made up of that many sections. Is it clearer now?!

The figure also shows that there are 2 segments of type LOAD:

Let's read the first LOAD segment, down to its binary bytes of course.

The first step is to compute the address of this segment's table entry.

From the ELF header we learn the following:

  1. Field e_phoff: the program header table is located at offset 52 of the ELF file;

  2. Field e_phentsize: each entry is 32 bytes long;

  3. Field e_phnum: there are 9 entries in total.

By calculation (52 + 2 * 32 = 116, since this is entry number 2), the entry for the readable and executable LOAD segment is at offset 116.

Run the read command od -Ad -t x1 -j 116 -N 32 main:

Following the earlier convention, I map the fields worth noting to the member variables of the data structure:

p_type: the type of the segment; 1 means this segment needs to be loaded into memory;

p_offset: the offset of the segment in the ELF file. The value here is 0, which means this segment starts at the very beginning of the ELF file;

p_vaddr: the virtual address at which the segment is loaded into memory, 0x08048000;

p_paddr: the physical address at which the segment is loaded, the same as the virtual address here;

p_filesz: the number of bytes this segment occupies in the ELF file, 0x0744 = 1860 bytes;

p_memsz: the number of bytes this segment occupies once loaded into memory, 0x0744 = 1860 bytes. Note: some segments do not need to be loaded into memory.
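The list above skips p_flags, the field behind the description "readable and executable". In the 32-bit layout it occupies bytes 24 to 27 of the entry. The sketch below turns its permission bits into a short string; the bit values 1, 2 and 4 match PF_X, PF_W and PF_R in <elf.h>, while the enum names and the function segment_perms are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Permission bits of p_flags (same values as PF_X, PF_W, PF_R). */
enum { FLAG_X = 0x1, FLAG_W = 0x2, FLAG_R = 0x4 };

/* Write a 3-character permission string such as "r-x" into `out`,
 * followed by a terminating NUL. */
void segment_perms(uint32_t p_flags, char out[4])
{
    assert(out != NULL);
    out[0] = (p_flags & FLAG_R) ? 'r' : '-';
    out[1] = (p_flags & FLAG_W) ? 'w' : '-';
    out[2] = (p_flags & FLAG_X) ? 'x' : '-';
    out[3] = '\0';
}
```

A readable and executable segment has p_flags equal to 5 (4 + 1), which this helper renders as "r-x".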

From this analysis we know that the first 1860 bytes of the ELF file (offsets 0 through 1859) make up the contents of this LOAD segment.
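The idea that a segment is made up of sections can be checked with simple range arithmetic. The sketch below is a simplification that compares file offsets only; real tools also take addresses and section types into account. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when the file range [sh_offset, sh_offset + sh_size) lies
 * entirely inside the range [p_offset, p_offset + p_filesz). */
bool section_in_segment(uint32_t sh_offset, uint32_t sh_size,
                        uint32_t p_offset, uint32_t p_filesz)
{
    uint64_t sec_end = (uint64_t)sh_offset + sh_size;
    uint64_t seg_end = (uint64_t)p_offset + p_filesz;

    return sh_offset >= p_offset && sec_end <= seg_end;
}

/* Self-check with this article's numbers: .text (offset 0x470,
 * size 0x1b2) against the first LOAD segment (offset 0, size 0x744). */
void check_text_in_first_load(void)
{
    assert(section_in_segment(0x470, 0x1b2, 0, 0x744));
}
```

By the same test, the string table section at offset 0x16ed falls outside this segment, since 0x16ed is beyond 0x744.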

At execution time, this segment is loaded into memory at the virtual address 0x08048000. From there on, it is a whole new story.


One more review

At this point I have peeled off all my layers like an onion, letting you see me at the finest granularity. Do you know me well enough now?

In fact, you only need to grasp the following 2 key points:

  1. The ELF header describes the overall information of the file, as well as the related information of the two tables (offset address, number of entries, length of each entry);

  2. Each table contains many entries, and each entry describes the specific information of one Section/Segment.

The linker and the loader parse ELF files by exactly this principle. Once you understand it, you will not get lost when you later study the details of linking and loading!


------ End ------

Let the knowledge flow, the more you share, the luckier you are!


Hi~, I am Brother Dao, a veteran of embedded development.

