|
2. Assembly Language Code for a C Program

The GNU C Compiler (GCC henceforth) reads a C source file and translates it into an object file that contains the program's machine code in binary form. It can also emit assembly language source instead of object code, so that we can read it and see what the compiled program looks like. We will generate such assembly language files to see which optimizations the compiler applies, so it helps to first look at the assembly language code for a simple C program. We need not understand every statement in the generated code; statements that are not crucial to understanding the optimizations are left unexplained for simplicity.

To generate the assembly language code, create a file test1.c as shown below and give the following command:

    $ gcc -c -S test1.c

This generates a file test1.s, which contains the assembly language listing of the code generated for the C program. The file test1.c is shown below, followed by the assembly language code.

     1 : /* test1.c */
     2 : /* This first file simply demonstrates what the assembly program that the
     3 :    compiler produces looks like and some peculiarities of the GNU
     4 :    assembler, which follows some different conventions from MASM/TASM.
     5 : */
     6 : #include <stdio.h>
     7 :
     8 : int main()
     9 : {
    10 :     printf("Hello, World\n");
    11 :     return 0;
    12 : }
    13 :
    14 : /* end test1.c */
    15 : /* ----------------------------------------------------------------- */
    16 : /* generated assembly language file */
    17 :     .file "test1.c"          /* some assembler directives to be */
    18 :     .version "01.01"         /* ignored */
    19 : gcc2_compiled.:
    20 : .section .rodata             /* this segment has read-only data */
    21 : .LC0:
    22 :     .string "Hello, World\n"
    23 : .text
    24 :     .align 4
    25 : .globl main
    26 :     .type main,@function
    27 : main:                        /* main function begins */
    28 :     pushl $.LC0              /* push parameter for printf() on stack */
    29 :     call printf              /* call the function */
    30 :     addl $4,%esp             /* clear the stack */
    31 :
    32 :     xorl %eax,%eax           /* make EAX = 0, functions use register */
    33 :     jmp .L1                  /* EAX to return values */
    34 :     .p2align 4,,7            /* this is an alignment directive */
    35 : .L1:
    36 :     ret                      /* return from main, done */
    37 : .Lfe1:
    38 : /* other assembler directives to be ignored */
    39 :     .size main,.Lfe1-main
    40 :     .ident "GCC: (GNU) egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)"
    41 : /* end of the generated assembly language file */
    42 : /* ---------------------------------------------------------------- */
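Lines 28 to 30 of the listing push one 4-byte argument, call printf(), and then add 4 to ESP. The bookkeeping behind that sequence can be modelled in plain C. The following is only a toy sketch, not how GCC or the processor implements anything: the names toy_stack, toy_init, toy_push and toy_call are hypothetical, the stack is a small byte array, and the return address that a real call instruction also pushes is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model of a 32-bit x86 stack. It grows towards lower addresses and
   esp always refers to the most recently pushed value. All names in this
   sketch are hypothetical and exist only for illustration. */
enum { STACK_BYTES = 64 };

typedef struct {
    uint8_t  mem[STACK_BYTES];
    uint32_t esp;               /* byte offset into mem, starts at the top */
} toy_stack;

static void toy_init(toy_stack *s)
{
    memset(s->mem, 0, sizeof s->mem);
    s->esp = STACK_BYTES;
}

/* Models pushl: decrement esp by 4, then store the 32-bit value there. */
static void toy_push(toy_stack *s, uint32_t value)
{
    s->esp -= 4;
    memcpy(&s->mem[s->esp], &value, 4);
}

/* Models the caller's side of a call with argc 32-bit arguments:
   push the last argument first, let the callee run, then discard the
   arguments by adding their total size back to esp.
   Returns the number of bytes the caller had to remove.
   The caller must keep 4 * argc within STACK_BYTES. */
static uint32_t toy_call(toy_stack *s, const uint32_t *args, size_t argc)
{
    for (size_t i = argc; i > 0; i--)
        toy_push(s, args[i - 1]);       /* reverse order */

    /* ... the callee would execute here and read its arguments ... */

    uint32_t pushed = (uint32_t)(4 * argc);
    s->esp += pushed;                   /* corresponds to addl $4,%esp
                                           when argc is 1 */
    return pushed;
}
```

With a single argument, toy_call() reports 4 bytes to remove and leaves esp where it started, which mirrors the pushl and addl pair in the listing. With two arguments it reports 8, and the first argument ends up at the lowest address because it was pushed last.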
If you know the architecture of the 80386 or later microprocessors and have experience with assembly language programming, you will find the generated assembly language only partly familiar. It follows the AT&T assembler syntax, which differs from the Intel syntax used by the Microsoft Assembler (MASM) and Turbo Assembler (TASM) that most of us are probably familiar with.

Let us go through the assembly language code. We will ignore the assembler directives, as they are not crucial to understanding the generated code. On line 20, a read-only data segment is defined with the string "Hello, World\n" in it, and the label .LC0 is assigned to the string. On line 27, the code for the main() function begins.

Under the C calling convention used here, parameters are passed to functions by pushing them on the stack in reverse order, i.e. the last parameter is pushed first. In this case, printf() is called with a single parameter, the string "Hello, World\n". The statement pushl $.LC0 pushes the address of the string on the stack. The "l" in pushl stands for "long", as we are dealing with 32-bit operands. In all the assembly language programs that we will see, the mnemonics are followed by an "l" to indicate 32-bit operands. The "$" preceding .LC0 marks it as an immediate operand, so the value pushed is the address of the string rather than its contents.

The next statement calls the printf() function. After printf() returns, we need to clean up the stack by adding 4 to ESP. (Why 4? Because we pushed 4 bytes on the stack before calling printf().) In Intel syntax we would write ADD ESP, 4, since the Intel convention is <instruction> dest, src. The AT&T convention is <instruction> src, dest, so the instruction on line 30 is addl $4,%esp. Immediate operands like 4 are prefixed by a $ and register names are prefixed by a %.
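The AT&T operand order and prefixes can also be tried from C, because GCC's inline assembly uses AT&T syntax by default on x86. The sketch below assumes GCC or a compatible compiler such as Clang on an x86 target and falls back to plain C elsewhere. The function names add_four and zero_with_xor are hypothetical, and %0 is not a register name but a placeholder that the compiler replaces with a register of its choosing.

```c
#include <stdint.h>

/* Adds 4 to value with a single AT&T-syntax instruction on x86.
   Hypothetical helper for illustration only. */
static int32_t add_four(int32_t value)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* AT&T order is "source, destination": $4 is an immediate operand,
       %0 is the register holding value (read and written, hence "+r").
       "cc" tells the compiler that the flags register is modified. */
    __asm__("addl $4, %0" : "+r"(value) : : "cc");
#else
    value += 4;     /* portable fallback for other targets */
#endif
    return value;
}

/* Produces 0 the same way line 32 of the listing does: by XORing a
   register with itself. Hypothetical helper for illustration only. */
static int32_t zero_with_xor(void)
{
    int32_t result;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* Whatever the register held before, x XOR x is always 0. */
    __asm__("xorl %0, %0" : "=r"(result) : : "cc");
#else
    result = 0;     /* portable fallback for other targets */
#endif
    return result;
}
```

Calling add_four(10) yields 14 and zero_with_xor() yields 0. In Intel syntax the first instruction would be written with the operands swapped and without the $ and % prefixes, as in ADD EAX, 4.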
This AT&T convention was followed to maintain compatibility with the BSD assembler.

The next statement XORs EAX with itself, so that EAX = 0 afterwards. This is done because the return value of a function is passed back in the EAX register, and main() returns 0. After that, you will see a jump instruction and an alignment directive. These will not be explained; it suffices to know that main() returns at this point.

Now that we understand what the assembly language code looks like, let us move on to the optimizations.
|