This article describes some of the code optimization techniques used by the GNU C Compiler, in order to give the reader a feel of what code optimization is and how it can increase the efficiency of the generated object code.
As we all know, a compiler is a program that reads the source program in a high-level language and translates it into (typically) machine language. This is a complicated process involving a number of stages. If the compiler is an optimizing compiler, one of these stages "optimizes" the machine language code so that it either takes less time to run or occupies less memory or sometimes both. Of course, whatever optimizations the compiler does, it must not affect the logic of the program i.e. the optimization must preserve the meaning of the program. One might wonder what type of optimizations the compiler uses to produce efficient machine code? Since in no case the meaning of the program being compiled should be changed, the compiler must inspect the program very thoroughly and find out the suitable optimizations that can be applied. As we may wonder, such a thorough analysis of the program and then finding and applying suitable optimizations is a complex and time consuming process, the details of which are beyond the scope of this article.
What I am going to describe in this article are a few type of code optimizations that the GNU C Compiler uses so that we can understand how the code is optimized and appreciate the complexity of the optimization process. The GNU C Compiler is a sophisticated optimizing C compiler that uses a large array of optimization techniques to produce efficient code. It may not be possible to describe all of them, so I have chosen some that are interesting and also easy to understand. A complete list of the optimization techniques that are used by the GNU C compiler is available at http://www.redhat.com/products/support/gnupro/gnupro_gcc.html. Instead of simply describing what a particular optimization technique does, I will describe them using the assembly language code that is generated by the GNU C Compiler. This method will also enable the readers to further explore code optimization carried out by the compiler and also the advanced optimization techniques, in case they are interested.
The GNU C Compiler (called GCC henceforth) reads a C program source file and translates it into an object program that contains the machine code for the program in binary form. However, it can also produce assembly language source code for the program instead of the object code, so that we can read it and understand how the assembly language program looks. We will be generating such assembly language files to see the optimizations being used by the compiler, so it would be beneficial if we see how the assembly language code for a simple C program looks like. However, also note that we need not understand each assembly language statement in the generated code, so some statements that are not crucial towards understanding the code optimization will not be explained for simplicity. To generate the assembly language code, create a file test1.c as shown below and give the following command:
$ gcc -c -S test1.c
This will generate a file test1.s, which has the assembly language listing of the generated code for the C program. The file test1.c along with the assembly language code is shown below.
1 : /* test1.c */ 2 : /* This first file simply demonstrates how the assembly program that the 3 : compiler produces looks like and some peculiarities of the GNU 4 : assembler that follows some different conventions from MASM/TASM. 5 : */ 6 : #include <stdio.h> 7 : 8 : int main() 9 : { 10 : printf("Hello, World\n"); 11 : return 0; 12 : } 13 : 14 : /* end test1.c */ 15 : /* ----------------------------------------------------------------- */ 16 : /* generated assembly language file */ 17 : .file "test1.c" /* some assembler directives to be */ 18 : .version "01.01" /* ignored */ 19 : gcc2_compiled.: 20 : .section .rodata /* this segment has read-only data */ 21 : .LC0: 22 : .string "Hello, World\n" 23 : .text 24 : .align 4 25 : .globl main 26 : .type main,@function 27 : main: /* main function begins */ 28 : pushl $.LC0 /* push parameter for printf() on stack */ 29 : call printf /* call the function */ 30 : addl $4,%esp /* clear the stack */ 31 : 32 : xorl %eax,%eax /* make EAX = 0, functions use register */ 33 : jmp .L1 /* EAX to return values */ 34 : .p2align 4,,7 /* this is an alignment directive */ 35 : .L1: 36 : ret /* return from main, done */ 37 : .Lfe1: 38 : /* other assembler directives to be ignored */ 39 : .size main,.Lfe1-main 40 : .ident "GCC: (GNU) egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)" 41 : /* end of the generated assembly language file */ 42 : /* ---------------------------------------------------------------- */
In case you know about the architecture of the 80386 or higher microprocessors and have experience with assembly language programming, you will find that the assembly language that is produced is only partly familiar to you. This assembly language follows the AT&T assembler syntax that is different from the Intel/Microsoft Assembler/Turbo Assembler syntax that most of us are probably familiar with. Let us go through the assembly language code. We will ignore the assembler directives as they are not crucial towards understanding the generated code. On line 20 a read-only data segment has been defined with the string "Hello, World\n" in it. A label .LC0 has been assigned to the string. On line 27, the code for the main() function begins. As we know, in C language the parameters to functions are passed by pushing them on the stack, and the parameters are pushed in the reverse order i.e. the last parameter is pushed first on the stack. In this case, the printf() function is being called with a single parameter, the string "Hello, World\n". The statement pushl $.LC0 pushes the address of the string "Hello, World\n" on the stack. The "l" in pushl stands for "long" as we are dealing with 32 bit variables. In all the assembly language programs that we will see, the mnemonics will be followed by an "l" to indicate that we are dealing with 32 bit variables. The "$" preceding .LC0 means the address of the string. The next statement calls the printf() function. After the printf() function finishes execution, we need to cleanup the stack, so we need to add 4 to ESP. (Why 4? That's because we pushed 4 bytes on the stack before calling printf().) To do so, we would normally write, ADD ESP, 4. The Intel convention is <instruction> dest, src. However, the AT&T convention is <instruction> src, dest, so you can see that the instruction on line 30 is addl $4, %esp. Immediate operands like 4 are prefixed by a $ and register names are prefixed by a %. This convention was followed to maintain compatibility with the BSD assembler. The next statement XOR's EAX with itself so that EAX = 0 afterwards. This is because the return value from a function is stored in the EAX register. After that, you will see a jump instruction and an alignment directive. These will not be explained and it will suffice to know that main() function will return back at this point. Now that we understand how the assembly language code looks like, let us move towards the optimizations.
Constant folding is the simplest code optimization to understand. Let us suppose that you write the statement x = 45 * 88; in your C program. A non-optimizing compiler will generate code to multiply 45 by 88 and store the value in x. An optimizing compiler will detect that both 45 and 88 are constants, so their product will also be a constant. Hence it will find 45 * 88 = 3960 and generate code that simply copies 3960 into x. This is constant folding, and means the calculation is done just once, at compile time, rather than every time the program is run. To illustrate this, create the file test2.c as shown below. Generate the assembly code for this program as was shown earlier. Since by default, GCC does not optimize the program, you will see that the assembly code is a straightforward translation of the C program (Lines 18 to 62).
1 : /* test2.c */ 2 : /* Demonstration of constant propagation */ 3 : #include <stdio.h> 4 : 5 : int main() 6 : { 7 : int x, y, z; 8 : x = 10; 9 : y = x + 45; 10 : z = y + 4; 11 : printf("The value of z = %d", z); 12 : return 0; 13 : } 14 : /* end of test2.c */ 15 : 16 : /* ---------------------------------------------------------------- */ 17 : /* assembly language file without any optimizations */ 18 : .file "test2.c" 19 : .version "01.01" 20 : gcc2_compiled.: 21 : .section .rodata 22 : .LC0: 23 : .string "The value of z = %d" 24 : .text 25 : .align 4 26 : .globl main 27 : .type main,@function 28 : main: 29 : pushl %ebp /* save EBP register on stack */ 30 : movl %esp,%ebp /* EBP = ESP */ 31 : subl $12,%esp /* Create stack frame. 3 variables x 4 bytes */ 32 : 33 : /* x = 10; */ 34 : movl $10,-4(%ebp) /* x = 10. x is topmost on the stack */ 35 : 36 : /* y = x + 45; */ 37 : movl -4(%ebp),%edx /* EDX = x */ 38 : addl $45,%edx /* EDX = EDX + 45 */ 39 : movl %edx,-8(%ebp) /* y = EDX. y is second from top of stack */ 40 : 41 : /* z = y + 4 */ 42 : movl -8(%ebp),%edx /* EDX = y */ 43 : addl $4,%edx /* EDX = EDX + 4 */ 44 : movl %edx,-12(%ebp) /* z = EDX. z is third from top of stack */ 45 : 46 : /* printf("The value of z = ", z); */ 47 : movl -12(%ebp),%eax /* EAX = z */ 48 : pushl %eax /* push EAX(=z) as first parameter of printf */ 49 : pushl $.LC0 /* second parameter for printf */ 50 : call printf 51 : addl $8,%esp /* clear stack */ 52 : 53 : /* return 0; */ 54 : xorl %eax,%eax /* for return 0 */ 55 : jmp .L1 56 : .p2align 4,,7 57 : .L1: 58 : leave 59 : ret 60 : .Lfe1: 61 : .size main,.Lfe1-main 62 : .ident "GCC: (GNU) egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)" 63 : /* end of assembly language code */ 64 : /* ---------------------------------------------------------------- */ 65 : 66 : /* ---------------------------------------------------------------- */ 67 : /* generated assembly language code after optimization */ 68 : 69 : .file "test2.c" 70 : .version "01.01" 71 : gcc2_compiled.: 72 : .section .rodata 73 : .LC0: 74 : .string "The value of z = %d" 75 : .text 76 : .align 4 77 : .globl main 78 : .type main,@function 79 : main: 80 : pushl %ebp /* Save EBP register on stack */ 81 : movl %esp,%ebp /* EBP = ESP */ 82 : 83 : /* by constant propagation, z will always be 59 */ 84 : /* printf("The value of z = %d", z); */ 85 : pushl $59 /* first printf parameter */ 86 : pushl $.LC0 /* second printf parameter */ 87 : call printf 88 : /* no need of cleanup, we are exiting anyway */ 89 : /* return 0; */ 90 : xorl %eax,%eax 91 : leave 92 : ret 93 : .Lfe1: 94 : .size main,.Lfe1-main 95 : .ident "GCC: (GNU) egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)" 96 : /* end of assembly language code */ 97 : /* ----------------------------------------------------------------- */
There are some new things in the assembly code that we need to understand. First of all, the main() function defines 3 local variables for whom space will be allocated on the stack. Also, to access these variables, we will be using the indexed addressing mode and we will be using EBP for that. So the first statement saves the current value of EBP onto the stack. Then the stack pointer ESP is copied into EBP (note that the source ESP is first) so that EBP can be used for accessing the local variables. Finally, we need to create space for three 4-byte variables on the stack, so we subtract 12 from ESP; i.e., we grow the stack by 12 bytes. x will be the bottom-most stack element and at an offset of -4 from EBP. See the diagram below.
EBP-----> --------------- | x | 4 Bytes --------------- | y | 4 Bytes --------------- | z | 4 Bytes ESP-----> --------------- | | | | ^ | | | ... Address increases | | | as we go up | 0 ---------------
Similarly, the offset of y from EBP is -8 and that of z is -12. Now consider the assignment x = 10;. The statement to implement it on line 34 is movl $10, -4(%ebp). The indexed addressing mode is written as <offset>(<base register>). Thus the above statement will copy the constant 10 to the address (EBP - 4) i.e. the address of x on the stack. In a similar manner, -8(%ebp) is used to access y and -12(%ebp) is used to access z. Lines 37, 38 and 39 evaluate x + 45 and assign the result to y. To do so, the value of x is first copied into the EDX register, 45 is added to it and the result which is in EDX is copied to y. The code for z = y + 4 is similar. In line 47 the parameters for printf() are pushed on the stack. The last parameter (z) is pushed first and then the address of the string is pushed. In line 51, we cleanup the stack by adding 8 to ESP. I guess that since we pushed two 4-byte parameters, we had to add 8 to ESP. In the first example, we pushed only one parameter and hence we had to add just 4 to ESP. The rest of the code is easy to understand.
Now if we look at the above program, when we evaluate x + 45, the value of x is always going to be 10 because of the assignment x = 10. Therefore x + 45 will always evaluate to 55. So the statement always assigns 55 to y. Similarly, when evaluating y + 4, the result will always be 59. If the optimization is enabled, the compiler will be in a position to detect this. To enable optimization, use the -O2 option. Enter the following command:
gcc -c -S -O2 test2.c
The assembly code generated with optimization enabled is shown in lines 69 to 95. Notice that on line 85, the compiler directly pushes the constant 59 on the stack followed by the address of the string and calls printf(). Thus, by constant folding, the compiler evaluates the various expressions in the program only once and plugs the final value into the generated code. One more interesting thing to observe is that after constant folding, there is no need of the variables x, y and z. Therefore no space for them is allocated on the stack, thus reducing the memory requirement of the program. This brings out the fact that one optimization may lead to another one. In the above case constant folding lead to a decrease in run time (since all the computations are done at compile time) and also to a decrease in the space requirements.
Many a times, it happens that the same expression is evaluated at different places in the same program and the values of the operands in the expression do not change in between the two evaluations of that expression. For example, the program may evaluate say a * b at the beginning and at the end. If the values of a and b do not change in between these two evaluations of a * b, then instead of evaluating a * b again at the end, we can save the result of the evaluation of a * b at the beginning in some temporary variable and use it at the end. This will help eliminate redundant computations in the program. This optimization is called as common subexpression elimination. Consider the following program that demonstrates it.
1 : /* test3.c */ 2 : /* common subexpression elimination, and also of constant propagation */ 3 : #include <stdio.h> 4 : 5 : int main() 6 : { 7 : int a, b; 8 : int x, y, z; 9 : scanf("%d %d", &a, &b); 10 : x = a * b; 11 : 12 : if(b >= 4) 13 : { 14 : y = a * b; 15 : z = 0; 16 : } 17 : else 18 : { 19 : z = a * b * 4; 20 : y = 0; 21 : } 22 : 23 : printf("x = %d, y = %d, z = %d\n", x, y, z); 24 : return 0; 25 : } 26 : /* end of test3.c */ 27 : 28 : /* ----------------------------------------------------------------- */ 29 : /* generated unoptimized assembly language code */ 30 : .file "test3.c" 31 : .version "01.01" 32 : gcc2_compiled.: 33 : .section .rodata 34 : .LC0: 35 : .string "%d %d" 36 : .LC1: 37 : .string "x = %d, y = %d, z = %d\n" 38 : .text 39 : .align 4 40 : .globl main 41 : .type main,@function 42 : main: 43 : pushl %ebp /* save EBP */ 44 : movl %esp,%ebp /* EBP = ESP */ 45 : subl $20,%esp /* Create space for 5 variables */ 46 : 47 : /* scanf("%d %d". &a, &b); */ 48 : leal -8(%ebp),%eax 49 : pushl %eax /* push address of b on stack */ 50 : leal -4(%ebp),%eax 51 : pushl %eax /* push address of a on stack */ 52 : pushl $.LC0 /* push address of string */ 53 : call scanf 54 : addl $12,%esp /* stack cleanup after scanf */ 55 : 56 : /* x = a * b; */ 57 : movl -4(%ebp),%eax /* EAX = a */ 58 : imull -8(%ebp),%eax /* EAX = EAX * b = a * b */ 59 : movl %eax,-12(%ebp) /* x = EAX = a * b */ 60 : 61 : /* if( b >= 4)... */ 62 : cmpl $3,-8(%ebp) /* compare b with 3 */ 63 : jle .L2 /* else part at label .L2, if follows */ 64 : 65 : /* y = a * b; */ 66 : movl -4(%ebp),%eax /* EAX = a */ 67 : imull -8(%ebp),%eax /* EAX = EAX * b = a * b */ 68 : movl %eax,-16(%ebp) /* y = EAX = a * b */ 69 : /* z = 0; */ 70 : movl $0,-20(%ebp) 71 : jmp .L3 /* jump over the else part */ 72 : 73 : .p2align 4,,7 74 : .L2: 75 : /* else part begins here */ 76 : 77 : /* z = a * b * 4; */ 78 : movl -4(%ebp),%eax /* EAX = a */ 79 : imull -8(%ebp),%eax /* EAX = EAX * b = a * b */ 80 : leal 0(,%eax,4),%edx /* EDX = EAX*4 + 0 */ 81 : movl %edx,-20(%ebp) /* z = EDX */ 82 : /* y = 0; */ 83 : movl $0,-16(%ebp) 84 : .L3: 85 : /* if..else is over here */ 86 : 87 : /* printf("x = %d, y = %d, z = %d\n", x, y, x); */ 88 : movl -20(%ebp),%eax 89 : pushl %eax /* push address of z on stack */ 90 : movl -16(%ebp),%eax 91 : pushl %eax /* push address of y on stack */ 92 : movl -12(%ebp),%eax 93 : pushl %eax /* push address of x on stack */ 94 : pushl $.LC1 /* address of string */ 95 : call printf 96 : addl $16,%esp /* stack cleanup after printf */ 97 : 98 : /* return 0 */ 99 : xorl %eax,%eax 100 : jmp .L1 101 : .p2align 4,,7 102 : .L1: 103 : leave 104 : ret 105 : .Lfe1: 106 : .size main,.Lfe1-main 107 : .ident "GCC: (GNU) egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)" 108 : /* end of unoptimized assembly language code */ 109 : /* --------------------------------------------------------------- */ 110 : 111 : /* --------------------------------------------------------------- */ 112 : /* generated optimized assembly language code */ 113 : .file "test3.c" 114 : .version "01.01" 115 : gcc2_compiled.: 116 : .section .rodata 117 : .LC0: 118 : .string "%d %d" 119 : .LC1: 120 : .string "x = %d, y = %d, z = %d\n" 121 : .text 122 : .align 4 123 : .globl main 124 : .type main,@function 125 : main: 126 : pushl %ebp /* save EBP */ 127 : movl %esp,%ebp /* EBP = ESP */ 128 : subl $8,%esp /* space for just 2 variables on stack */ 129 : 130 : /* scanf("%d %d", &a, &b); */ 131 : leal -4(%ebp),%eax 132 : pushl %eax /* push address of b on stack */ 133 : leal -8(%ebp),%eax 134 : pushl %eax /* push address of a on stack */ 135 : pushl $.LC0 /* address of string */ 136 : call scanf 137 : 138 : /* x = a * b; */ 139 : movl -4(%ebp),%eax /* EAX = b */ 140 : movl %eax,%edx /* EDX = EAX = b */ 141 : imull -8(%ebp),%edx /* EDX = EDX * a = b * a = a * b */ 142 : 143 : addl $12,%esp /* delayed stack cleanup */ 144 : /* if( b >= 4).... */ 145 : cmpl $3,%eax /* compare EAX = b with 3 */ 146 : jle .L17 /* else part from label .L17 */ 147 : 148 : /* y stored in ECX, z in EAX, x in EDX */ 149 : /* y = a * b; */ 150 : movl %edx,%ecx 151 : /* z = 0; */ 152 : xorl %eax,%eax 153 : jmp .L18 /* jump over the else part */ 154 : .p2align 4,,7 155 : .L17: 156 : /* z = a * b * 4; */ 157 : leal 0(,%edx,4),%eax /* LEA EAX, [EDX*4]+0 */ 158 : /* y = 0; */ 159 : xorl %ecx,%ecx 160 : .L18: 161 : pushl %eax /* push value of z */ 162 : pushl %ecx /* push value of y */ 163 : pushl %edx /* push value of x */ 164 : pushl $.LC1 /* push address of string */ 165 : call printf 166 : /* stack cleanup after printf not necessary */ 167 : 168 : /* return 0; */ 169 : xorl %eax,%eax 170 : leave 171 : ret 172 : .Lfe1: 173 : .size main,.Lfe1-main 174 : .ident "GCC: (GNU) egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)" 175 : /* end optimized assembly code */ 176 : /* -------------------------------------------------------------- */
This program has an example of a common subexpression. On line 10, the expression a * b is evaluated the first time, and then again on lines 14 and 19. The last two evaluations of a * b are redundant since the value of neither a nor b changes after the first evaluation. Thus these two evaluations are common subexpressions that can be eliminated. Lines 30 to 108 show the unoptimized assembly code that is a straightforward translation of the C code and should be easy to understand. Now consider the optimized assembly language code from lines 113 to 176. The first thing to notice is that now only variables a and b are stored on the stack, hence a stack of just 8 bytes is required as opposed to a 20 byte stack for the unoptimized version. You may wonder what's so special about variables a and b. The specialty is that the address of variables a and b is used in the program (for scanf()) and variables that reside in the registers cannot have a memory address. Hence the compiler is unable to put them inside the registers. The registers that the compiler chooses for other variables are as follows: x in EDX, y in ECX and z in EAX. Statements 139 to 141 evaluate a * b and the value is stored in EDX (which holds x). Also, the value of b is available in the register EAX. One more thing to notice is the delayed cleanup of stack after the call to scanf() on line 143. May be there is some advantage in doing so, I don't know.
Next starts the if statement. In the if part, the expression a * b is reevaluated and we expect that the compiler will use the value stored in EDX. That is exactly what the compiler does. For the statement y = a * b, the generated code on line 150 simply copies the value of a * b in register EDX into the register ECX (which holds y). In the else part again, there is a common subexpression in the statement z = a * b * 4. Thus we expect that the compiler will take the value of a * b in EDX, multiply it by 4 and then store the result in EAX (which holds z). Now there are many ways in which this can be done. One is to use a sequence of movl, imull instructions as in
movl %edx, %eax /* EAX = EDX = a * b */ imull $4, %eax /* EAX = EAX * 4 */
The second choice is to use a shift (sall) instead of imull in the above sequence of instructions. However, it turns out that the GCC uses an obscure speed trick to optimize the code. It uses the instruction
leal 0(,%edx,4), %eax
This instruction uses the scaling and indexing capabilities of the 80386 processors. The above instruction takes EDX as the index register, 4 as the scale and 0 as the offset, calculates the effective address and stores it in the EAX register. Thus the register EAX gets a value EDX(=a*b)*4 + 0 i.e. a * b * 4. The leal instruction always executes in 2 clock cycles on an 80386, whereas a simple shift will take around 7 clock cycles. We see that the compiler knows about processor peculiarities and uses it when it generates the code.
Thus it can be seen that common subexpressions save a lot of computations and also space that is used up by the code for these redundant computations. At this stage, one can understand that the compiler when analyzing the program for optimization must keep a track of how and when the variables change, the expressions that are evaluated and also which registers are allocated to variables. In general such a kind of analysis that the compiler performs on the program is called as data flow analysis.
Dead code is the code in the program that will never be executed for any input or other conditions. A common example is an if statement. If the compiler finds out that the condition inside the if is never going to be true, then the body of the if statement will never be executed. In that case, the compiler can completely eliminate this dead code, thus saving the memory space occupied by the code. The following program demonstrates dead code elimination.
1 : /* test4.c */ 2 : /* demonstration of dead code elimination */ 3 : #include <stdio.h> 4 : 5 : int main() 6 : { 7 : int x; 8 : 9 : scanf("%d", &x); 10 : 11 : if(x < 0 && x > 0) 12 : { 13 : x = 99; 14 : printf("Hello. Inside the if!!!"); 15 : } 16 : 17 : return 0; 18 : } 19 : /* end of test4.c */ 20 : 21 : /* --------------------------------------------------------------- */ 22 : /* optimized assembly code */ 23 : .file "test4.c" 24 : .version "01.01" 25 : gcc2_compiled.: 26 : .section .rodata 27 : .LC0: 28 : .string "%d" 29 : .LC1: 30 : .string "Hello. Inside the if!!!" 31 : .text 32 : .align 4 33 : .globl main 34 : .type main,@function 35 : main: 36 : pushl %ebp /* save EBP on stack */ 37 : movl %esp,%ebp /* EBP = ESP */ 38 : subl $4,%esp /* create space for x on the stack */ 39 : 40 : /* scanf("%d", &x); */ 41 : leal -4(%ebp),%eax 42 : pushl %eax /* push address of a on stack */ 43 : pushl $.LC0 /* push string on stack */ 44 : call scanf 45 : 46 : /* the entire body of the if and the condition checking is dead code */ 47 : /* return 0; */ 48 : xorl %eax,%eax /* no stack cleanup, we are exiting anyway */ 49 : leave 50 : ret 51 : .Lfe1: 52 : .size main,.Lfe1-main 53 : .ident "GCC: (GNU) egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)" 54 : /* end optimized assembly code */ 55 : /* ---------------------------------------------------------------- */
Since the unoptimized assembly language code serves no purpose and also its easy to understand, I have omitted it. In the program, the condition on line 11 is x < 0 && x > 0, which can never be true. The compiler finds this and concludes that the body of the if statement forms dead code, so it does not generate any code for that. One interesting thing to notice is that after the body of if has been eliminated, the string "Hello. Inside the if!!!" is no longer needed and hence can be eliminated from the read-only data section. However, detection of such things probably will increase the complexity of the compiler and hence is not done.
One type of code optimization is strength reduction in which a "costly" operation is replaced by a less expensive one. For example, the evaluation of x2 is much more efficient if we multiply x by x rather than call the exponentiation routine. One place where this optimization can be applied is in loops. Many times in a loop, one variable changes in synchronization with the loop variable. Such a variable is called as induction variable. Induction variables give the compiler a chance to apply strength reduction, as can be seen from the following program.
1 : /* test5.c */ 2 : /* demonstration of induction variable elimination */ 3 : int main() 4 : { 5 : int i, j; 6 : 7 : for(i = 0; i < 10; i++) 8 : { 9 : j = i * 7; 10 : printf("i = %d, j = %d", i, j); 11 : } 12 : return 0; 13 : } 14 : /* end test5.c */ 15 : 16 : /* --------------------------------------------------------------- */ 17 : /* optimized assembly language code */ 18 : .file "test5.c" 19 : .version "01.01" 20 : gcc2_compiled.: 21 : .section .rodata 22 : .LC0: 23 : .string "i = %d, j = %d" 24 : .text 25 : .align 4 26 : .globl main 27 : .type main,@function 28 : main: 29 : pushl %ebp /* save EBP on stack */ 30 : movl %esp,%ebp /* ESP = EBP */ 31 : 32 : pushl %esi /* ESI will hold 'j' */ 33 : pushl %ebx /* EBX will hold 'i' */ 34 : xorl %ebx,%ebx /* both are initialized to zero */ 35 : xorl %esi,%esi 36 : .p2align 4,,7 37 : .L5: 38 : /* printf("i = %d, j = %d", i, j); */ 39 : pushl %esi /* push value of j */ 40 : pushl %ebx /* push value of i */ 41 : pushl $.LC0 /* push address of string */ 42 : call printf 43 : addl $12,%esp /* stack cleanup */ 44 : 45 : /* instead of saying j = i * 7, its efficient to say j = j + 7 */ 46 : addl $7,%esi 47 : incl %ebx /* i++ */ 48 : cmpl $9,%ebx /* is i <= 9, then repeat the loop */ 49 : jle .L5 50 : 51 : /* return 0; */ 52 : xorl %eax,%eax 53 : leal -8(%ebp),%esp 54 : popl %ebx 55 : popl %esi 56 : leave 57 : ret 58 : .Lfe1: 59 : .size main,.Lfe1-main 60 : .ident "GCC: (GNU) egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)" 61 : 62 : /* end optimized assembly code */ 63 : /* ----------------------------------------------------------------- */
Here i is the loop variable whereas j is the induction variable as j always changes in lock step with i. In the generated assembly language code, the compiler has decided to use registers to store both i and j with i in EBX and j in ESI. Line 34 initializes EBX(=i) to zero, as is indicated in the initialization part of the for loop. The compiler sees that during the first pass, j will be assigned a value of 0, so it also initialized ESI to 0 in line 35. The loop starts at line 37 and continues till line 49. Since j already has its value assigned to it, the first thing the compiler does inside the loop is to call the printf() function. Here, the compiler has adjusted the order of the statements in the loop so as to enable it to use strength reduction. After analyzing the program, the compiler sees that inside the loop, the value of i is always going to increase by 1 and since j is an induction variable, its value is always going to increase by 7. Therefore, instead of multiplying i by 7 (which is costly), it adds 7 to the old value of j before going on to the next iteration. Thus a costly operation of multiplication has been replaced by addition which is cheaper (in terms of time).
In this article we have seen some very basic code optimization techniques that the GNU C Compiler uses to optimize the generated code. From these optimization techniques and the various pieces of information that are required to apply them, the reader can appreciate the type of sophisticated and complex analysis that the compiler must carry out on the program. It has to keep track of various things like variables, their storage places (memory or registers), expression evaluations, constant expressions, dead code etc. It is because of this high complexity of the process that the compiler takes much longer to compile a program with optimization enabled. There are many details that are involved in code optimization and code generation that I have not explained and that I don't know. Code optimization is a field of active research and interested readers can refer to [1] for additional information.
I would like to thank my Bachelor's project guide Dr. Uday Khedker for creating my interest in compilers and code optimization. I would also like to thank the Linux Gazette, Linux Documentation Project, PC Quest and the Pune Linux User's Group (http://www.pluggies.org) which have earlier accepted and published my contributions that inspires me to write more.