Tweak Your Code For Speed: Unroll Those Loops Part 1
Subscribe To Our Feed | Follow Us On Twitter | Get Updates on Email
One of the ways to optimize your code for speed is to unroll the loops you have in your program. If you don’t know what loop unrolling is, then see the following simple example, where we are trying to copy a 5 element long string:
A Simple Loop:
for (i = 0 ; i < 5; ++i) { *j++ = *k++; }
The same loop unrolled:
*j++ = *k++; *j++ = *k++; *j++ = *k++; *j++ = *k++; *j++ = *k++;
Now, the unrolled version is definitely going to be faster because of the following reasons:
1. There are no branches any more. In a loop, the compiler has to insert branches to jump back to the for statement.
2. There are no condition checks. In a loop, the value of i is checked every time to see what to do next.
3. A third rather hidden, but really important, advantage is that unrolled code can be pipelined efficiently by the processor hence resulting in faster execution.
The above code in itself might not show you any perceivable difference in execution time, but it does become crucial when the loops are of much higher magnitude and especially in embedded/real time scenarios.
Now, the loop unrolling can be done by the compiler or manually by the coder. In this part, we will see how to facilitate compiler to carry out loop unrolling efficiently.
1. Turn on your compiler’s optimization switches.
2. Find out if there are any pragmas available through which you can pass additional information to the compiler to help it in unrolling the loops. This information could be in the form of minimum no. of times the loop will be executed, maximum number of times the loop will be executed, is the number of times the loop will be executed a multiple of some particular number, etc. How this information helps the compiler will become clear in the next part of this series when we will tell you how to manually unroll the loops.
3. Try to keep the loop implementation in such a way that the variables written in one iteration are not read in the future iterations. e.g. in the above example, the variable “i” should not be modified within the loop. This would ensure that any given iteration of a loop is independent of the past values of i and hence can be easily pipelined.
4. Try to take out epilog and prolog from the loop. This refers to the statements which are not executed on every iteration of the loop but only at the starting or at the end of the loop. e.g. see the following example:
for(i = 0; i < 5; ++i) { if(i == 0) { *j++ = *w; continue; } *j++ = *k++; }
In this code, when i is 0, we carry out some special code. This could be rewritten as:
*j++ = *w; for(i = 0; i < 4; ++i) { *j++ = *k++; }
5. Keep the loop body small, if you want to reap the benefits of pipelining. Many people might tell you to group the code that is executed same number of times into a single loop. But I’d say that if the 2 piecs of code are unrelated, it is better to keep them separate in 2 different loops in your c code. Because this will not affect the compiler from unrolling them (rather might help it in doing this, actually) but also allow it pipeline your code efficiently.
6. Try not to have external branches (like goto, function calls, etc) in your loop body. This also helps the compiler in pipelining the code.
7. The last advice is to avoid nested loops. Because in nested loops, the compiler will unroll only the outer loops, leaving the inner loops untouched. If you can’t avoid such conditions, it is generally recommended that you unroll the inner loops manually.
This is it, for this time. We will talk about manual loop unrolling next week. Let us know your queries and comments, and also about what topics should would like to know about next.
© Safer Code | Tweak Your Code For Speed: Unroll Those Loops Part 1
|
Liked this post? Get FREE Updates Subscribe to RSS feed |
Related posts
Tags: C, iterations, Loop Unrolling, Loops, Optimization, Optimization techniques, Pipelining






When we talk about micro optimzation, I guess it would also help if the example loop counted backwards. Saves another compare (and often a register for storing the limit) if the compiler decides not to unroll a loop.
@marcus:
Just curious… How does a loop counting backwards save a compare?
@agi
> How does a loop counting backwards save a compare?
After every iteration, an up-counting loop will have to compare the loop variable with the loop limit. Pseudo code:
loop ... ADD i, #1 CMP i, BNE loopA down-counting loop decrements the loop variable. On most processor architectures, this will implicitly set condition code flags. The flags can be directly evaluated by a conditional branch. Plus there is no need to store the limit after initializing the loop variable.
loop ... SUB i, #1 BNZ loopOf course, with trivial loops as in the examples you can expect that the compiler actually rewrites your code to count towards zero, especially if the counter isn’t used inside of the loop. If it’s also used as an index into an array for example, the compiler can’t do that optimization for you.
Unrolling always may not help. One important demerit of unrolling that is not mentioned is that code size may get greatly impacted (would become larger that is).
The gcc optimization flag -O2 enables all optimizations but loop unrolling, because it does not want to optimize at the cost of code size.
And regarding point 7, are you sure that the compiler _always_ skips unrolling inner loops?
If a concoct a trivial example such as:
for (i = 0; i < 3; ++i)
for (j = 0; j < 2; ++j)
;
..would the compiler still unroll only the outer loop?