Mr Foo: Why Assembler is better

Following on from my last post, I thought I'd expand a little on why assembler is better than C. On "real" computers, it's often said that a C or C++ compiler will, in the vast majority of cases, produce better code than you can write by hand in assembler. This is largely true, especially when you're dealing with multiple cores and the like.

On microcontrollers, however, and especially 8- and 16-bit ones, the C compilers aren't "all that", and the extra overhead a C compiler adds can kill your application stone dead.

Let's take a concrete example.

In my (very non-optimal) assembler example posted before, I need to clear the interrupt pending flag for the Timer interrupt I'm servicing. This is easily done in assembler: it's a one-line, one-clock-cycle instruction, as follows:

bres        0x5255, #0x00   ; Clear TIM1 Interrupt pending bit

Simple, right? Now, let's look at what the C compiler gives us.

Here's the "C" code we use:

// Clear the interrupt pending bit for TIM1.
TIM1_ClearITPendingBit(TIM1_IT_UPDATE);

Simple enough, right? There's obviously the overhead of a function call, but we might expect the guts of the function to do a simple bit of inline assembler as above. Let's look.

void TIM1_ClearITPendingBit(TIM1_IT_TypeDef TIM1_IT)
{
    /* Check the parameters */
    assert_param(IS_TIM1_IT_OK(TIM1_IT));

    /* Clear the IT pending Bit */
    TIM1->SR1 = (u8)(~(u8)TIM1_IT);
}

So, let's look at what that C code produces.


main.c:64  TIM1_ClearITPendingBit(TIM1_IT_UPDATE); 
0x91c3            0xA601          LD    A,#0x01             LD    A,#0x01 
0x91c5            0xCD8C9C        CALL  0x8c9c              CALL  _TIM1_ClearITPendingBit 

...

stm8s_tim1.c:2156     TIM1->SR1 = (u8)(~(u8)TIM1_IT); 
0x8c9c <.ClearITPendingBit> 0x43            CPL   A                   CPL   A 
0x8c9d <.earITPendingBit+1> 0xC75255        LD    0x5255,A            LD    0x5255,A 
stm8s_tim1.c:2157 } 
0x8ca0 <.earITPendingBit+4> 0x81            RET                       RET

So, we load the accumulator with a value: that's 1 cycle. Call a function: 4 cycles. Complement the accumulator: 1 cycle. Store the accumulator to the status register at 0x5255: 1 cycle. Return from the function: 4 cycles.
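For what it's worth, the data side of that sequence can be traced in a few lines of C on a PC. This is only a sketch: it assumes TIM1_IT_UPDATE is 0x01 (the value the listing loads into A), and the names below are made up for illustration, not taken from the ST library.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: TIM1_IT_UPDATE is 0x01, the value loaded by LD A,#0x01. */
#define TIM1_IT_UPDATE_ASSUMED 0x01u

/* Hypothetical helper: returns the byte that
   TIM1->SR1 = (u8)(~(u8)TIM1_IT) ends up storing at 0x5255.
   It mirrors the CPL A step in the listing. */
static uint8_t sr1_clear_value(uint8_t it_flag)
{
    return (uint8_t)~it_flag;
}

/* Small self-check: complementing 0x01 gives 0xFE, so every bit
   except bit 0 is written as 1. */
static void check_clear_value(void)
{
    assert(sr1_clear_value(TIM1_IT_UPDATE_ASSUMED) == 0xFEu);
}
```

So the library call boils down to writing 0xFE to the register, which is the same bit 0 the bres instruction targets.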

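In total, that's 11 cycles and 10 bytes to do the same thing we did in one cycle and 4 bytes (on the STM8, bres with a 16-bit address encodes as a prefix byte, an opcode and the two address bytes). Inlining the instruction doesn't make our code any fatter, either: those 4 bytes replace the 5 bytes at the call site (2 for the LD, 3 for the CALL) that we would have used to call the function.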
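If you want to check the totals for the compiled path yourself, here is a rough tally in C. The byte counts are read off the opcode column of the listing and the cycle counts are the ones quoted above; the struct and function names are just for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One row per instruction in the compiled sequence from the listing. */
struct insn_cost {
    const char *mnemonic;
    unsigned bytes;   /* length of the opcode shown in the listing */
    unsigned cycles;  /* cycle count quoted in the text */
};

static const struct insn_cost compiled_path[] = {
    { "LD A,#0x01",  2, 1 },  /* 0xA601   */
    { "CALL 0x8c9c", 3, 4 },  /* 0xCD8C9C */
    { "CPL A",       1, 1 },  /* 0x43     */
    { "LD 0x5255,A", 3, 1 },  /* 0xC75255 */
    { "RET",         1, 4 },  /* 0x81     */
};

#define COMPILED_PATH_LEN (sizeof compiled_path / sizeof compiled_path[0])

/* Sum the byte column over n rows. */
static unsigned total_bytes(const struct insn_cost *rows, size_t n)
{
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += rows[i].bytes;
    return sum;
}

/* Sum the cycle column over n rows. */
static unsigned total_cycles(const struct insn_cost *rows, size_t n)
{
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += rows[i].cycles;
    return sum;
}

/* Self-check against the totals given in the text. */
static void check_totals(void)
{
    assert(total_bytes(compiled_path, COMPILED_PATH_LEN) == 10);
    assert(total_cycles(compiled_path, COMPILED_PATH_LEN) == 11);
}
```

Note that only 5 of those 10 bytes (the LD and the CALL) are paid at every call site; the other 5 live once in the library function.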

What we do lose is readability, but that's easily enough got back by writing macros.
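As a rough idea of what such a macro might look like, here is a sketch written in C rather than assembler so it can be tried on a PC. Everything named here is hypothetical: the register is faked with an ordinary variable (on the target it would be the memory-mapped byte at 0x5255), and none of the macro names come from the ST headers. Whether a particular compiler reduces this to a single bres is not something I'd promise; check the listing.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the TIM1 status register (0x5255 on the target).
   Assumption: a plain variable, so the sketch runs on a host PC. */
static volatile uint8_t fake_tim1_sr1;

/* Hypothetical names, not taken from the ST library. */
#define TIM1_SR1        fake_tim1_sr1
#define TIM1_SR1_UPDATE 0x01u   /* bit 0, as in "bres 0x5255, #0x00" */

/* Readable name for the bit clear. bres clears one bit and leaves
   the rest alone, so the C equivalent is an AND with the inverted mask. */
#define TIM1_CLEAR_UPDATE_PENDING() \
    (TIM1_SR1 = (uint8_t)(TIM1_SR1 & (uint8_t)~TIM1_SR1_UPDATE))

/* Usage: with all bits set, only bit 0 is cleared. */
static void demo_clear_update_pending(void)
{
    fake_tim1_sr1 = 0xFFu;
    TIM1_CLEAR_UPDATE_PENDING();
    assert(fake_tim1_sr1 == 0xFEu);
}
```

The call site then reads almost like the library call, but there is no function behind it.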

Sunday, December 6, 2009
