Mr Foo: Atomic setting on ARM

Monday, July 9, 2012

Atomic setting on ARM

Back to our scheduled programming. Big gobs of ugly assembly code.

When implementing a multitasking operating system, there is a need for dealing with mutating shared state. It's one of the unpleasant realities of the world of multitasking (and, indeed, multithreading), and it's fraught with danger.

One of the basic functions we need to implement is the "atomic set" operation - setting a variable to a given value.

Let's take the example of incrementing a variable. Naively, we might do this:


    ldr r0, [r1]

    add r0, r0, #1

    str r0, [r1]

In most cases, this will work. However, there exists a case where, in fact, this can fail - if two threads of execution are trying to increment the value, and a task swap happens whilst one is actually doing the increment, the possibility is that the value will be (incorrectly) incremented only once, as follows:


; Assume r1 points at a given address, holding the value 0

[thread 1]

    ldr r0, [r1]  ; r0 in thread 1 is now 0



[thread 2]

    ldr r0, [r1] ; r0 in thread 2 is now 0

    add r0, r0, #1 ; r0 in thread 2 is now 1

    str r0, [r1] ; memory is now 1

...



[thread 1]

    add r0, r0, #1 ; r0 in thread 1 is now 1

    str r0, [r1] ; memory is now 1

Obviously, we would expect memory to be set to 2, not to 1.  So somehow we need to either stop the interrupts happening (easy enough, turn interrupts off, but that has fairly big impacts elsewhere) or somehow deal with the case where we are interrupted mid-operation. 



As luck would have it, ARMv6 provides us with 3 handy opcodes for this : ldrex, strex and clrex.  Basically, we use ldrex to signal that we want to have exclusive write access to a memory location, strex signals that we're going to write to a location and close that exclusive access, with a test for success, and clrex says "hey, we're no longer interested".  So, how do we use these to do what we want?



Let's go back to our example above - incrementing a value in memory.  Using ldrex / strex it would look like this:



try_increment:

    ldrex r0, [r1]

    add r0, r0, #1

    strex r2, r0, r1

    cmp r2, #0

    bne try_increment




What happens here is:

the initial ldrex loads the memory, and indicates that it wants an exclusive access to the memory itself.
We then increment our value, as usual.
We write the value back using strex - this will only succeed if:

we still have an exclusive lock on the memory

no newer writes to that memory have happened since we established our exclusive lock

success of strex is indicated by register r2 (the "extra" operand that strex uses) being set to 0.
If strex has failed, we go back and try again from the point where we loaded the initial value.

For our super-simple increment case, this will probably catch 99.99% of cases. We add a "belt-and-braces" approach, however, by making our task scheduler explicitly invalidate all exclusive reads, using the clrex opcode. This has the possibility of making any in-process ldrex-strex blocks restart (and thus take more time), but covers all the bases.

Now, that's all fine and well, but we may want to use this method in our C code, without resorting to subroutine calls (by their very nature, exclusive operations happen at a very low level, and probably want to be inlined). So we're going to want to use some of that nastiest of nasties, inline assembler in gcc. Believe me, it's vile.

Here's an implementation of an inline atomic increment using gcc inline assembler:


inline uint32_t atomic_increment(uint32_t * memory) {

  uint32_t temp1, temp2;

  __asm__ __volatile__ (

    "1:\n"

    "\tldrex\t%[t1],[%[m]]\n"

    "\tadd\t%[t1],%[t1],#1\n"

    "\tstrex\t%[t2],%[t1],[%[m]]\n"

    "\tcmp\t%[t2],#0\n"

    "\tbne\t1b"

    : [t1] "=&r" (temp1), [t2] "=&r" (temp2)

    : [m] "r" (memory)

  );

  return temp1;

}

Horrible, no?  Note the use of local labels ('1:' and then branching to '1b' to indicate the latest local label called '1'), having to use encoded tabs and newlines to stop the assembler itself barfing, and the horrible workaround of multiple names for the same variables because gcc is, quite simply, broken.  Still, it works, and the C optimiser can deal with it.




If you want to get more complex, I'd suggest looking at the ARM site for the example implementations of mutexes and condition variables using ldrex/strex.  You'll have to deal with converting from ARM assembler to GNU, but as long as you don't try inlining them, you should be fine.

8 comments:

CyCl0neJuly 18, 2012 at 1:55 PM
Hmm, it sounds so complex with the clrex command. What do you mean you clear it in the task schedule, if i use the atomic_add for semaphores, do the task schedule check all semaphore and clear them? sounds a bit of work then to do? or didnt i get it?`:-)
ReplyDelete
Replies
Felicity PilchardMay 27, 2013 at 3:39 PM
I agree with Claus, this post makes it seem complex. And buggy - when you say ldrex/strex covers 99.99% of cases in this super-simple example, what happens with the remaining .01%? How do they fail, and how does clrex fix the problem? And if ldrex/strex sometimes fails when the case is this simple, does it fail more often or more horribly with more complicated cases?
ReplyDelete
Replies
KonzNovember 21, 2013 at 4:45 PM
The code would be easier to read if you replaced \t with spaces or real tabs. Also, you can reduce the number of double-quotes by continuing the line with a final backslash. At the assembly level, you should look at cbnz, this replaces your last two lines. Finally, I don't think you actually compiled this code: the extra set of square brackets around the memory operands are wrong.
ReplyDelete
Replies
tuftyMay 5, 2014 at 4:07 PM
Konz

The code compiles fine. The reason for not using cbnz, by the way, is that it's specific to the Thumb instruction set.
ReplyDelete
Replies
deskwarriorMarch 6, 2015 at 6:25 PM
Found a better way: the entire thing may be written as

return __sync_add_and_fetch( memory, 1);

This is a compiler builtin (supported on gcc and clang, at least) which on ARM will be the same "ldrex add 1 strex until !=0" construct. And it's portable; on x86 it's done with a locked 'xaddl' instruction.

See https://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html
ReplyDelete
Replies

Mr Foo

Monday, July 9, 2012

Atomic setting on ARM

8 comments:

Blog Archive

Followers

About Me