Monday, July 9, 2012

Atomic setting on ARM

Back to our scheduled programming.  Big gobs of ugly assembly code.

When implementing a multitasking operating system, there is a need for dealing with mutating shared state.  It's one of the unpleasant realities of the world of multitasking (and, indeed, multithreading), and it's fraught with danger.

One of the basic functions we need to implement is the "atomic set" operation - setting a variable to a given value.

Let's take the example of incrementing a variable.  Naively, we might do this:

    ldr r0, [r1]
    add r0, r0, #1
    str r0, [r1]


In most cases, this will work.  However, there exists a case where, in fact, this can fail - if two threads of execution are trying to increment the value, and a task swap happens whilst one is actually doing the increment, the possibility is that the value will be (incorrectly) incremented only once, as follows:

; Assume r1 points at a given address, holding the value 0
[thread 1]
    ldr r0, [r1]  ; r0 in thread 1 is now 0

[thread 2]
    ldr r0, [r1] ; r0 in thread 2 is now 0
    add r0, r0, #1 ; r0 in thread 2 is now 1
    str r0, [r1] ; memory is now 1
...

[thread 1]
    add r0, r0, #1 ; r0 in thread 1 is now 1
    str r0, [r1] ; memory is now 1


Obviously, we would expect memory to be set to 2, not to 1.  So somehow we need to either stop the interrupts happening (easy enough, turn interrupts off, but that has fairly big impacts elsewhere) or somehow deal with the case where we are interrupted mid-operation.

As luck would have it, ARMv6 provides us with 3 handy opcodes for this : ldrex, strex and clrex.  Basically, we use ldrex to signal that we want to have exclusive write access to a memory location, strex signals that we're going to write to a location and close that exclusive access, with a test for success, and clrex says "hey, we're no longer interested".  So, how do we use these to do what we want?

Let's go back to our example above - incrementing a value in memory.  Using ldrex / strex it would look like this:

try_increment:
    ldrex r0, [r1]
    add r0, r0, #1
    strex r2, r0, r1
    cmp r2, #0
    bne try_increment

What happens here is:


  • the initial ldrex loads the memory, and indicates that it wants an exclusive access to the memory itself.
  • We then increment our value, as usual.
  • We write the value back using strex - this will only succeed if:

  1. we still have an exclusive lock on the memory
  2. no newer writes to that memory have happened since we established our exclusive lock
  • success of strex is indicated by register r2 (the "extra" operand that strex uses) being set to 0.
  • If strex has failed, we go back and try again from the point where we loaded the initial value.

For our super-simple increment case, this will probably catch 99.99% of cases.  We add a "belt-and-braces" approach, however, by making our task scheduler explicitly invalidate all exclusive reads, using the clrex opcode.  This has the possibility of making any in-process ldrex-strex blocks restart (and thus take more time), but covers all the bases.

Now, that's all fine and well, but we may want to use this method in our C code, without resorting to subroutine calls (by their very nature, exclusive operations happen at a very low level, and probably want to be inlined).  So we're going to want to use some of that nastiest of nasties, inline assembler in gcc.  Believe me, it's vile.

Here's an implementation of an inline atomic increment using gcc inline assembler:

inline uint32_t atomic_increment(uint32_t * memory) {
  uint32_t temp1, temp2;
  __asm__ __volatile__ (
    "1:\n"
    "\tldrex\t%[t1],[%[m]]\n"
    "\tadd\t%[t1],%[t1],#1\n"
    "\tstrex\t%[t2],%[t1],[%[m]]\n"
    "\tcmp\t%[t2],#0\n"
    "\tbne\t1b"
    : [t1] "=&r" (temp1), [t2] "=&r" (temp2)
    : [m] "r" (memory)
  );
  return temp1;
}


Horrible, no?  Note the use of local labels ('1:' and then branching to '1b' to indicate the latest local label called '1'), having to use encoded tabs and newlines to stop the assembler itself barfing, and the horrible workaround of multiple names for the same variables because gcc is, quite simply, broken.  Still, it works, and the C optimiser can deal with it.


If you want to get more complex, I'd suggest looking at the ARM site for the example implementations of mutexes and condition variables using ldrex/strex.  You'll have to deal with converting from ARM assembler to GNU, but as long as you don't try inlining them, you should be fine.

Followers