Sunday, December 11, 2011

Developing a multitasking OS for ARM part 2

Okay, kids, gather around and we'll carry on where we left off.

Now, we have the ARM booting, jumping to a reset handler, and dropping into an endless loop.  That's a pretty good start.  But really, we'd like to do something more - well - how to put this - "more".

In order to do this, we need to have a bit more understanding about how the ARM itself works.

If we go to the ARMv7AR Architecture Reference Manual (which can be had by registering at, or by downloading a hooky copy off the internets, either approach is feasible, and one at least of which is recommended), we see, in section B1 (the System Level Programmer's Model) a certain amount of interesting reading.  Forget the "privilege" aspect for the moment, and let's skip ahead to section B1.3.

We find that the processor has 8 separate operating modes.  These are:

User Mode, System Mode, Supervisor Mode, Monitor Mode, Abort Mode, Undefined Mode, IRQ Mode, and FIQ Mode

If we look back at the set of vectors we set up earlier, a lot of these "cross over".  So when we drop into the IRQ vector, we will be in IRQ mode.  FIQ, FIQ mode.  Either of the aborts, Abort mode.  Undefined instruction, undefined mode, and so on.  What's interesting is how the machine registers are shared between modes, and particularly the fact that all but system/user modes have their own stack pointers.

Now, when the ARM starts up, it is in SVC mode.  That's the way it is, and you can't change that.  And when it starts up, no stack has been defined.  So you need to be really damned careful in the first bits of the reset code.

Stacks on the ARM grow downwards, so the best thing to do generally is to put them at the top of memory.  As such, a typical reset routine will start by finding out how much memory is available, then setting up stack pointers for each of the operating modes.  We're nothing if not typical, so let's look at how we do that.

First thing - sizing memory.  On the versatile baseboard as emulated by qemu, this is easy.  We try writing to a bit of memory, then read back - if the value is set, there's memory there, if there's not, then we are above the top of memory.  It's not quite so simple on the Pi, as trying to write outside of physical RAM will cause an exception.  However, we're going to be a bit clever, and try to kill 2 birds with one stone.

Firstly, we need to set up some big fat global variables.

.global __memtop
__memtop: .word 0x00400000 /* Start checking memory from 4MB */
.global __system_ram
__system_ram: .word 0x00000000 /* System memory in MB */
.global __heap_start
__heap_start: .word __bss_end__ /* Start of the dynamic heap */
.global __heap_top
__heap_top: .word __bss_end__ /* Current end of dynamic heap */

__bss_end__ is set up by the linker, and it would be much better of me to use that for the initial value of __memtop (rounded up to the nearest megabyte) as well.  But hey, I'm lazy.  It'll come back to bite me later, I'm sure.

Now, as the Pi causes an exception on writes outside memory, we need to patch in a handler, temporarily.  Here's the handler:

/* temporary data abort handler that sets r4 to zero */
/* this will force the "normal" check to work in the */
/* case (as, I believe, on RasPi) where access 'out  */
/* of bounds' causes a page fault                    */
mov r4, #0x00000000
sub lr, lr, #0x08
movs pc, lr

Note how the comment indicates I'm not absolutely sure this will work.  This is, frankly, because I'm not sure if this will work on a real Pi, and nobody wants to let me get my hands on one.  Still, let's pretend, eh?

/* This tries to work out how much memory we have available */
/* Should work on both Pi and qemu targets */
FUNC _size_memory
/* patch in temporary fault handler */
ldr r5, =.Ldaha
ldr r5, [r5]
ldr r6, [r5]
ldr r7, =temp_abort_handler
str r7, [r5] 
DMB r12

/* Try and work out how much memory we have */
ldr r0, .Lmemtop
ldr r1, .Lmem_page_size
ldr r1, [r1]
ldr r2, .Lsystem_ram
ldr r3, [r0]
add r3, r3, #0x04
str r3, [r3] /* Try and store a value above current __memtop */
DMB r12 /* Data memory barrier, in case */
ldr r4, [r3] /* Test if it stored */
cmp r3, r4 /* Did it work? */
bne .Lmem_done
ldr r3, [r0]
add r3, r3, r1 /* Add block size onto __memtop and try again */
str r3, [r0]
b .Lmem_check
ldr r3, [r0] /* get final memory size */
lsr r3, #0x14 /* Get number of megabytes */
str r3, [r2] /* And store it */
/* unpatch handlers */
str r6, [r5]
DMB r12

bx lr

.extern __memtop
.word __memtop

.extern __mem_page_size
.word __mem_page_size

.extern __system_ram
.word __system_ram

.extern data_abort_handler_address
.word data_abort_handler_address

We see a few things here.  Firstly, how to patch in and out the handler.  Also, that I've got fed up with doing the whole .code 32; .global foo; foo: rigmarole and defined a macro called FUNC.  We also see a macro called DMB, which implements the ARMv6 Data Memory Barrier (ARMv7 has a 'dmb' instruction, to do that, we don't).  For what it's worth, these are the macros:

.macro FUNC name
.code 32
.global \name

/* Data memory barrier */
/* pass in a spare register */
.macro DMB reg
mov \reg, #0
mcr p15,0,\reg,c7,c10,5 /* Data memory barrier on ARMv6 */

So, we can hopefully now find out how much memory we have, with __memtop containing the actual top of memory and __system_ram containing the number of megabytes in case it's useful to know.

So let's look at the start of _reset...

.equ MODE_BITS,   0x1F /* Bit mask for mode bits in CPSR */
.equ USR_MODE,    0x10 /* User mode */
.equ FIQ_MODE,    0x11 /* Fast Interrupt Request mode */
.equ IRQ_MODE,    0x12 /* Interrupt Request mode */
.equ SVC_MODE,    0x13 /* Supervisor mode */
.equ ABT_MODE,    0x17 /* Abort mode */
.equ UND_MODE,    0x1B /* Undefined Instruction mode */
.equ SYS_MODE,    0x1F /* System mode */

FUNC _reset
/* Do any hardware intialisation that absolutely must be done first */
/* No stack set up at this point - be careful */
ldr r0, =.Lsize_memory
ldr r0, [r0]
cmp r0, #0
blxne r0

/* Assume that at this point, __memtop and __system_ram are populated
/* Let's get on with initialising our stacks */
mrs r0, cpsr /* Original PSR value */
ldr r1, __memtop /* Top of memory */

bic r0, r0, #MODE_BITS /* Clear the mode bits */
orr r0, r0, #IRQ_MODE /* Set IRQ mode bits */
msr cpsr_c, r0 /* Change the mode */
mov sp, r1 /* End of IRQ_STACK */
/* Subtract IRQ stack size */
ldr r2, __irq_stack_size
sbc r1, r1, r2

bic    r0, r0, #MODE_BITS /* Clear the mode bits */
orr    r0, r0, #SYS_MODE /* Set SYS mode bits */
msr    cpsr_c, r0 /* Change the mode   */
mov    sp, r1 /* End of SYS_STACK  */
/* Subtract SYS stack size */
ldr r2, __sys_stack_size
sbc r1, r1, r2

bic    r0, r0, #MODE_BITS /* Clear the mode bits */
orr    r0, r0, #FIQ_MODE /* Set FIQ mode bits */
msr    cpsr_c, r0 /* Change the mode   */
mov    sp, r1 /* End of FIQ_STACK  */
/* Subtract FIQ stack size */
ldr r2, __fiq_stack_size
sbc r1, r1, r2

bic    r0, r0, #MODE_BITS /* Clear the mode bits */
orr    r0, r0, #SVC_MODE /* Set Supervisor mode bits */
msr    cpsr_c, r0 /* Change the mode */
mov    sp, r1 /* End of stack */
/* And finally subtract Kernel stack size to get final __memtop */
ldr r2, __svc_stack_size
sbc r1, r1, r2
str r1, __memtop
/*-- Leave core in SVC mode ! */
/* Zero the memory in the .bss section.  */
mov a2, #0 /* Second arg: fill value */
mov fp, a2 /* Null frame pointer */
ldr a1, .Lbss_start /* First arg: start of memory block */
ldr a3, .Lbss_end
sub a3, a3, a1 /* Third arg: length of block */
bl memset

ldr r2, .Lc_entry /* Let C coder have at initialisation */
        mov     lr, pc
        bx      r2

cpsie i /* enable irq */
cpsie f /* and fiq */

/* Initialisation done, sleep */
ldr r2, .Lsleep
        mov     lr, pc
        bx      r2

.Lbss_start: .word __bss_start__
.Lbss_end: .word __bss_end__
.Lc_entry: .word c_entry
.Lsleep: .word sys_sleep

Note the use of msr cpsr_c, rx - this is how we change mode.  We can change mode this way from any mode except user mode.  Luckily, the user mode stack pointer is shared with system mode, so we don't need to drop into user mode at all.  So we go off, find how much memory we have, then for certain of the operating modes, we set up a stack pointer.  We then use a pre-written implementation of memset() to zero out the bss section, let the 'c' code have a go at initialising its stuff via c_entry(), turn on interrupts, and go to sleep via sys_sleep().

Next up, how we go about doing useful work...

No comments:

Post a Comment