Tuesday, January 31, 2012

CAR, CDR, CONS - an aside

Commenter pcpete has been following the ARM OS stuff, but was having some trouble with my CAR/CDR/etc macros.  I figured I'd do a little writeup for those who aren't totally au fait with what that's all about.


CAR, CDR and CONS are terminology which has stuck around since the '50s.  It's not going away, so if you want to Lisp, you need to get used to it.


The original Lisp was implemented on the IBM 704 computer.  This was a machine which had a 36 bit word (peculiar, but useful) and a 15 bit address bus. 36 bit words could be split into 4 parts (and hardware support was provided for doing so) known as the address (a 15 bit value), the decrement (again, 15 bits), the tag (3 bits) and the flags (3 bits).  The very earliest work on Lisp provided operators to extract these 4 parts (Contents of Address part of Register or CAR,  and likewise for Decrement, Tag and Flags, providing CDR, CTR and CFR).  Later work (before the first implementation of Lisp) dropped direct manipulation of flags and tag, leaving CAR and CDR, and a "Construct" operator, CONS, which took values for the address and decrement parts respectively and stuffed them into a machine word.

The fact that a machine word was bigger than the size of the address of a machine word meant that it was possible to implement pairs of values and singly linked list cells within one machine word.  A pair of values (for example, 1 & 2) could be created as follows:

CONS(1, 2)

and singly linked lists by treating the 'address' part (paradoxically enough) as a value, and the 'decrement' part as the address of the next cell in the list.  By setting the last pointer to some easily-recognised value (known as 'nil'), it is possible to find the end of a list.  Thus, creating a list containing the numbers 1, 2 and 3, looks like this:


CONS(1, CONS(2, CONS(3, nil)))

Binary trees are also easy to construct using pairs - for example:

CONS(CONS(1, 2), CONS(3, 4))


I'm generally the last one to point people at Wikipedia, but there's a reasonable and rather more graphical explanation of what have become known as 'cons cells' over there


It's worth noting that Lisp also allows you to use the notation 'first' and 'rest' instead of 'car' and 'cdr', although this only really makes sense when in list context, and doesn't allow for composition (caar, cadr, cdar, etc).  Pretty much nobody uses first and rest.


So, Lisp uses what is, in effect, a pair (or list) oriented memory model as opposed to the more usual (these days) approach of addressing single memory cells individually.  And if one is implementing a Lisp (as I am), it makes a certain amount of sense for the low level memory allocation and manipulation functions to operate in this way.  Unfortunately, "modern" machines in general don't have words that are larger than their address bus size, and ARM is no exception to this.  ARM is a 32 bit machine - we could restrict addressing within our lisp to 16 bits (maximum 64Kwords or 256KB) and all values to 16 bits, or, more sensibly, use "virtual" words of 2 machine words (or maybe more) in size.  Usefully, ARM has LDRD (Load Register Dual) and STRD (Store Register Dual) operands that make this a snip to do.


So, we can define cons cells as 8-byte aligned pairs of 32-bit values, manipulate them as pairs using LDRD/STRD, and we're a long way towards implementing a Lisp.  


But what of the Flags and Tag stuff that was originally in Lisp?  Although, by the time Lisp came out, you had no way of directly manipulating them, they were still used to differentiate between pointers, direct values, and constants, amongst other things.  And this is something we need to be able to do as well, otherwise we can't tell if (for example) the value 0x00000000 in the CDR of a cell is a pointer to a cell at the address 0x00000000, the integer value zero, or some special value indicating nil/end of list.


As it happens, all is well. LDRD/STRD require an 8-byte aligned address, which means the low 3 bits of a pointer will *always* be 0.  We can, therefore, use those 3 bits as flags to indicate various other types, and, if we're careful, we can encode the majority of types in such a way as to keep their values "boxed" (i.e. containable in a 32 bit value).  Values requiring more than 32 bits to encode (strings, large numbers, rational numbers, etc) can either be encoded as lists of boxed values, pairs, or using any other encoding the programmer deems useful as long as they are easily recognisable.


So, back to the code I posted, which uses CAR, CDR, SET_CAR and SET_CDR macros.



/* linked list structure */
typedef struct task_list {
  task_t * _car;
  struct task_list * _cdr;
} task_list_t;


/* some LISPy primitives */
#define CAR(x) (x)->_car
#define CDR(x) (x)->_cdr

#define SET_CAR(x,y) CAR(x)=(y)
#define SET_CDR(x,y) CDR(x)=(y)


As we can see, task_list_t is an analogue for the cons cell, being simply 2 address-bus sized values held together.  Indeed, my code elsewhere looks like this:


typedef cons_t task_list_t;

but the original formulation is perhaps easier to grok at first.


So, given a pointer x to a cons cell c, we can see that CAR(x) expands to(x)->_car, in other words returning the _car element of the cell c pointed to by x.  Likewise SET_CAR(x, y) becomes (x)->_car = (y), assigning the value y to the _car element of the cell c pointed to by x. Doing this as macros is probably premature optimisation, but it's "the done thing".


Our task scheduler uses a list (actually, a plurality of lists) of tasks to execute, and considers them for execution in a "round robin" style.  So, assuming we have 4 tasks, a, b, c and d, we might have a list that looks, in lisp syntax, like this:


(a b c d)


or, in diagram form, where the top row is the CDR of the cell, and the bottom the CAR, and * indicates a pointer to the next cell


* - * - * - nil
|   |   |   |
a   b   c   d


We need to rotate the list as we consider each task, so that the next time through we consider the next task. So after considering task a for execution, we want the list to look like this:


(b c d a)



or, in diagram form


* - * - * - nil
|   |   |   |
b   c   d   a


In true lisp style, we'll use a singly linked list for this, with the CAR of each cell being a pointer to the task, and the CDR being a pointer to the next cell, or nil for end of list.  


So, we come in, and we find the CAR and CDR of the (original) list.  CAR is, obviously enough, task 'a', the task we want to consider for execution.  CDR, however, is not task b, but the rest of the list (hence the alternative 'rest' in Lisp) - i.e.


(b c d)



or, in diagram form


* - * - nil
|   |   |
b   c   d


So our next step is to put 'a' onto the end of the list.  This is not as simple as it seems - CONS can only be used to push items onto the front of a list - CONS(list, a) would result in this:


((bcd) . a)



or, in diagram form


a
|
* - * - nil
|   |   |
b   c   d


which is not a true list in Lisp terms (it's a "dotted pair" with a list as its first element). We need some sort of APPEND operator that adds an element to the end of a list, a sort of 'reverse cons'.  Lisp has one of those, it's (oddly enough) called APPEND, it's usually defined recursively, and it always creates new cons cells all over the place.  We really don't want to go allocating new memory in a function that's going to be called thousands of times per second simply to move a few values about - we want to keep the same cons cells and simply swap their CDR values around.  The algorithm thus becomes:


first := list
if CDR(first) != NIL
  list, end := CDR(first)
  end := CDR(end) while CDR(end) != NIL
  SET_CDR(first, NIL)
  SET_CDR(end, first)


With that in mind, have another read through http://stm8sdiscovery.blogspot.com/2011/12/developing-multitasking-os-for-arm-part_14.html


Simon

1 comment:

Followers