Saturday, May 20, 2017

Mecrisp Stellaris on Maple Mini clone

Blimey. Been a while since I posted anything, and I'm mainly posting this as a reminder to myself.

Setting up Maple Mini clones (Baite logo) to run mecrisp-stellaris.

Binary from https://github.com/jeelabs/embello/tree/master/explore/1608-forth/suf

The built-in maple bootloader and dfu-util are mostly sufficient and easy to use for uploading stuff to the maple mini, but are not sufficient for uploading mecrisp.  Mecrisp needs to load at 0x0, and thus kills the maple bootloader.  You need stm32loader.py or similar, and upload using the ST bootloader built into the chip itself.

Setup :

- 3.3v serial port connected to rx1/tx1 (and, in my case, powering the board via vcc and gnd).
- jumper from boot1 to gnd
- hold the user button down, and power up.
- keep the user button held down, I think.  I did, anyway, but you may not need to.

You can now use stm32loader.py as follows

stm32loader.py -p <serial-port-here> -e -w -v -a 0x08000000 usb-maplemini.bin

And that will get you mecrisp stellaris installed on the maple mini communicating over the usb port.  That's pretty fucking nifty, right there.

Saturday, July 5, 2014

Osprey backpacks



Well, here's a thing.  More of a rant, actually.

As some of you may know, I live in the French Alps, and work as a chairlift driver.  When I'm not working, coding, or playing ice hockey, I can often be found trekking up and down mountains.  Backpacks, and particularly technical backpacks, are a pretty large part of my life.  I've got a good few of them, and use them almost daily.

So, anyway, my old 30L pack has died.  My last trip involved 3 days with the damned thing held together with bits of string.  Not fun, especially as it was mostly raining, so my stuff was getting wet.  Time to replace.  List of requirements:


  • 30L ±
  • Ventilated back (gets hot around here in the summer)
  • Integrated rain cover
  • Hydration system


Now, I'm a fan of Osprey packs, and have been since I bought a "Variant" 52 litre pack.  Well designed, comfortable, and built to last.  So the first place I started looking was Osprey.  I ended up looking seriously at Osprey's "Manta 28" pack, which had pretty much everything I wanted,  a few things I hadn't thought of, and one thing I didn't really want but could live with (it's a framed pack).  It's sold as a "hydraulics" hydration pack.  All the reviews mention the included 3 litre hydration sleeve as being one of the best on the market.  The goddamned Osprey website mentions its "Internal Hydration Sleeve".  So, when I ordered one (not direct from Osprey, but through an Amazon seller, tight-ass that i am) for 85 euros, I was expecting to find a hydration sleeve in the pack or its packaging.

I was disappointed.

"Hrm", I thought.  "The seller must have forgotten it.  Or maybe 'forgotten' it".  So I contacted them.  

"Osprey don't include the hydration sleeve in Europe.  We can sell you one for 35 euros plus shipping"

What the fuck?  Osprey's official Eurozone price of 100 euros is very close to the US-market price.  Nothing on the Osprey europe site indicates that the hydration sleeve isn't included.  "Maybe I've misread", I thought, and went off to compare between osprey-europe.com and ospreypacks.com.  Problem - ospreypacks.com redirects to osprey-europe.com immediately.  In order to hit ospreypacks.com without a redirect, I eventually had to use a US-based proxy.

Keep that in mind.  A european client cannot hit ospreypacks.com without taking specific and non-obvious technical steps to do so.  Which is a shame, because on the ospreypacks.com page for the Manta 28, we find the following, in bold italic text :

Reservoir sold separately in European and Asian markets

Yep.  The one thing a customer from Europe might want to know, before ordering, about this pack, is not available to them.  A bit more "spot the difference" gives us the following 2 images from ospreypacks.com on the left, and from ospreyeurope.com on the right.  Can you see the difference? 


That's right. The ospreyeurope.com version has had the visible bit of hydration tube photoshopped out. Well, nearly all of it, anyway - there's still a bit visible on the right shoulder strap. Beyond that, and the fact that the euro site doesn't explicitly say "Hydration system" as opposed to "internal hydration sleeve", which latter apparently means the place in the pack where you'd put an - ahem - hydration sleeve, there's bugger all difference.

So I've got a new pack (and, if I might say so, a bloody good one), but I'm feeling ripped off.  Because my "Hydraulics" "hydration pack" has a 75cl bottle of water in each side pocket, rather than the 3l hydration pack I was supposed to have.

Adding in the supposedly supplied 3l sleeve is going to cost me half what I paid for the pack itself, and bring the price up to twice that of a more or less equivalent pack with a separately bought sleeve.

So FUCK YOU, Osprey.  You've lost a client.   Yeah, I know.  First world problems.  Call a whaaaambulance.  But fuck you anyway.

Saturday, April 19, 2014

What the *fuck* are Microsoft smoking?

So, seeing as I'm blogging again, here's a bit of a rant.

Boy #1, who is now 14, decided that his computer, which had been happily running some form of Ubuntu, wasn't up to scratch.  Specifically, that he couldn't get in on the F2P revolution - his friends had moved on from Minecraft to League of Legends.  As such, he *needed* Windows.

"Okay", say I.  I have a valid license for Win7 Pro that I don't use (came with my new laptop, which is *ahem* hackintoshed *ahem*), you can use that.  But beyond doing the install and owning the administrator password, as they say around here, "demerdez-vous".  I deal with Macs and unix boxes, I know nothing much about Windows apart from "when I have to use it, I don't like it".  In short, I'm willing to set him up a machine, and occasionally bung in the administrator password, but it it gets pwnt, or stops working, or generally starts buggering about, the response is going to be "Reinstall.  From scratch."

Now, as he's 14, computer time is restricted.  With Ubuntu, I'd installed timekpr, which is an impressively flexible tool for managing computer time usage.

Windows 7 comes with "parental controls", which purport to do something similar.  Well, on the surface, it looks like they do, anyway.  Once you've scratched the surface, it becomes patently clear that Microsoft's parental controls are seriously lacking.

Time usage controls come down to "user x can use the computer between hour y and hour z".

That's it.  Nothing more.  Want to allow your kid to use the computer between 10am and midday, and from 6 to 8 pm, with a maximum screen time of 2 hours?  No, can't do that.  Even better is Microsoft's response to its customers.  It's "brillant", for those who remember The Daily WTF :

Only administrators can set the parental control and not the standard user. So if you wnat to set the time frame of 3 hours per day usage, you will have to ask the user first as to when they want to use the computer and set it under parental controls.

http://answers.microsoft.com/en-us/windows/forum/windows_vista-security/parental-control-time-limit-question/98543dfa-783b-4ae8-b464-c274bce895ec

This shows a(nother) complete misunderstanding of what people might actually want to do with their computers.

So, anyway, parental controls are out.  They suck, and MS don't seem to understand why, or care enough about their clients to try and improve them.  Which leaves 3rd party solutions.

The only one I've found which doesn't suck (indeed, it's better than timekpr, which I considered to be pretty damned good) is timeboss, from http://nicekit.com

Time limits, times when you can't use the computer at all, forced breaks, it's got everything you need to properly restrict usage.

The *only* downside to it is that, even allowing for the massive flexibility it provides, the UI is over-confusing and cluttered.

It's an enormously powerful piece of software, reasonably priced for the peace of mind it provides (although it should be said that no piece of software can replace proper parental oversight); Microsoft would do well to ask themselves why companies like nicekit are making money providing something that should be standard.

Still, if you're a parent who, like most of the world, is tied to Windows, it's definitely software to have.

Friday, April 18, 2014

Intuos tablet resurrection - the story, and the gory bits.

So, I'm going to tell you a story. Are you sitting comfortably? Then I'll begin...

A long time ago, in a land far, far away (well, in the year 1999 or so, and on holiday in Paris), I bought myself a Wacom tablet. An ADB Wacom tablet to go with the Mac IIfx I was using at the time. Or it might have been the G3 desktop, thinking about it. Whatever, it was a ripsnortin' fast Mac with several screens attached and an ADB connector. The tablet was, as I recall, a UD-0608-A, and I bought it on sale from "Surcouf". And I was mightily happy with it.

That tablet stayed with me for 14 years, but its utility was massively reduced by Wacom "Steving" it when Apple moved to MacOS X. In fact, it was turned into a tablet-shaped paperweight.

At that time, I, amongst many other Wacom users, contacted Wacom to ask if we could please get an OSX driver. Could we hell. "OSX can't handle the throughput on ADB", they said. "It's Apple's fault", they said. "Buy a USB tablet, ya cheap bastard", they said.

At the same time, I was running Linux/PPC on my Wallstreet, so I contacted Wacom's developer people, asking if I could get the technical details on how the ADB tablets worked, in order to write a linux driver. Dave Fleck, who was (I believe) responsible for writing much of the Classic Mac code for the ADB tablets, got back to me and explained that:

1 - Can't do that, souper-sekrit eye-pee.
2 - You'll never understand how it works
3 - "XXX can't handle the throughput on ADB"
4 - Buy a USB tablet, ya cheap bastard

... and that was that for a good few years. The trusty ultrapad went back into its box, and gathered dust.

Around 2004/2005, I randomly came across a tool for dumping ADB traffic, using 2 classic macs, a butchered serial cable, and some resistors. Can't remember what I was looking for at the time, but there it was. The tool I needed to be able to dump ADB traffic to and from the tablet. I got inspired, and got dumping, 50 packets at a time (if I remember the limits of the tool). A horrible job, and with no way of actually saving the dump, got writer's cramp. The result of this, however, was a pretty good understanding of how the UD-xxxx-A series talked. There was nothing "difficult" about it; it was, in fact, pretty much trivial. So I wrote an OSX 10.2.x driver for it, and it worked. The first work that cme out of this was an image saying "fuck you wacom!". Yes, very mature.

10.3 arrived, and buggered things up. No longer were ADB-equipped Macs supported.

So I wrote, with a fair amount of help, a driver for the iMate ADB converter, which also included a tablet driver (which then got expanded by others to cover not only the ultrapads, but also the Calcomp slates).

It got hammered by, successively, 10.4, 10.5 (which removed ADB entirely), 10.6, etc etc.

The tablet went back into its box.

Bernard contacted me, sometime 2009/2010. He took my code, with my blessing, and turned it into a part of Waxbee. I rejoiced, got a couple of teensys, and hooked up my ultrapad.

Then Bernard started asking questions about the intuos tablets. They were not the same as the ultrapads. They were - wierd. Random. My interest was piqued. So I bought one. Cheap as chips, obviously, as they no longer fucking work with /anything/. I looked at the packet stream, and cried. And put the tablet away again.

Every now and then, I'd pull it out, have a poke, and get discouraged.

Then I had a revelation. I started dumping packets in binary, rather than hex, and started seeing patterns. The veil started to lift, ever so slightly.

At this point, Bernard and I knew what happened when you plugged an intuos tablet in (the identification packet was pretty similar to the ultrapad one), and what happened when a pen came into proximity (a pair of messages, the forst of which was obviously tool-specific, and the second of which, again, looked similar-ish to the UD series). But the rest was gibberish. A constant stream of 3, 6, or 8 bytes at a time, regardless of pen movement, and seemingly random, followed by 2 bytes 0xfe00 as an obvious out-of-proximity marker.

That stream of bytes, however, started to make some sense when dumped in binary. Move the pen up, and one bit was always set. Down, and the same bit was always clear. Left, another bit set. Right, clear. Odd, but more or less consistent sets of 4 bits inbetween. Lean left, another bit. Lean up, another one. Press a button, an 8 byte packet happens. Release it, another one happens. And so on.

It became clear to me that Wacom were somehow crushing down location data into 5 bit chunks, tilt into 4, probably pressure into another 4. the encoding was opaque, but the meaning was clear.

I forced myself to do something with it, mainly by giving my trusty ultrapad, which was now working under Bernard's waxbee code, to a colleague, leaving myself with only the non-working intuos.  I had 2 choices.  Fix it, or live without a tablet.

So, I had this stream of data, and I thought I knew what some of it meant. I got quite excited about that, and contacted Bernard. We ummed and arred over what the 3 byte and 6 byte packets were all about.

What was obvious was this: Per 3 bytes, regardless of whether it was a 3 byte or a 6 byte packet, data looked like this ...
00xx xxxy yyyy pppp hhhh vvvv
... where x, y, p, h and v are related to x delta, y delta, pressure delta, horizontal tilt delta and vertical tilt delta.

But what were the 6 byte packets? And what of the occasional 8-byte packets that weren't triggered by button state changes? Bernard was of the opinion that the 6 byte packets were simply 2x3 byte packets munged together. I thought they were, more likely, delta and error correction, and that the 8-byters were "reset" packets of some sort. Bernard got his o-scope out, and it suddenly became much clearer what was going on.

The intuos marketing material clearly marks the data rate as being /maximum/ 200 samples per second. I'd initially considered this to be rather ambitious - maybe the USB and serial tablets could hit it, but no way the ADB tablets could go that fast. Turns out the Wacom engineers were smart enough, and cared enough, to hit that 200 samples per second, pretty much all the time. Which, in the end, makes sense - the guts of the tablets are identical, it's only the interface and interface firmware that changes between models. By modifying the ADB polling speed in Waxbee, we could get more, or less, 3-byte, 6-byte and 8-byte packets, and, within reason, we were getting absolutely 200 samples per second. If 3 bytes wasn't fast enough, 2 packets were munged together into a 6-byte packet. If that wasn't fast enough, lose the tilt information for a packet, and munge 3 packets together into 8 bytes. Sheer, insane, genius.

The "packaging" of the pen packets sorted out, this only left the content. It was obvious that there was some sort of delta information in there, but testing quickly revealed it wasn't trivial. Neither simple addition nor "static shift and addition" worked. Time to start diving into code.

This is where things got hard, and fast. I didn't have the tools required to even unpack the code resources from the "Classic" mac driver. So I wrote a resource unpacker in C. Which eventually netted me a 300K+ chunk of 68k binary. And, not having the tools to rip that down or even look at the names, I wrote myself a disassembler in scheme. I used scheme because I could run it from the REPL, and change things on the fly. A bit of intelligent function prologue matching got me down to a bunch of disassembled functions, and a big bunch of data. Traps were, with a bit of help from my old MPW discs, decoded, and that enabled me to find the code dealing with talking to ADB tablets, and, from there, the ADB state machine. Which was hideous.

At this point, I was working from the 4.5.5 driver, which deals with *all* mac-connectable Wacom tablets. UD, SD, GD, XD series, ADB, serial and USB connections. And the damn thing is written in C++. It was too complex to even think about breaking down. So I went looking for earlier drivers. Google and wayback failed me, although I now knew what driver I needed. And eventually, I managed to source one from someone selling a boxed adb intuos on the french equivalent of eBay.

Back to work. Rinse and repeat what I'd done with the 4.5.5 driver on 4.1.2, get to the ADB state machine, and, eventually, to the handler for "Talk Register 0" commands (the ADB polling mechanism). And the slow task of picking through the various bits of internal state. Decoding the structure of the class instances, and the class vtables. Tedious, mind-numbing code-sifting.

The structure of the r0 handler is horrible. Cascading branches based on "if this bit set, then this, else this". But I'd managed to get it labelled up based on the bits being tested, so started manually walking through the code with my packet dumps. And here's how it works:

When a tool first comes into proximity, there is a 7 byte packet sent containing tool type and tool serial number. I knew this already; this packet is identified by having it's top nybble set to 100x, where x is the tool index (0 for the first tool, 1 for a second tool in "dual tracking" mode).

When a tool goes out of proximity, there is a 2 byte packet (which may well be part of a "stream" of deltas (i.e. appended after other packets) of the form 0xfe00 | (tool_index << 8). I knew that one as well.

After the "in proximity" packet, we get one or more "tool major" packets. These vary by tool, and contain "raw" sensor data. The high bit is aways set, and the type of the packet is determined by masking various bits in the top byte. Size and content vary by packet type, from 6 to 8 bytes each. At this point, I only had access to standard styli, so I was only seeing "pen major" packets, which encode location, pressure, tilt, orientation and pen button state. agin, I knew this packet, and had it ripped to bits already.

After the major packets, we get delta packets. The encoding is different per tool, but these are either 2 or 3 bytes each. They *all* encode a location delta as described above, the rest varies by tool type. And, for some tools, there's an alternating sequence of delta packet types. The 4d mouse, for example, alternates between "location and rotation" and "location and wheel" deltas.

When some state changes that *isn't* encoded in the delta packets, we get a new "tool major" packet of the relevant type. Button state, for example (and, for obscure reasons, when the 4d mouse wheel goes from +ve to -ve nd vice versa).

The encoded deltas themselves are of the form "sign + magnitude". For each data element (for example, horizontal tilt), the driver keeps a "shift" value, which is initialised to a "magic" number when we hit a tool major packet. The delta is calculated by shifting the magnitude left by "shift" number of bits, and this is then added or subtracted from the accumulated value according to the sign bit. So far, so (relatively) simple. But then comes the clever bit. The shift value is manipulated according to the magnitude field. So, for tilt, if the magnitude is 0b111, we add 2 to the shift value. 0b110, add 1. 0b010 or 0b011, subtract 1. 0b001, subtract 2. 0b000, subtract 3.

It's what I'd describe as "adaptive shift delta encoding". It's not general, as such - the magic numbers need to be both synchronised between hosts, and probably hand-tuned according to scale and expected delta magnitudes per sensor reading, but it's absolutely brilliant.

So, how does this "adaptive shift" stuff work, then?

In a naive tablet protocol, for example that of the ultrapads and earlier, we transmit all the data, all the time. If the stylus has moved from x location 0x1000 to 0x1010, then to 0x1015, we transmit first 0x1000, then 0x1010, then 0x1015. Wacom could have done this for the intuos tablets, were it not for the combination of ADB's low throughput, and the "required" 200 samples/second thing.

Given that losing samples was presumbly not acceptable, Wacom thus needed to trim down the amount of data being sent per sample. The obvious thing to do is to send deltas where possible. A simple delta encoding scheme sends a first packet containing the "full" data, and then a number of smaller difference packets. In the above example, we could imagine 0x1000, 0x10, 0x05, for example. This works extremely well for messages which are coherent, where each message is likely to be very similar to the preceding one.

There is, however, a catch. If the difference is too big to fit in the fixed size delta packet, it becomes necessary to send a full-size "reset" packet, which "fattens up" the stream again. So, in order to be able to deal with a range of differences (for co-ordinates, these can be thought of as velocities), and thus avoid reset packets, the delta packets must be quite large. And Wacom didn't have the luxury of being able to make the deltas large. Timing constraints mean that the total size of the delta packets for *all* the relevant sensor outputs for a tool can be, maximum, 3 bytes.

So, another approach is needed. A commonly used option is to increase or decrease the delta packet size "on the fly" dependent on the data to be encoded, but, again, that pesky fixed packet size kicks in. So that's out.

Absolute accuracy 100% of the time is not needed, as long as the deltas approximate to, and rapidly converge on, the correct value. The most important measure is transducer location, and Wacom are measuring these to 1/2540 inch (or, if you prefer, 1/100 mm). That's pretty damn precise, a little bit of jitter isn't going to hurt that much as long as it converges fast.

So, what wacom have done is change the range of delta measurements on the fly by shifting the delta value to the right, at the cost of reduced accuracy as the range increases (as we lose the lower significance bits). Let's look at that 5-bit location delta. Remember, it's 1 sign bit plus 4 bits of magnitude.

shift vMin vMax error (mm)
----- ---- ---- ----------
0     0    15   0
1     0    31   0.001
2     0    63   0.002
3     0    127  0.004
4     0    255  0.008
5     0    511  0.016
6     0    1023 0.032
...

In other words, with a shift value of 4 (which, by chance, is wacom's "starting" value for location delta shifts), it's possible to accommodate a pen velocity of ~ 25 cm / sec, whilst staying within a tolerance of 8/100ths of a mm of the "real" value. That's pretty good as it stands, and Wacom probably could have got away with leaving it at that, at least for the smaller tablets.

Where the genius comes in, though, is how they've dealt with convergence / divergence.

Effectively, what happens is this:

If the delta magnitude is at the upper end of the scale (15 in the case of location deltas), the driver assumes it has "undershot" the target, and increments the shift. At the expense of absolute accuracy, this allows the next delta to encompass a larger range, approaching the assumed "far away" target value faster. If the actual value *was* 15, the next delta can still address it directly.

At the lower end of the range, (say, 1 or 2), the driver decrements the shift, reducing range and thus increasing accuracy, and so converges on the "correct" value faster.

At the *absolute* low end of the range, the driver assumes it has "overshot", and "downshifts" faster, reducing the shift value by 2 or more, allowing the "real" value to catch up.

So, for any given delta magnitude, we "upshift" with maximum value, "downshift" for low-end values, and downshift faster for zeros. The only tuning needed is to decide where, and by how much, to downshift, and what the initial shift value should be.

Like I said before - it's genius.

Resurrecting Wacom's ADB-connected Intuos Tablets

Bloody hell. This has been the work, on and off, of over a decade. An ongoing thing that I'd prod at from time to time, a "let's have a look at that sodding thing again" project. Proof, if any were needed, that I'm persistent bastard, and that once something gets under my skin, it stays there.

I've no idea what I'm gonna do now. Because it works. I currently have my (originally) adb-connected Y2k-era intuos hooked up, via USB, to my modern Mac running Mavericks.

Enough blathering.

https://github.com/tufty/adb-intuos

I wouldn't have got this far if it wasn't for the work done by Bernard Poulin, both
on the waxbee project, and the help he's given me in figuring out what the
packet stream means.

The technical requirements are (apart from an ADB Intuos) identical to those for
Bernard's waxbee converter, even down to the pins used for the ADB connection on
the Teensy. There's reasons for that, most of which boil down to me "borrowing"
Bernard's ADB codec code. In any case, if you already have a Teensy wired up for
an Intuos in the vague hope someone would eventually make the damned thing work,
you're good to go. If you haven't, get one and wire it up.

What does this do?

It converts an ADB intuos tablet into the equivalent sized USB-equipped Intuos 2,
thus enabling the use of stock Wacom drivers (up to v6.20, the last version handling
Intuos 2 tablets; this may preclude being able to use the tablet on Windows Cool
on your computer.

I like the sound of that. What does it handle?

I *believe* it handles all ADB intuos tablets without any need for configuration.
It certainly handles the features of my GD-0608-A, including dual tracking, button
bar, standard pen and 4D mouse.

Sounds ace. What's the catch?

It *certainly* doesn't handle any other tools. That's because I don't have any of
them, so I can't work out how they work. Yes, that /is/ a hint. Please contact me
if you have other Intuos 1 tools you can either lend me (I'm in France) or can
hook up to an ADB intuos and dump the output.

How do I make it work?

Get yourself a Teensy 2.0 from pjrc.com
Wire it up : https://code.google.com/p/waxbee/wiki/InterfacingADB
Get yourself an AVR toolset. The rest is based on having a command line toolset and some sort of *n*x. Got a Windows? I have no idea what you do.
Get the code : git clone https://github.com/tufty/adb-intuos.git
Go to the 'converter' subdirectory, and type 'make'
Load the resulting intuos_converter.hex onto your teensy using pjrc's teensy loader
Unplug and replug the teensy, and you're off

Thursday, January 9, 2014

going back to snow leopard from mavericks

So, like a moron, I decided to upgrade my 2008 24" iMac from snow leopard to Mavericks. After all, Apple's updates make shit run faster (or, at least, have done in the past, and I've been doing OSX since the DP). Step one was to go to Mountain Lion. 29 bucks later, I had a mac that ran like shit. So, get myself 4GB of memory, find one chip is dead (the dangers of second user memory). and a 3GB mac that sorta ran acceptably. hrm. Mavericks, then. A fuxking huge download and install, and, again, a mac. that runs like shit. Beachball frenzy. Of course, this is the point at which my time machine disk decides to die. Now, my time machine disk is not formatted as a single volume, but has a bunch of extra stuff on a now inaccessible volume. Crap. time to pull out ddrescue, at least to recover that extra stuff. ddrescue is the number 1 go-to tool for recoveribg data, it's free, and I can't recommend it highly enough. It took 7 passes to do its thing, but I have my data back. Next up is to try and get my external disk to come back to life. More ddrescue, copying /dev/zero to the freshly partitioned disk. 750gb takes a bit of time, good job I'm off work with a sciatica (2 injections ov valium & profenid per day, makes life more or less tolerable). I will, of course, blame spalling pisstakes, grandmatikal erorz and other crap on being drugged out of my tiny box, as well as entering this on an iphone's poxy osk. next step. export contacts and diary entries from Mavericks. then clone the internal disk to the external drive, and disconnect the fucker. Boot 10.5 from the original disks (my 10.6 DVD won't even boot with Mavericks installed) and reformat the internal disk to zero. Install 10.5 and restore the 'original' extras, upgrade to 10.6, then manually restore all my data from the external drive (importing the stuff Mavericks has shat all over). And then. Recompiling my 5 cross-deveopment toolkits. yes, I'm a happy bunny. Not

Monday, July 9, 2012

Atomic setting on ARM

Back to our scheduled programming.  Big gobs of ugly assembly code.

When implementing a multitasking operating system, there is a need for dealing with mutating shared state.  It's one of the unpleasant realities of the world of multitasking (and, indeed, multithreading), and it's fraught with danger.

One of the basic functions we need to implement is the "atomic set" operation - setting a variable to a given value.

Let's take the example of incrementing a variable.  Naively, we might do this:

    ldr r0, [r1]
    add r0, r0, #1
    str r0, [r1]


In most cases, this will work.  However, there exists a case where, in fact, this can fail - if two threads of execution are trying to increment the value, and a task swap happens whilst one is actually doing the increment, the possibility is that the value will be (incorrectly) incremented only once, as follows:

; Assume r1 points at a given address, holding the value 0
[thread 1]
    ldr r0, [r1]  ; r0 in thread 1 is now 0

[thread 2]
    ldr r0, [r1] ; r0 in thread 2 is now 0
    add r0, r0, #1 ; r0 in thread 2 is now 1
    str r0, [r1] ; memory is now 1
...

[thread 1]
    add r0, r0, #1 ; r0 in thread 1 is now 1
    str r0, [r1] ; memory is now 1


Obviously, we would expect memory to be set to 2, not to 1.  So somehow we need to either stop the interrupts happening (easy enough, turn interrupts off, but that has fairly big impacts elsewhere) or somehow deal with the case where we are interrupted mid-operation.

As luck would have it, ARMv6 provides us with 3 handy opcodes for this : ldrex, strex and clrex.  Basically, we use ldrex to signal that we want to have exclusive write access to a memory location, strex signals that we're going to write to a location and close that exclusive access, with a test for success, and clrex says "hey, we're no longer interested".  So, how do we use these to do what we want?

Let's go back to our example above - incrementing a value in memory.  Using ldrex / strex it would look like this:

try_increment:
    ldrex r0, [r1]
    add r0, r0, #1
    strex r2, r0, r1
    cmp r2, #0
    bne try_increment

What happens here is:


  • the initial ldrex loads the memory, and indicates that it wants an exclusive access to the memory itself.
  • We then increment our value, as usual.
  • We write the value back using strex - this will only succeed if:

  1. we still have an exclusive lock on the memory
  2. no newer writes to that memory have happened since we established our exclusive lock
  • success of strex is indicated by register r2 (the "extra" operand that strex uses) being set to 0.
  • If strex has failed, we go back and try again from the point where we loaded the initial value.

For our super-simple increment case, this will probably catch 99.99% of cases.  We add a "belt-and-braces" approach, however, by making our task scheduler explicitly invalidate all exclusive reads, using the clrex opcode.  This has the possibility of making any in-process ldrex-strex blocks restart (and thus take more time), but covers all the bases.

Now, that's all fine and well, but we may want to use this method in our C code, without resorting to subroutine calls (by their very nature, exclusive operations happen at a very low level, and probably want to be inlined).  So we're going to want to use some of that nastiest of nasties, inline assembler in gcc.  Believe me, it's vile.

Here's an implementation of an inline atomic increment using gcc inline assembler:

inline uint32_t atomic_increment(uint32_t * memory) {
  uint32_t temp1, temp2;
  __asm__ __volatile__ (
    "1:\n"
    "\tldrex\t%[t1],[%[m]]\n"
    "\tadd\t%[t1],%[t1],#1\n"
    "\tstrex\t%[t2],%[t1],[%[m]]\n"
    "\tcmp\t%[t2],#0\n"
    "\tbne\t1b"
    : [t1] "=&r" (temp1), [t2] "=&r" (temp2)
    : [m] "r" (memory)
  );
  return temp1;
}


Horrible, no?  Note the use of local labels ('1:' and then branching to '1b' to indicate the latest local label called '1'), having to use encoded tabs and newlines to stop the assembler itself barfing, and the horrible workaround of multiple names for the same variables because gcc is, quite simply, broken.  Still, it works, and the C optimiser can deal with it.


If you want to get more complex, I'd suggest looking at the ARM site for the example implementations of mutexes and condition variables using ldrex/strex.  You'll have to deal with converting from ARM assembler to GNU, but as long as you don't try inlining them, you should be fine.

Followers