Saturday, July 5, 2014

Osprey backpacks



Well, here's a thing.  More of a rant, actually.

As some of you may know, I live in the French Alps, and work as a chairlift driver.  When I'm not working, coding, or playing ice hockey, I can often be found trekking up and down mountains.  Backpacks, and particularly technical backpacks, are a pretty large part of my life.  I've got a good few of them, and use them almost daily.

So, anyway, my old 30L pack has died.  My last trip involved 3 days with the damned thing held together with bits of string.  Not fun, especially as it was mostly raining, so my stuff was getting wet.  Time to replace.  List of requirements:


  • 30L, more or less
  • Ventilated back (gets hot around here in the summer)
  • Integrated rain cover
  • Hydration system


Now, I'm a fan of Osprey packs, and have been since I bought a "Variant" 52 litre pack.  Well designed, comfortable, and built to last.  So the first place I started looking was Osprey.  I ended up looking seriously at Osprey's "Manta 28" pack, which had pretty much everything I wanted, a few things I hadn't thought of, and one thing I didn't really want but could live with (it's a framed pack).  It's sold as a "hydraulics" hydration pack.  All the reviews mention the included 3 litre hydration sleeve as being one of the best on the market.  The goddamned Osprey website mentions its "Internal Hydration Sleeve".  So, when I ordered one (not direct from Osprey, but through an Amazon seller, tight-ass that I am) for 85 euros, I was expecting to find a hydration sleeve in the pack or its packaging.

I was disappointed.

"Hrm", I thought.  "The seller must have forgotten it.  Or maybe 'forgotten' it".  So I contacted them.  

"Osprey don't include the hydration sleeve in Europe.  We can sell you one for 35 euros plus shipping."

What the fuck?  Osprey's official Eurozone price of 100 euros is very close to the US-market price.  Nothing on the Osprey Europe site indicates that the hydration sleeve isn't included.  "Maybe I've misread", I thought, and went off to compare between osprey-europe.com and ospreypacks.com.  Problem - ospreypacks.com redirects to osprey-europe.com immediately.  In order to hit ospreypacks.com without a redirect, I eventually had to use a US-based proxy.

Keep that in mind.  A European client cannot hit ospreypacks.com without taking specific and non-obvious technical steps to do so.  Which is a shame, because on the ospreypacks.com page for the Manta 28, we find the following, in bold italic text:

Reservoir sold separately in European and Asian markets

Yep.  The one thing a customer from Europe might want to know about this pack before ordering is not available to them.  A bit more "spot the difference" gives us the following 2 images, from ospreypacks.com on the left and from osprey-europe.com on the right.  Can you see the difference?


That's right. The osprey-europe.com version has had the visible bit of hydration tube photoshopped out. Well, nearly all of it, anyway - there's still a bit visible on the right shoulder strap. Beyond that, and the fact that the euro site says "internal hydration sleeve" rather than "hydration system" - the sleeve in question apparently being the place in the pack where you'd put an - ahem - hydration sleeve - there's bugger all difference.

So I've got a new pack (and, if I might say so, a bloody good one), but I'm feeling ripped off.  Because my "Hydraulics" "hydration pack" has a 75cl bottle of water in each side pocket, rather than the 3 litre hydration sleeve I was supposed to have.

Adding in the supposedly supplied 3l sleeve is going to cost me half what I paid for the pack itself, and bring the price up to twice that of a more or less equivalent pack with a separately bought sleeve.

So FUCK YOU, Osprey.  You've lost a client.   Yeah, I know.  First world problems.  Call a whaaaambulance.  But fuck you anyway.

Saturday, April 19, 2014

What the *fuck* are Microsoft smoking?

So, seeing as I'm blogging again, here's a bit of a rant.

Boy #1, who is now 14, decided that his computer, which had been happily running some form of Ubuntu, wasn't up to scratch.  Specifically, that he couldn't get in on the F2P revolution - his friends had moved on from Minecraft to League of Legends.  As such, he *needed* Windows.

"Okay", say I.  I have a valid license for Win7 Pro that I don't use (came with my new laptop, which is *ahem* hackintoshed *ahem*), you can use that.  But beyond doing the install and owning the administrator password, as they say around here, "demerdez-vous" - roughly, "sort yourself out".  I deal with Macs and unix boxes, I know nothing much about Windows apart from "when I have to use it, I don't like it".  In short, I'm willing to set him up a machine, and occasionally bung in the administrator password, but if it gets pwnt, or stops working, or generally starts buggering about, the response is going to be "Reinstall.  From scratch."

Now, as he's 14, computer time is restricted.  With Ubuntu, I'd installed timekpr, which is an impressively flexible tool for managing computer time usage.

Windows 7 comes with "parental controls", which purport to do something similar.  Well, on the surface, it looks like they do, anyway.  Once you've scratched the surface, it becomes patently clear that Microsoft's parental controls are seriously lacking.

Time usage controls come down to "user x can use the computer between hour y and hour z".

That's it.  Nothing more.  Want to allow your kid to use the computer between 10am and midday, and from 6 to 8 pm, with a maximum screen time of 2 hours?  No, can't do that.  Even better is Microsoft's response to its customers.  It's "brillant", for those who remember The Daily WTF:

Only administrators can set the parental control and not the standard user. So if you wnat to set the time frame of 3 hours per day usage, you will have to ask the user first as to when they want to use the computer and set it under parental controls.

http://answers.microsoft.com/en-us/windows/forum/windows_vista-security/parental-control-time-limit-question/98543dfa-783b-4ae8-b464-c274bce895ec

This shows a(nother) complete misunderstanding of what people might actually want to do with their computers.

So, anyway, parental controls are out.  They suck, and MS don't seem to understand why, or care enough about their clients to try and improve them.  Which leaves 3rd party solutions.

The only one I've found which doesn't suck (indeed, it's better than timekpr, which I considered to be pretty damned good) is timeboss, from http://nicekit.com

Time limits, times when you can't use the computer at all, forced breaks, it's got everything you need to properly restrict usage.

The *only* downside to it is that, even allowing for the massive flexibility it provides, the UI is over-confusing and cluttered.

It's an enormously powerful piece of software, reasonably priced for the peace of mind it provides (although it should be said that no piece of software can replace proper parental oversight); Microsoft would do well to ask themselves why companies like nicekit are making money providing something that should be standard.

Still, if you're a parent who, like most of the world, is tied to Windows, it's definitely software to have.

Friday, April 18, 2014

Intuos tablet resurrection - the story, and the gory bits.

So, I'm going to tell you a story. Are you sitting comfortably? Then I'll begin...

A long time ago, in a land far, far away (well, in the year 1999 or so, and on holiday in Paris), I bought myself a Wacom tablet. An ADB Wacom tablet to go with the Mac IIfx I was using at the time. Or it might have been the G3 desktop, thinking about it. Whatever, it was a ripsnortin' fast Mac with several screens attached and an ADB connector. The tablet was, as I recall, a UD-0608-A, and I bought it on sale from "Surcouf". And I was mightily happy with it.

That tablet stayed with me for 14 years, but its utility was massively reduced by Wacom "Steving" it when Apple moved to MacOS X. In fact, it was turned into a tablet-shaped paperweight.

At that time, I, amongst many other Wacom users, contacted Wacom to ask if we could please get an OSX driver. Could we hell. "OSX can't handle the throughput on ADB", they said. "It's Apple's fault", they said. "Buy a USB tablet, ya cheap bastard", they said.

At the same time, I was running Linux/PPC on my Wallstreet, so I contacted Wacom's developer people, asking if I could get the technical details on how the ADB tablets worked, in order to write a linux driver. Dave Fleck, who was (I believe) responsible for writing much of the Classic Mac code for the ADB tablets, got back to me and explained that:

1 - Can't do that, souper-sekrit eye-pee.
2 - You'll never understand how it works
3 - "XXX can't handle the throughput on ADB"
4 - Buy a USB tablet, ya cheap bastard

... and that was that for a good few years. The trusty ultrapad went back into its box, and gathered dust.

Around 2004/2005, I randomly came across a tool for dumping ADB traffic, using 2 classic macs, a butchered serial cable, and some resistors. I can't remember what I was looking for at the time, but there it was: the tool I needed to be able to dump ADB traffic to and from the tablet. I got inspired, and got dumping, 50 packets at a time (if I remember the tool's limits correctly). A horrible job, and, with no way of actually saving the dumps, I got writer's cramp. The result of this, however, was a pretty good understanding of how the UD-xxxx-A series talked. There was nothing "difficult" about it; it was, in fact, pretty much trivial. So I wrote an OSX 10.2.x driver for it, and it worked. The first work that came out of this was an image saying "fuck you wacom!". Yes, very mature.

10.3 arrived, and buggered things up. No longer were ADB-equipped Macs supported.

So I wrote, with a fair amount of help, a driver for the iMate ADB converter, which also included a tablet driver (which then got expanded by others to cover not only the ultrapads, but also the Calcomp slates).

It got hammered by, successively, 10.4, 10.5 (which removed ADB entirely), 10.6, etc etc.

The tablet went back into its box.

Bernard contacted me, sometime 2009/2010. He took my code, with my blessing, and turned it into a part of Waxbee. I rejoiced, got a couple of teensys, and hooked up my ultrapad.

Then Bernard started asking questions about the intuos tablets. They were not the same as the ultrapads. They were - weird. Random. My interest was piqued. So I bought one. Cheap as chips, obviously, as they no longer fucking work with /anything/. I looked at the packet stream, and cried. And put the tablet away again.

Every now and then, I'd pull it out, have a poke, and get discouraged.

Then I had a revelation. I started dumping packets in binary, rather than hex, and started seeing patterns. The veil started to lift, ever so slightly.

At this point, Bernard and I knew what happened when you plugged an intuos tablet in (the identification packet was pretty similar to the ultrapad one), and what happened when a pen came into proximity (a pair of messages, the first of which was obviously tool-specific, and the second of which, again, looked similar-ish to the UD series). But the rest was gibberish. A constant stream of 3, 6, or 8 bytes at a time, regardless of pen movement, and seemingly random, followed by 2 bytes 0xfe00 as an obvious out-of-proximity marker.

That stream of bytes, however, started to make some sense when dumped in binary. Move the pen up, and one bit was always set. Down, and the same bit was always clear. Left, another bit set. Right, clear. Odd, but more or less consistent sets of 4 bits in between. Lean left, another bit. Lean up, another one. Press a button, an 8 byte packet happens. Release it, another one happens. And so on.

It became clear to me that Wacom were somehow crushing down location data into 5 bit chunks, tilt into 4, probably pressure into another 4.  The encoding was opaque, but the meaning was clear.

I forced myself to do something with it, mainly by giving my trusty ultrapad, which was now working under Bernard's waxbee code, to a colleague, leaving myself with only the non-working intuos.  I had 2 choices.  Fix it, or live without a tablet.

So, I had this stream of data, and I thought I knew what some of it meant. I got quite excited about that, and contacted Bernard. We ummed and ahhed over what the 3 byte and 6 byte packets were all about.

What was obvious was this: Per 3 bytes, regardless of whether it was a 3 byte or a 6 byte packet, data looked like this ...
00xx xxxy yyyy pppp hhhh vvvv
... where x, y, p, h and v are related to x delta, y delta, pressure delta, horizontal tilt delta and vertical tilt delta.
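Assuming that layout, unpacking one 3-byte chunk is just a few shifts and masks. This is an illustrative sketch - the struct and field names are mine, not Wacom's:

```c
#include <stdint.h>

/* Raw field codes from one 3-byte delta chunk, laid out as
   00xx xxxy yyyy pppp hhhh vvvv (sign+magnitude codes, not final values) */
typedef struct {
    uint8_t x;  /* 5 bits: x delta code */
    uint8_t y;  /* 5 bits: y delta code */
    uint8_t p;  /* 4 bits: pressure delta code */
    uint8_t h;  /* 4 bits: horizontal tilt delta code */
    uint8_t v;  /* 4 bits: vertical tilt delta code */
} delta_fields;

static delta_fields unpack_delta(uint8_t b0, uint8_t b1, uint8_t b2) {
    uint32_t bits = ((uint32_t)b0 << 16) | ((uint32_t)b1 << 8) | b2;
    delta_fields f = {
        .x = (bits >> 17) & 0x1f,
        .y = (bits >> 12) & 0x1f,
        .p = (bits >> 8) & 0x0f,
        .h = (bits >> 4) & 0x0f,
        .v = bits & 0x0f,
    };
    return f;
}
```

A 6-byte packet would simply be two of these chunks back to back.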

But what were the 6 byte packets? And what of the occasional 8-byte packets that weren't triggered by button state changes? Bernard was of the opinion that the 6 byte packets were simply 2x3 byte packets munged together. I thought they were, more likely, delta and error correction, and that the 8-byters were "reset" packets of some sort. Bernard got his o-scope out, and it suddenly became much clearer what was going on.

The intuos marketing material clearly marks the data rate as being /maximum/ 200 samples per second. I'd initially considered this to be rather ambitious - maybe the USB and serial tablets could hit it, but no way the ADB tablets could go that fast. Turns out the Wacom engineers were smart enough, and cared enough, to hit that 200 samples per second, pretty much all the time. Which, in the end, makes sense - the guts of the tablets are identical, it's only the interface and interface firmware that changes between models. By modifying the ADB polling speed in Waxbee, we could get more, or less, 3-byte, 6-byte and 8-byte packets, and, within reason, we were getting absolutely 200 samples per second. If 3 bytes wasn't fast enough, 2 packets were munged together into a 6-byte packet. If that wasn't fast enough, lose the tilt information for a packet, and munge 3 packets together into 8 bytes. Sheer, insane, genius.

The "packaging" of the pen packets sorted out, this only left the content. It was obvious that there was some sort of delta information in there, but testing quickly revealed it wasn't trivial. Neither simple addition nor "static shift and addition" worked. Time to start diving into code.

This is where things got hard, and fast. I didn't have the tools required to even unpack the code resources from the "Classic" mac driver. So I wrote a resource unpacker in C. Which eventually netted me a 300K+ chunk of 68k binary. And, not having the tools to rip that down or even look at the names, I wrote myself a disassembler in scheme. I used scheme because I could run it from the REPL, and change things on the fly. A bit of intelligent function prologue matching got me down to a bunch of disassembled functions, and a big bunch of data. Traps were, with a bit of help from my old MPW discs, decoded, and that enabled me to find the code dealing with talking to ADB tablets, and, from there, the ADB state machine. Which was hideous.

At this point, I was working from the 4.5.5 driver, which deals with *all* mac-connectable Wacom tablets. UD, SD, GD, XD series, ADB, serial and USB connections. And the damn thing is written in C++. It was too complex to even think about breaking down. So I went looking for earlier drivers. Google and wayback failed me, although I now knew what driver I needed. And eventually, I managed to source one from someone selling a boxed adb intuos on the french equivalent of eBay.

Back to work. Rinse and repeat what I'd done with the 4.5.5 driver on 4.1.2, get to the ADB state machine, and, eventually, to the handler for "Talk Register 0" commands (the ADB polling mechanism). And the slow task of picking through the various bits of internal state. Decoding the structure of the class instances, and the class vtables. Tedious, mind-numbing code-sifting.

The structure of the r0 handler is horrible. Cascading branches based on "if this bit set, then this, else this". But I'd managed to get it labelled up based on the bits being tested, so started manually walking through the code with my packet dumps. And here's how it works:

When a tool first comes into proximity, there is a 7 byte packet sent containing tool type and tool serial number. I knew this already; this packet is identified by having its top nybble set to 100x, where x is the tool index (0 for the first tool, 1 for a second tool in "dual tracking" mode).

When a tool goes out of proximity, there is a 2 byte packet (which may well be part of a "stream" of deltas, i.e. appended after other packets) of the form 0xfe00 | (tool_index << 8). I knew that one as well.

After the "in proximity" packet, we get one or more "tool major" packets. These vary by tool, and contain "raw" sensor data. The high bit is always set, and the type of the packet is determined by masking various bits in the top byte. Size and content vary by packet type, from 6 to 8 bytes each. At this point, I only had access to standard styli, so I was only seeing "pen major" packets, which encode location, pressure, tilt, orientation and pen button state. Again, I knew this packet, and had it ripped to bits already.

After the major packets, we get delta packets. The encoding is different per tool, but these are either 2 or 3 bytes each. They *all* encode a location delta as described above, the rest varies by tool type. And, for some tools, there's an alternating sequence of delta packet types. The 4d mouse, for example, alternates between "location and rotation" and "location and wheel" deltas.

When some state changes that *isn't* encoded in the delta packets, we get a new "tool major" packet of the relevant type. Button state, for example (and, for obscure reasons, when the 4d mouse wheel goes from +ve to -ve and vice versa).
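Pulling those markers together, a first-byte classifier for the stream might look like the sketch below. It's a simplification - real streams interleave these and sizes vary - and the enum names are mine, not Wacom's:

```c
#include <stdint.h>

/* Rough packet classes, per the description above.  Names are mine. */
typedef enum {
    PKT_PROXIMITY_IN,   /* 7 bytes, top nybble 100x (x = tool index) */
    PKT_PROXIMITY_OUT,  /* 2 bytes, 0xfe00 | (tool_index << 8)       */
    PKT_TOOL_MAJOR,     /* 6-8 bytes, high bit set, raw sensor data  */
    PKT_DELTA           /* 2-3 bytes per chunk, top bits clear       */
} pkt_kind;

static pkt_kind classify(uint8_t first_byte) {
    if (first_byte == 0xfe || first_byte == 0xff)
        return PKT_PROXIMITY_OUT;     /* out-of-proximity marker, tools 0/1 */
    if ((first_byte & 0xe0) == 0x80)
        return PKT_PROXIMITY_IN;      /* top nybble 1000 or 1001 */
    if (first_byte & 0x80)
        return PKT_TOOL_MAJOR;        /* any other high-bit packet */
    return PKT_DELTA;
}
```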

The encoded deltas themselves are of the form "sign + magnitude". For each data element (for example, horizontal tilt), the driver keeps a "shift" value, which is initialised to a "magic" number when we hit a tool major packet. The delta is calculated by shifting the magnitude left by "shift" number of bits, and this is then added or subtracted from the accumulated value according to the sign bit. So far, so (relatively) simple. But then comes the clever bit. The shift value is manipulated according to the magnitude field. So, for tilt, if the magnitude is 0b111, we add 2 to the shift value. 0b110, add 1. 0b010 or 0b011, subtract 1. 0b001, subtract 2. 0b000, subtract 3.
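Spelled out as code, the rule set for a single element might look like this. It's a sketch with my own naming; the sign convention (high bit = subtract) and the clamping of the shift at zero are my assumptions, not something I pulled out of the driver:

```c
#include <stdint.h>

/* Per-element "adaptive shift" accumulator, using the tilt rules above
   (4-bit field: 1 sign bit + 3 bits of magnitude). */
typedef struct {
    int32_t value;  /* accumulated sensor value */
    int32_t shift;  /* current shift, reset on each "tool major" packet */
} adaptive_t;

static void adaptive_reset(adaptive_t *a, int32_t value, int32_t magic_shift) {
    a->value = value;
    a->shift = magic_shift;  /* the "magic" starting value */
}

static void adaptive_step(adaptive_t *a, uint8_t field) {
    int32_t magnitude = field & 0x07;
    int32_t delta = magnitude << a->shift;

    /* apply the delta; sign bit assumed to mean "subtract" */
    a->value += (field & 0x08) ? -delta : delta;

    /* adapt the shift according to the magnitude we just saw */
    switch (magnitude) {
    case 7:         a->shift += 2; break;  /* 0b111 */
    case 6:         a->shift += 1; break;  /* 0b110 */
    case 3: case 2: a->shift -= 1; break;  /* 0b011, 0b010 */
    case 1:         a->shift -= 2; break;  /* 0b001 */
    case 0:         a->shift -= 3; break;  /* 0b000 */
    default: break;                        /* 0b100, 0b101: unchanged */
    }
    if (a->shift < 0) a->shift = 0;        /* clamp (assumed) */
}
```

The crucial point is that tablet and host run the same shift-adaptation rules in lockstep, so the shift value itself never needs to be transmitted.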

It's what I'd describe as "adaptive shift delta encoding". It's not general, as such - the magic numbers need to be both synchronised between hosts, and probably hand-tuned according to scale and expected delta magnitudes per sensor reading, but it's absolutely brilliant.

So, how does this "adaptive shift" stuff work, then?

In a naive tablet protocol, for example that of the ultrapads and earlier, we transmit all the data, all the time. If the stylus has moved from x location 0x1000 to 0x1010, then to 0x1015, we transmit first 0x1000, then 0x1010, then 0x1015. Wacom could have done this for the intuos tablets, were it not for the combination of ADB's low throughput, and the "required" 200 samples/second thing.

Given that losing samples was presumably not acceptable, Wacom thus needed to trim down the amount of data being sent per sample. The obvious thing to do is to send deltas where possible. A simple delta encoding scheme sends a first packet containing the "full" data, and then a number of smaller difference packets. In the above example, we could imagine 0x1000, 0x10, 0x05, for example. This works extremely well for messages which are coherent, where each message is likely to be very similar to the preceding one.

There is, however, a catch. If the difference is too big to fit in the fixed size delta packet, it becomes necessary to send a full-size "reset" packet, which "fattens up" the stream again. So, in order to be able to deal with a range of differences (for co-ordinates, these can be thought of as velocities), and thus avoid reset packets, the delta packets must be quite large. And Wacom didn't have the luxury of being able to make the deltas large. Timing constraints mean that the total size of the delta packets for *all* the relevant sensor outputs for a tool can be, maximum, 3 bytes.

So, another approach is needed. A commonly used option is to increase or decrease the delta packet size "on the fly" dependent on the data to be encoded, but, again, that pesky fixed packet size kicks in. So that's out.

Absolute accuracy 100% of the time is not needed, as long as the deltas approximate to, and rapidly converge on, the correct value. The most important measure is transducer location, and Wacom are measuring these to 1/2540 inch (or, if you prefer, 1/100 mm). That's pretty damn precise, a little bit of jitter isn't going to hurt that much as long as it converges fast.

So, what Wacom have done is change the range of delta measurements on the fly by shifting the delta value to the right, at the cost of reduced accuracy as the range increases (as we lose the lower significance bits). Let's look at that 5-bit location delta. Remember, it's 1 sign bit plus 4 bits of magnitude.

shift vMin vMax error (mm)
----- ---- ---- ----------
0     0    15   0
1     0    31   0.01
2     0    63   0.02
3     0    127  0.04
4     0    255  0.08
5     0    511  0.16
6     0    1023 0.32
...

In other words, with a shift value of 4 (which, by chance, is Wacom's "starting" value for location delta shifts), it's possible to accommodate a pen velocity of ~ 50 cm / sec (up to 240 hundredths of a millimetre per sample, at 200 samples per second), whilst staying within a tolerance of 8/100ths of a mm of the "real" value.  That's pretty good as it stands, and Wacom probably could have got away with leaving it at that, at least for the smaller tablets.

Where the genius comes in, though, is how they've dealt with convergence / divergence.

Effectively, what happens is this:

If the delta magnitude is at the upper end of the scale (15 in the case of location deltas), the driver assumes it has "undershot" the target, and increments the shift. At the expense of absolute accuracy, this allows the next delta to encompass a larger range, approaching the assumed "far away" target value faster. If the actual value *was* 15, the next delta can still address it directly.

At the lower end of the range, (say, 1 or 2), the driver decrements the shift, reducing range and thus increasing accuracy, and so converges on the "correct" value faster.

At the *absolute* low end of the range, the driver assumes it has "overshot", and "downshifts" faster, reducing the shift value by 2 or more, allowing the "real" value to catch up.

So, for any given delta magnitude, we "upshift" with maximum value, "downshift" for low-end values, and downshift faster for zeros. The only tuning needed is to decide where, and by how much, to downshift, and what the initial shift value should be.
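To see that convergence behaviour end to end, here's a toy encoder/decoder pair for a 1-sign + 4-magnitude location delta. The adaptation rules and starting shift here are my guesses "in the spirit of" the scheme, not Wacom's tuned values:

```c
#include <stdint.h>

/* One tracked element; tablet and host each keep one of these. */
typedef struct { int32_t value; int shift; } track_t;

static void adapt(track_t *t, int32_t mag) {
    if (mag == 15)                 t->shift += 1;  /* saturated: assume undershoot */
    else if (mag == 1 || mag == 2) t->shift -= 1;  /* small: tighten up */
    else if (mag == 0)             t->shift -= 2;  /* zero: assume overshoot, downshift faster */
    if (t->shift < 0) t->shift = 0;
}

/* Tablet side: pick the best sign+magnitude code for (target - tracked). */
static uint8_t encode(track_t *t, int32_t target) {
    int32_t diff = target - t->value;
    int neg = diff < 0;
    int32_t mag = (neg ? -diff : diff) >> t->shift;
    if (mag > 15) mag = 15;
    t->value += neg ? -(mag << t->shift) : (mag << t->shift);
    adapt(t, mag);
    return (uint8_t)((neg << 4) | mag);
}

/* Host side: apply the code.  Its shift state must mirror the tablet's. */
static void decode(track_t *t, uint8_t code) {
    int32_t mag = code & 0x0f;
    t->value += (code & 0x10) ? -(mag << t->shift) : (mag << t->shift);
    adapt(t, mag);
}
```

Starting both sides at shift 4 and chasing a target of 1000 counts, the tracked value saturates at magnitude 15, upshifts, overshoots the useful range, downshifts, and lands exactly on the target within a handful of samples.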

Like I said before - it's genius.

Resurrecting Wacom's ADB-connected Intuos Tablets

Bloody hell. This has been the work, on and off, of over a decade. An ongoing thing that I'd prod at from time to time, a "let's have a look at that sodding thing again" project. Proof, if any were needed, that I'm a persistent bastard, and that once something gets under my skin, it stays there.

I've no idea what I'm gonna do now. Because it works. I currently have my (originally) adb-connected Y2k-era intuos hooked up, via USB, to my modern Mac running Mavericks.

Enough blathering.

https://github.com/tufty/adb-intuos

I wouldn't have got this far if it wasn't for the work done by Bernard Poulin, both
on the waxbee project, and the help he's given me in figuring out what the
packet stream means.

The technical requirements are (apart from an ADB Intuos) identical to those for
Bernard's waxbee converter, even down to the pins used for the ADB connection on
the Teensy. There's reasons for that, most of which boil down to me "borrowing"
Bernard's ADB codec code. In any case, if you already have a Teensy wired up for
an Intuos in the vague hope someone would eventually make the damned thing work,
you're good to go. If you haven't, get one and wire it up.

What does this do?

It converts an ADB intuos tablet into the equivalent sized USB-equipped Intuos 2,
thus enabling the use of stock Wacom drivers (up to v6.20, the last version
handling Intuos 2 tablets; this may preclude being able to use the tablet on
Windows) on your computer.

I like the sound of that. What does it handle?

I *believe* it handles all ADB intuos tablets without any need for configuration.
It certainly handles the features of my GD-0608-A, including dual tracking, button
bar, standard pen and 4D mouse.

Sounds ace. What's the catch?

It *certainly* doesn't handle any other tools. That's because I don't have any of
them, so I can't work out how they work. Yes, that /is/ a hint. Please contact me
if you have other Intuos 1 tools you can either lend me (I'm in France) or can
hook up to an ADB intuos and dump the output.

How do I make it work?

Get yourself a Teensy 2.0 from pjrc.com
Wire it up : https://code.google.com/p/waxbee/wiki/InterfacingADB
Get yourself an AVR toolset. The rest is based on having a command line toolset and some sort of *n*x. Got a Windows? I have no idea what you do.
Get the code : git clone https://github.com/tufty/adb-intuos.git
Go to the 'converter' subdirectory, and type 'make'
Load the resulting intuos_converter.hex onto your teensy using pjrc's teensy loader
Unplug and replug the teensy, and you're off

Thursday, January 9, 2014

Going back to Snow Leopard from Mavericks

So, like a moron, I decided to upgrade my 2008 24" iMac from Snow Leopard to Mavericks. After all, Apple's updates make shit run faster (or, at least, have done in the past, and I've been doing OSX since the DP). Step one was to go to Mountain Lion. 29 bucks later, I had a mac that ran like shit. So, get myself 4GB of memory, find one chip is dead (the dangers of second user memory), and a 3GB mac that sorta ran acceptably. Hrm. Mavericks, then. A fucking huge download and install, and, again, a mac that runs like shit. Beachball frenzy.

Of course, this is the point at which my time machine disk decides to die. Now, my time machine disk is not formatted as a single volume, but has a bunch of extra stuff on a now inaccessible volume. Crap. Time to pull out ddrescue, at least to recover that extra stuff. ddrescue is the number 1 go-to tool for recovering data, it's free, and I can't recommend it highly enough. It took 7 passes to do its thing, but I have my data back. Next up is to try and get my external disk to come back to life. More ddrescue, copying /dev/zero to the freshly partitioned disk. 750GB takes a bit of time, good job I'm off work with sciatica (2 injections of valium & profenid per day, makes life more or less tolerable). I will, of course, blame spalling pisstakes, grandmatikal erorz and other crap on being drugged out of my tiny box, as well as entering this on an iPhone's poxy OSK.

Next step: export contacts and diary entries from Mavericks, then clone the internal disk to the external drive, and disconnect the fucker. Boot 10.5 from the original disks (my 10.6 DVD won't even boot with Mavericks installed) and reformat the internal disk to zero. Install 10.5 and restore the 'original' extras, upgrade to 10.6, then manually restore all my data from the external drive (importing the stuff Mavericks has shat all over). And then: recompiling my 5 cross-development toolkits. Yes, I'm a happy bunny. Not.

Monday, July 9, 2012

Atomic setting on ARM

Back to our scheduled programming.  Big gobs of ugly assembly code.

When implementing a multitasking operating system, there is a need for dealing with mutating shared state.  It's one of the unpleasant realities of the world of multitasking (and, indeed, multithreading), and it's fraught with danger.

One of the basic functions we need to implement is the "atomic set" operation - setting a variable to a given value.

Let's take the example of incrementing a variable.  Naively, we might do this:

    ldr r0, [r1]
    add r0, r0, #1
    str r0, [r1]


In most cases, this will work.  However, there exists a case where, in fact, this can fail - if two threads of execution are trying to increment the value, and a task swap happens whilst one is actually doing the increment, the possibility is that the value will be (incorrectly) incremented only once, as follows:

; Assume r1 points at a given address, holding the value 0
[thread 1]
    ldr r0, [r1]  ; r0 in thread 1 is now 0

[thread 2]
    ldr r0, [r1] ; r0 in thread 2 is now 0
    add r0, r0, #1 ; r0 in thread 2 is now 1
    str r0, [r1] ; memory is now 1
...

[thread 1]
    add r0, r0, #1 ; r0 in thread 1 is now 1
    str r0, [r1] ; memory is now 1


Obviously, we would expect memory to be set to 2, not to 1.  So somehow we need to either stop the interrupts happening (easy enough, turn interrupts off, but that has fairly big impacts elsewhere) or somehow deal with the case where we are interrupted mid-operation.

As luck would have it, ARMv6 provides us with 3 handy opcodes for this : ldrex, strex and clrex.  Basically, we use ldrex to signal that we want to have exclusive write access to a memory location, strex signals that we're going to write to a location and close that exclusive access, with a test for success, and clrex says "hey, we're no longer interested".  So, how do we use these to do what we want?

Let's go back to our example above - incrementing a value in memory.  Using ldrex / strex it would look like this:

try_increment:
    ldrex r0, [r1]
    add r0, r0, #1
    strex r2, r0, [r1]
    cmp r2, #0
    bne try_increment

What happens here is:


  • The initial ldrex loads the value from memory, and indicates that we want exclusive access to that memory.
  • We then increment our value, as usual.
  • We write the value back using strex - this will only succeed if:

  1. we still have an exclusive lock on the memory
  2. no newer writes to that memory have happened since we established our exclusive lock

  • Success of strex is indicated by register r2 (the "extra" operand that strex uses) being set to 0.
  • If strex has failed, we go back and try again from the point where we loaded the initial value.

For our super-simple increment case, this will probably catch 99.99% of cases.  We add a "belt-and-braces" approach, however, by making our task scheduler explicitly invalidate all exclusive reads, using the clrex opcode.  This has the possibility of making any in-progress ldrex-strex blocks restart (and thus take more time), but covers all the bases.

Now, that's all fine and well, but we may want to use this method in our C code, without resorting to subroutine calls (by their very nature, exclusive operations happen at a very low level, and probably want to be inlined).  So we're going to want to use some of that nastiest of nasties, inline assembler in gcc.  Believe me, it's vile.

Here's an implementation of an inline atomic increment using gcc inline assembler:

inline uint32_t atomic_increment(uint32_t * memory) {
  uint32_t temp1, temp2;
  __asm__ __volatile__ (
    "1:\n"
    "\tldrex\t%[t1],[%[m]]\n"
    "\tadd\t%[t1],%[t1],#1\n"
    "\tstrex\t%[t2],%[t1],[%[m]]\n"
    "\tcmp\t%[t2],#0\n"
    "\tbne\t1b"
    : [t1] "=&r" (temp1), [t2] "=&r" (temp2)
    : [m] "r" (memory)
    : "memory", "cc"
  );
  return temp1;
}


Horrible, no?  Note the use of local labels ('1:' and then branching to '1b' to indicate the latest local label called '1'), having to use encoded tabs and newlines to stop the assembler itself barfing, and the horrible workaround of multiple names for the same variables because gcc is, quite simply, broken.  Still, it works, and the C optimiser can deal with it.


If you want to get more complex, I'd suggest looking at the ARM site for the example implementations of mutexes and condition variables using ldrex/strex.  You'll have to deal with converting from ARM assembler to GNU, but as long as you don't try inlining them, you should be fine.
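As an aside, recent versions of gcc (4.7 onwards) provide `__atomic` builtins which, on ARM, expand to exactly this sort of ldrex/strex retry loop, so you can often skip the inline assembler entirely. A minimal sketch (the function name is mine):

```c
#include <stdint.h>

/* Portable alternative to the hand-written ldrex/strex loop: on ARM,
 * gcc expands this builtin into essentially the same exclusive-access
 * retry loop as the inline assembler above. */
static inline uint32_t atomic_increment_builtin(uint32_t *memory) {
    /* __atomic_add_fetch returns the *new* value, matching the
     * inline-asm version, which also returns the incremented result.
     * __ATOMIC_SEQ_CST is the strongest ordering; a weaker one may be
     * enough depending on what you're protecting. */
    return __atomic_add_fetch(memory, 1, __ATOMIC_SEQ_CST);
}
```

The generated code is the same, but the compiler deals with the constraints, clobbers and barriers for you.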

Wednesday, May 9, 2012

An aside. Vector maths for games


Most people, when they start writing games, run up against having to rotate and move objects.  Generally speaking, they attack this problem "head on" using trigonometry, specifically using sines, cosines and arctangents.  This is a direct application of the SOH/CAH/TOA trigonometry you learned (or are in the process of learning, or will be learning later on) at secondary school.

Although simple and easy to understand, this approach has a few drawbacks.


  • The first (and usually most noticeable) of these is that it's quite slow.  It's manageable for a small number of objects, but if, for example, you're trying to do something like "Geometry Wars" with thousands of objects rotating and moving at the same time, you're soon going to find that all the trig rapidly becomes a bit of a bottleneck.  It gets even worse if you move into 3 dimensions.
  • The second is related to the first, and, in these days where GPU acceleration and multi-core processing are commonplace, can be a major sticking point - Trigonometry can be hard to efficiently parallelise.
  • The third, and perhaps most important despite being the least obvious, is accuracy.  Floating point calculations are never exact, and the errors in a purely trigonometric approach are usually cumulative.  Generally speaking, this means that if you rotate an object by 1° clockwise, 360 times, it will not end up where it started.


Luckily, there exists a way to almost totally bypass the trigonometry.  Let's recap what state we usually end up using for an object.


  • We keep track of its location in space.  We typically do this using a pair of co-ordinates.
  • We keep track of the direction it's facing.  Typically using an angle.
  • We keep track of the direction in which it is moving, again using an angle.
  • We keep track of its current speed.


We also know how to convert from cartesian co-ordinates (x and y) to polar co-ordinates (angle and size) and vice versa; we know that the two representations are equivalent; and we've seen that some tasks are easier in polar form than in cartesian.

What we're going to do is get rid of all the references to angles, and all the trigonometry.  We're going to do this by using vector mathematics.

So, we have location.  This is a set of co-ordinates, and thus a vector. So far, so good.

By combining direction of movement and speed into a single conceptual value, we get velocity, which is a set of (polar) co-ordinates, and also a vector.  Unfortunately, it has an angle embedded in it, but as we know that polar and cartesian co-ordinates are equivalent, we can represent it in cartesian form (velocity-x and velocity-y).

Which leaves us with that unfortunate direction we're facing, which is a single value indicating angle, and thus a scalar.  However, the only bit we're interested in is the angle, so we can convert it into a vector with a size of 1 (a unit vector), and then get rid of the 'angle' part by converting it to cartesian form (facing-x and facing-y).

So what does that get us?

Well, let's look at the tasks we have done so far.

Moving an object from its current location to the next location


This is the most basic task we want to do, and it's also the most simple.  It's a trivial vector addition of location and velocity, which is to say :

location-x = location-x + velocity-x
location-y = location-y + velocity-y
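In C, that per-step update is about as small as code gets (the `vec2` type and function name are my own inventions, not anything from a particular library):

```c
/* A minimal 2-D vector type; the names here are mine. */
typedef struct { float x, y; } vec2;

/* location = location + velocity, applied once per game step. */
static vec2 integrate_position(vec2 location, vec2 velocity) {
    vec2 r = { location.x + velocity.x, location.y + velocity.y };
    return r;
}
```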

Accelerating an object in a particular direction


Again, this is pretty simple.  We represent the acceleration as a vector quantity, and add it to velocity :

velocity-x = velocity-x + acceleration-x
velocity-y = velocity-y + acceleration-y

That's the basics of acceleration, but how do we get the acceleration value?

I'm going to explain a basic model of Newtonian physics.  Newton's Second Law is most commonly stated as *F=ma*, where F is the force applied to an object, m is its mass, and a is the acceleration - it can be restated as *a=F/m*.  So to calculate the acceleration of each object, we need to know the force acting on it, and its mass.

We'll assume the mass of the object is constant, for the moment (no near-light-speed effects here, thank you), so what we need to calculate is the force acting on the object.  Force is a vector value, and the overall force acting on the object is simply the addition of all the forces acting on it.  So, we'll start each "step" of the game with a clean slate, and add the forces up as we go:

force-x = 0
force-y = 0

... Accumulate forces

acceleration-x = force-x / mass
acceleration-y = force-y / mass
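Putting the whole step together, a sketch in C might look like this (`vec2`, `object` and `physics_step` are hypothetical names of mine, not from any particular engine):

```c
typedef struct { float x, y; } vec2;

typedef struct {
    vec2  location;
    vec2  velocity;
    vec2  force;    /* accumulated over the current step */
    float mass;
} object;

/* One step: a = F/m, integrate velocity then position, and reset the
 * force accumulator ready for the next step's clean slate. */
static void physics_step(object *o) {
    vec2 accel = { o->force.x / o->mass, o->force.y / o->mass };
    o->velocity.x += accel.x;
    o->velocity.y += accel.y;
    o->location.x += o->velocity.x;
    o->location.y += o->velocity.y;
    o->force.x = 0.0f;
    o->force.y = 0.0f;
}
```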

So, let's look at the common forces we might want to apply.

"Thrusting"


To apply a "thrust" in the direction we're facing, we can use the unit vector *facing*, multiplied by some constant of "thrust strength", as follows (remember, *facing* is a unit vector, so it always has a length of 1) :

force-x += facing-x * thrust-strength
force-y += facing-y * thrust-strength

Acceleration towards the floor due to gravity


Assuming that gravity always works downwards, and that our co-ordinate system starts at the bottom left, we can make gravity work as follows (obviously, the value of gravity-constant should be chosen to work with whatever system of units we're using):

force-y -= gravity-constant

Acceleration towards (or away from) other objects


There are two cases here.

The first is a very simplistic "chasing" behaviour, where an object "chases" another one.  This is usually best done by rotating the object to point at its target (see later) and then "thrusting" as above, but it can also be useful to carry out a simple "go that way" approach.

The second is a more complex simulation of gravity - if we were implementing something like Geometry Wars, we might have "black holes" that pull objects towards them.  In this case, gravity is a bit more complex, as it not only works in a different direction for each pair of objects, but it gets stronger depending on how close the objects are to each other.

In both cases, the first thing we do is to create a vector that gives both the direction in which we are going to be pulled, and the distance between the objects:

vector-x = other.location-x - location-x
vector-y = other.location-y - location-y

In both cases, we need to know the distance between the two objects.  For the simple chasing case, we'll calculate it directly:

distance = size(vector)

Now, gravity usually works according to what's known as an inverse square law, which is to say that it's inversely proportional to the square of the distance between the two objects.

Remember, Pythagoras tells us that, for any right angled triangle, the sum of the squares on the short sides is equal to the square on the hypotenuse - the square on the hypotenuse is, in this case, equal to the distance squared, so rather than calculating the distance in a single step, we do it in 2 steps so that we can keep the "distance-squared" value :

distance-squared = (vector-x * vector-x) + (vector-y * vector-y)
distance = sqrt(distance-squared)

Now, in both cases, we want to scale the vector by a factor.  To do this, we need to scale the vector to a unit size, and then multiply it by our factor.  To scale to a unit size, we divide each co-ordinate by "distance" :

vector-x = vector-x / distance
vector-y = vector-y / distance

For the simple chasing case, we now multiply by our "chasing strength", and we have our force :

force-x = vector-x * chase-strength
force-y = vector-y * chase-strength

For the attraction due to gravity case, we divide by the distance squared, and then multiply by some constant to get a usable value :

force-x = (vector-x * gravity-constant) / distance-squared
force-y = (vector-y * gravity-constant) / distance-squared

Braking / Friction


Braking is a special case.  We could implement it by making a force vector that points in the opposite direction to the current velocity, but it's simpler (and faster) to simply multiply the current velocity by a factor.

"Proportional" braking is very simple.  If we wanted, for example, to brake by 5% every "step", we would do this:

braking-factor = 0.95
velocity-x = velocity-x * braking-factor
velocity-y = velocity-y * braking-factor

This gives a simple but convincing simulation of friction.

"Fixed amount" braking is a tiny bit more tricky - we need to know the size of the current velocity (using Pythagoras' theorem) and work out the factor from there:

speed = size(velocity)
if speed < braking-amount
  braking-factor = 0
else
  braking-factor = (speed - braking-amount) / speed
endif
velocity-x = velocity-x * braking-factor
velocity-y = velocity-y * braking-factor

Rotating


Now, so far, everything we've done is incredibly simple - the most complex thing we've come across is a square root, and we haven't had to bother with angles *at all*.

Rotating objects is a bit more complex.  After all, how can we rotate something to a given angle if we don't know what angle it's already at?

The short answer to this is

 I lied.  It's not complex at all.

but the short answer isn't very interesting (and, above all, doesn't tell us anything useful).  The long answer follows, and is based on matrix multiplication.  As a reminder (or a first taste, if you've never seen this before), a matrix is a (conceptually) rectangular grid of values.

Given that we're dealing with 2 dimensions, one of the most useful matrices we can imagine is a 2 x 2 matrix; we can multiply a vector (x, y) by such a matrix as follows :

| a b | |x| = |(x * a) + (y * b)|
| c d | |y|   |(x * c) + (y * d)|

Now, that's pretty simple to understand, and it's trivially simple to implement in code. So, multiplying a matrix by a vector is going to return a vector with (potentially) different values, depending on what's in the matrix. So far, so good, right?

Now, let's go back to rotating stuff using SOHCAHTOA, keeping that in mind.  When we rotate a point (x, y) by an angle θ, we get a point whose x co-ordinate is given by (x * cos(θ)) - (y * sin(θ)), and whose y co-ordinate is given by (x * sin(θ)) + (y * cos(θ)).  Now that looks a lot like the matrix multiplication above, and indeed it is.  We can encode a rotation by some arbitrary angle θ as a matrix, as follows:

| cos θ -sin θ |
| sin θ  cos θ |

This is, in fact, so useful that the majority of graphics toolkits (OpenGL, DirectX and so on) provide an API that uses matrices (actually, an extension of the rotation matrix given above, but we'll see that later).

Now that in itself is pretty nifty, but here's where the real magick occurs.  Remember SOHCAHTOA?  Remember what it means?


  • Sin of angle = length of Opposite / length of Hypotenuse
  • Cos of angle = length of Adjacent / length of Hypotenuse
  • Tan of angle = length of Opposite / length of Adjacent


We've already used the TOA bit, when we used atan2 beforehand.  We're about to put the SOH and CAH bits to good use.

Consider a unit vector. The length of its hypotenuse will always be 1.  And as any number divided by 1 is unchanged, we can see that the sin of the angle of that vector is given by the length of the opposite and the cos of the angle of that vector is given by the length of the adjacent.  And we know those values already - they are, quite simply, the co-ordinates of the vector.

So, rather than talking about angles, we can talk about unit vectors, and do away with all that messy trigonometry.  Well, at least when we're talking about taking something that's aligned at an angle of zero, and rotating it to a particular angle as described by a unit vector v - we use a rotation matrix like this:

| v.x -v.y |
| v.y  v.x |

giving us

new-x = v.x * x - v.y * y
new-y = v.y * x + v.x * y
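In C, that rotation is just two multiplies and an add per co-ordinate, and not a trig function in sight (`vec2` and the function name are mine):

```c
typedef struct { float x, y; } vec2;

/* Rotate p by the angle encoded in the unit vector `facing`, using
 * the matrix | f.x -f.y ; f.y f.x | described above. */
static vec2 rotate_by_facing(vec2 p, vec2 facing) {
    vec2 r = { (facing.x * p.x) - (facing.y * p.y),
               (facing.y * p.x) + (facing.x * p.y) };
    return r;
}
```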

What does this mean?


  • We can, using our handy unit "facing" vector, draw a graphic of our object, rotated to the correct angle, without having to call any trigonometric functions whatsoever.
  • We can make an object A point at another object B by calculating the vector A->B (trivial, this is given by B.location - A.location) and then scaling it to unit size (again, pretty easy - divide each coordinate by the length of the original vector), and using that as the new "facing" vector for object A.
  • If we need "increment" or "decrement" ability (for example, "rotate by 5° CCW") we can precalculate unit vectors describing the angles we need (obviously, this would need to be done using trigonometric functions, but it would only need to be done once), and then use those vectors to create our rotation matrices.


The only thing that's really left is rotating one object towards another (think turret slowly rotating to face the player) in increments.  To do this we need to know whether we are going to rotate clockwise or anticlockwise.  The easiest (although probably not the fastest) way to do this is to use atan2() to get the angles of the current "facing" vector and the vector A->B, decide which way to rotate, and then use the incremental approach above.
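There is also a trig-free alternative for that decision, which I'll mention as an aside rather than as the canonical approach: the sign of the 2-D cross product between the facing vector and the vector to the target tells you which side of us the target lies on. A sketch (names mine, assuming a y-up co-ordinate system):

```c
typedef struct { float x, y; } vec2;

/* Returns +1 if `to_target` lies anticlockwise of `facing`, -1 if
 * clockwise, and 0 if they are parallel (or anti-parallel). */
static int turn_direction(vec2 facing, vec2 to_target) {
    float cross = (facing.x * to_target.y) - (facing.y * to_target.x);
    if (cross > 0.0f) return  1;
    if (cross < 0.0f) return -1;
    return 0;
}
```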

Earlier, I touched on the fact that the various toolkits all provide matrix APIs.  How these work is by combining both a rotation and translation matrix (yes, we can deal with translations too) into one matrix.  For a 2-d case, we would do something like this:

|  cosθ -sinθ  t.x  |   | x |
|  sinθ  cosθ  t.y  | * | y |
|   0     0     1   |   | 1 |

Ignoring the pointless result given by the 1 in the vector, we would get :

new-x = (x * cosθ) - (y * sinθ) + t.x
new-y = (x * sinθ) + (y * cosθ) + t.y

This is, simply put, the value we had before, with the x and y values of t added on.  The matrix therefore rotates an object by *θ*, and then translates the result by *t*.
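Expressed in C, with the rotation given as a unit facing vector rather than cosθ and sinθ (names are, as ever, mine):

```c
typedef struct { float x, y; } vec2;

/* Rotate p by the unit vector `facing`, then translate by t - the
 * 3x3 homogeneous matrix above, applied directly, with the pointless
 * third row omitted. */
static vec2 transform_point(vec2 p, vec2 facing, vec2 t) {
    vec2 r = { (p.x * facing.x) - (p.y * facing.y) + t.x,
               (p.x * facing.y) + (p.y * facing.x) + t.y };
    return r;
}
```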

The matrices used by OpenGL and so on are not, in fact, 2D specific.  Everything is generalised to the 3D case, so we use :

|  cosθ -sinθ   0    t.x  |
|  sinθ  cosθ   0    t.y  |
|   0     0     1     0   |
|   0     0     0     1   |

This is actually a matrix for 3D translations and rotations, with all the z axis stuff left out.
