Friday, April 18, 2014

Intuos tablet resurrection - the story, and the gory bits.

So, I'm going to tell you a story. Are you sitting comfortably? Then I'll begin...

A long time ago, in a land far, far away (well, in the year 1999 or so, and on holiday in Paris), I bought myself a Wacom tablet. An ADB Wacom tablet to go with the Mac IIfx I was using at the time. Or it might have been the G3 desktop, thinking about it. Whatever, it was a ripsnortin' fast Mac with several screens attached and an ADB connector. The tablet was, as I recall, a UD-0608-A, and I bought it on sale from "Surcouf". And I was mightily happy with it.

That tablet stayed with me for 14 years, but its utility was massively reduced by Wacom "Steving" it when Apple moved to MacOS X. In fact, it was turned into a tablet-shaped paperweight.

At that time, I, amongst many other Wacom users, contacted Wacom to ask if we could please get an OSX driver. Could we hell. "OSX can't handle the throughput on ADB", they said. "It's Apple's fault", they said. "Buy a USB tablet, ya cheap bastard", they said.

At the same time, I was running Linux/PPC on my Wallstreet, so I contacted Wacom's developer people, asking if I could get the technical details on how the ADB tablets worked, in order to write a linux driver. Dave Fleck, who was (I believe) responsible for writing much of the Classic Mac code for the ADB tablets, got back to me and explained that:

1 - Can't do that, souper-sekrit eye-pee.
2 - You'll never understand how it works
3 - "XXX can't handle the throughput on ADB"
4 - Buy a USB tablet, ya cheap bastard

... and that was that for a good few years. The trusty ultrapad went back into its box, and gathered dust.

Around 2004/2005, I randomly came across a tool for dumping ADB traffic, using 2 classic macs, a butchered serial cable, and some resistors. Can't remember what I was looking for at the time, but there it was. The tool I needed to be able to dump ADB traffic to and from the tablet. I got inspired, and got dumping, 50 packets at a time (if I remember the limits of the tool). A horrible job, and with no way of actually saving the dump, got writer's cramp. The result of this, however, was a pretty good understanding of how the UD-xxxx-A series talked. There was nothing "difficult" about it; it was, in fact, pretty much trivial. So I wrote an OSX 10.2.x driver for it, and it worked. The first work that came out of this was an image saying "fuck you wacom!". Yes, very mature.

10.3 arrived, and buggered things up. No longer were ADB-equipped Macs supported.

So I wrote, with a fair amount of help, a driver for the iMate ADB converter, which also included a tablet driver (which then got expanded by others to cover not only the ultrapads, but also the Calcomp slates).

It got hammered by, successively, 10.4, 10.5 (which removed ADB entirely), 10.6, etc etc.

The tablet went back into its box.

Bernard contacted me, sometime 2009/2010. He took my code, with my blessing, and turned it into a part of Waxbee. I rejoiced, got a couple of teensys, and hooked up my ultrapad.

Then Bernard started asking questions about the intuos tablets. They were not the same as the ultrapads. They were - weird. Random. My interest was piqued. So I bought one. Cheap as chips, obviously, as they no longer fucking work with /anything/. I looked at the packet stream, and cried. And put the tablet away again.

Every now and then, I'd pull it out, have a poke, and get discouraged.

Then I had a revelation. I started dumping packets in binary, rather than hex, and started seeing patterns. The veil started to lift, ever so slightly.

At this point, Bernard and I knew what happened when you plugged an intuos tablet in (the identification packet was pretty similar to the ultrapad one), and what happened when a pen came into proximity (a pair of messages, the first of which was obviously tool-specific, and the second of which, again, looked similar-ish to the UD series). But the rest was gibberish. A constant stream of 3, 6, or 8 bytes at a time, regardless of pen movement, and seemingly random, followed by 2 bytes 0xfe00 as an obvious out-of-proximity marker.

That stream of bytes, however, started to make some sense when dumped in binary. Move the pen up, and one bit was always set. Down, and the same bit was always clear. Left, another bit set. Right, clear. Odd, but more or less consistent sets of 4 bits in between. Lean left, another bit. Lean up, another one. Press a button, an 8 byte packet happens. Release it, another one happens. And so on.

It became clear to me that Wacom were somehow crushing down location data into 5 bit chunks, tilt into 4, probably pressure into another 4. The encoding was opaque, but the meaning was clear.

I forced myself to do something with it, mainly by giving my trusty ultrapad, which was now working under Bernard's waxbee code, to a colleague, leaving myself with only the non-working intuos.  I had 2 choices.  Fix it, or live without a tablet.

So, I had this stream of data, and I thought I knew what some of it meant. I got quite excited about that, and contacted Bernard. We ummed and arred over what the 3 byte and 6 byte packets were all about.

What was obvious was this: Per 3 bytes, regardless of whether it was a 3 byte or a 6 byte packet, data looked like this ...
00xx xxxy yyyy pppp hhhh vvvv
... where x, y, p, h and v are related to x delta, y delta, pressure delta, horizontal tilt delta and vertical tilt delta.
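For the curious, here's a minimal sketch of pulling those fields out of a single 3-byte chunk. The function name is mine, and I'm assuming the bytes are taken most significant first, as the layout above is written. It only splits the fields; it doesn't try to interpret them.

```python
def unpack_delta(packet: bytes) -> dict:
    """Split one 3-byte delta chunk into its raw fields.

    Layout (24 bits, most significant first): 00 xxxxx yyyyy pppp hhhh vvvv
    Assumption: the first byte on the wire holds the top 8 bits.
    """
    if len(packet) != 3:
        raise ValueError("expected exactly 3 bytes")
    word = int.from_bytes(packet, "big")
    return {
        "x": (word >> 17) & 0x1F,  # 5-bit x delta field
        "y": (word >> 12) & 0x1F,  # 5-bit y delta field
        "p": (word >> 8) & 0x0F,   # 4-bit pressure delta field
        "h": (word >> 4) & 0x0F,   # 4-bit horizontal tilt delta field
        "v": word & 0x0F,          # 4-bit vertical tilt delta field
    }
```

A 6-byte packet can be fed through this as two 3-byte halves.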

But what were the 6 byte packets? And what of the occasional 8-byte packets that weren't triggered by button state changes? Bernard was of the opinion that the 6 byte packets were simply 2x3 byte packets munged together. I thought they were, more likely, delta and error correction, and that the 8-byters were "reset" packets of some sort. Bernard got his o-scope out, and it suddenly became much clearer what was going on.

The intuos marketing material clearly marks the data rate as being /maximum/ 200 samples per second. I'd initially considered this to be rather ambitious - maybe the USB and serial tablets could hit it, but no way the ADB tablets could go that fast. Turns out the Wacom engineers were smart enough, and cared enough, to hit that 200 samples per second, pretty much all the time. Which, in the end, makes sense - the guts of the tablets are identical, it's only the interface and interface firmware that changes between models. By modifying the ADB polling speed in Waxbee, we could get more, or less, 3-byte, 6-byte and 8-byte packets, and, within reason, we were getting absolutely 200 samples per second. If 3 bytes wasn't fast enough, 2 packets were munged together into a 6-byte packet. If that wasn't fast enough, lose the tilt information for a packet, and munge 3 packets together into 8 bytes. Sheer, insane, genius.

The "packaging" of the pen packets sorted out, this only left the content. It was obvious that there was some sort of delta information in there, but testing quickly revealed it wasn't trivial. Neither simple addition nor "static shift and addition" worked. Time to start diving into code.

This is where things got hard, and fast. I didn't have the tools required to even unpack the code resources from the "Classic" mac driver. So I wrote a resource unpacker in C. Which eventually netted me a 300K+ chunk of 68k binary. And, not having the tools to rip that down or even look at the names, I wrote myself a disassembler in scheme. I used scheme because I could run it from the REPL, and change things on the fly. A bit of intelligent function prologue matching got me down to a bunch of disassembled functions, and a big bunch of data. Traps were, with a bit of help from my old MPW discs, decoded, and that enabled me to find the code dealing with talking to ADB tablets, and, from there, the ADB state machine. Which was hideous.

At this point, I was working from the 4.5.5 driver, which deals with *all* mac-connectable Wacom tablets. UD, SD, GD, XD series, ADB, serial and USB connections. And the damn thing is written in C++. It was too complex to even think about breaking down. So I went looking for earlier drivers. Google and wayback failed me, although I now knew what driver I needed. And eventually, I managed to source one from someone selling a boxed adb intuos on the french equivalent of eBay.

Back to work. Rinse and repeat what I'd done with the 4.5.5 driver on 4.1.2, get to the ADB state machine, and, eventually, to the handler for "Talk Register 0" commands (the ADB polling mechanism). And the slow task of picking through the various bits of internal state. Decoding the structure of the class instances, and the class vtables. Tedious, mind-numbing code-sifting.

The structure of the r0 handler is horrible. Cascading branches based on "if this bit set, then this, else this". But I'd managed to get it labelled up based on the bits being tested, so started manually walking through the code with my packet dumps. And here's how it works:

When a tool first comes into proximity, there is a 7 byte packet sent containing tool type and tool serial number. I knew this already; this packet is identified by having its top nybble set to 100x, where x is the tool index (0 for the first tool, 1 for a second tool in "dual tracking" mode).

When a tool goes out of proximity, there is a 2 byte packet (which may well be part of a "stream" of deltas, i.e. appended after other packets) of the form 0xfe00 | (tool_index << 8). I knew that one as well.

After the "in proximity" packet, we get one or more "tool major" packets. These vary by tool, and contain "raw" sensor data. The high bit is always set, and the type of the packet is determined by masking various bits in the top byte. Size and content vary by packet type, from 6 to 8 bytes each. At this point, I only had access to standard styli, so I was only seeing "pen major" packets, which encode location, pressure, tilt, orientation and pen button state. Again, I knew this packet, and had it ripped to bits already.

After the major packets, we get delta packets. The encoding is different per tool, but these are either 2 or 3 bytes each. They *all* encode a location delta as described above, the rest varies by tool type. And, for some tools, there's an alternating sequence of delta packet types. The 4d mouse, for example, alternates between "location and rotation" and "location and wheel" deltas.
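As a rough illustration, the packet kinds described so far can be told apart from the first byte alone. This is a sketch, not the driver's actual logic: the function names are mine, the order of the tests is my guess, and I'm assuming every delta packet starts with the same two clear bits as the 3-byte layout shown earlier. The individual "tool major" types aren't distinguished, since those masks aren't covered here.

```python
def classify(first_byte: int) -> str:
    """Guess the packet kind from its first byte (hypothetical helper)."""
    if first_byte & 0xFE == 0xFE:
        return "out-of-proximity"  # 0xfe00 | (tool_index << 8)
    if first_byte & 0xE0 == 0x80:
        return "in-proximity"      # top nybble is 100x
    if first_byte & 0x80:
        return "tool-major"        # high bit set, exact type not decoded here
    if first_byte & 0xC0 == 0x00:
        return "delta"             # assumption: all deltas lead with 00
    return "unknown"


def tool_index(first_byte: int) -> int:
    """Tool index (0 or 1) for in-proximity and out-of-proximity packets."""
    kind = classify(first_byte)
    if kind == "in-proximity":
        return (first_byte >> 4) & 1  # the x in 100x
    if kind == "out-of-proximity":
        return first_byte & 1         # 0xfe -> 0, 0xff -> 1
    raise ValueError("packet kind carries no tool index here")
```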

When some state changes that *isn't* encoded in the delta packets, we get a new "tool major" packet of the relevant type. Button state, for example (and, for obscure reasons, when the 4d mouse wheel goes from +ve to -ve and vice versa).

The encoded deltas themselves are of the form "sign + magnitude". For each data element (for example, horizontal tilt), the driver keeps a "shift" value, which is initialised to a "magic" number when we hit a tool major packet. The delta is calculated by shifting the magnitude left by "shift" number of bits, and this is then added or subtracted from the accumulated value according to the sign bit. So far, so (relatively) simple. But then comes the clever bit. The shift value is manipulated according to the magnitude field. So, for tilt, if the magnitude is 0b111, we add 2 to the shift value. 0b110, add 1. 0b010 or 0b011, subtract 1. 0b001, subtract 2. 0b000, subtract 3.
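Here's a minimal sketch of the driver side of that, for a tilt value. The shift adjustments are the ones listed above; everything else is an assumption on my part and labelled as such: the class name, the sign bit being the top bit of the 4-bit field with 1 meaning "subtract", the delta being applied before the shift is adjusted, the shift being clamped at zero, and the starting shift (the real "magic" numbers aren't given here).

```python
# Shift adjustment per 3-bit tilt magnitude, as described in the text.
# Magnitudes not listed (0b100, 0b101) leave the shift alone.
TILT_SHIFT_ADJUST = {
    0b111: +2,
    0b110: +1,
    0b011: -1,
    0b010: -1,
    0b001: -2,
    0b000: -3,
}


class AdaptiveTilt:
    """Accumulates one tilt axis from 4-bit sign + magnitude delta fields."""

    def __init__(self, value: int, shift: int):
        self.value = value  # absolute value from the last tool major packet
        self.shift = shift  # hypothetical starting shift, not Wacom's number

    def apply(self, field: int) -> int:
        sign = (field >> 3) & 1  # assumption: top bit is the sign, 1 = negative
        magnitude = field & 0b111
        delta = magnitude << self.shift
        self.value += -delta if sign else delta
        # Assumption: the shift is adjusted after use and never goes below 0.
        self.shift = max(0, self.shift + TILT_SHIFT_ADJUST.get(magnitude, 0))
        return self.value
```

Each tool major packet would reset both `value` and `shift`, which here just means building a new instance.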

It's what I'd describe as "adaptive shift delta encoding". It's not general, as such - the magic numbers need to be both synchronised between hosts, and probably hand-tuned according to scale and expected delta magnitudes per sensor reading, but it's absolutely brilliant.

So, how does this "adaptive shift" stuff work, then?

In a naive tablet protocol, for example that of the ultrapads and earlier, we transmit all the data, all the time. If the stylus has moved from x location 0x1000 to 0x1010, then to 0x1015, we transmit first 0x1000, then 0x1010, then 0x1015. Wacom could have done this for the intuos tablets, were it not for the combination of ADB's low throughput, and the "required" 200 samples/second thing.

Given that losing samples was presumably not acceptable, Wacom thus needed to trim down the amount of data being sent per sample. The obvious thing to do is to send deltas where possible. A simple delta encoding scheme sends a first packet containing the "full" data, and then a number of smaller difference packets. In the above example, we could imagine 0x1000, 0x10, 0x05, for example. This works extremely well for messages which are coherent, where each message is likely to be very similar to the preceding one.
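To make the simple scheme concrete, here's a tiny sketch of plain delta encoding and decoding. This is the naive textbook version, not anything Wacom shipped, and the function names are mine.

```python
def delta_encode(samples: list) -> list:
    """First value in full, then the difference from the previous sample."""
    if not samples:
        return []
    encoded = [samples[0]]
    encoded.extend(b - a for a, b in zip(samples, samples[1:]))
    return encoded


def delta_decode(stream: list) -> list:
    """Rebuild the absolute values by accumulating the differences."""
    values = []
    total = 0
    for i, item in enumerate(stream):
        total = item if i == 0 else total + item
        values.append(total)
    return values
```

Feeding it the example positions, `delta_encode([0x1000, 0x1010, 0x1015])` gives `[0x1000, 0x10, 0x05]`, and `delta_decode` turns that back into the original list.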

There is, however, a catch. If the difference is too big to fit in the fixed size delta packet, it becomes necessary to send a full-size "reset" packet, which "fattens up" the stream again. So, in order to be able to deal with a range of differences (for co-ordinates, these can be thought of as velocities), and thus avoid reset packets, the delta packets must be quite large. And Wacom didn't have the luxury of being able to make the deltas large. Timing constraints mean that the total size of the delta packets for *all* the relevant sensor outputs for a tool can be, maximum, 3 bytes.

So, another approach is needed. A commonly used option is to increase or decrease the delta packet size "on the fly" dependent on the data to be encoded, but, again, that pesky fixed packet size kicks in. So that's out.

Absolute accuracy 100% of the time is not needed, as long as the deltas approximate to, and rapidly converge on, the correct value. The most important measure is transducer location, and Wacom are measuring these to 1/2540 inch (or, if you prefer, 1/100 mm). That's pretty damn precise; a little bit of jitter isn't going to hurt that much as long as it converges fast.

So, what Wacom have done is change the range of delta measurements on the fly by shifting the delta value to the right, at the cost of reduced accuracy as the range increases (as we lose the lower significance bits). Let's look at that 5-bit location delta. Remember, it's 1 sign bit plus 4 bits of magnitude.

shift vMin vMax error (mm)
----- ---- ---- ----------
0     0    15   0
1     0    31   0.01
2     0    63   0.02
3     0    127  0.04
4     0    255  0.08
5     0    511  0.16
6     0    1023 0.32

In other words, with a shift value of 4 (which, by chance, is Wacom's "starting" value for location delta shifts), it's possible to accommodate a pen velocity of ~ 25 cm / sec, whilst staying within a tolerance of 8/100ths of a mm of the "real" value. That's pretty good as it stands, and Wacom probably could have got away with leaving it at that, at least for the smaller tablets.

Where the genius comes in, though, is how they've dealt with convergence / divergence.

Effectively, what happens is this:

If the delta magnitude is at the upper end of the scale (15 in the case of location deltas), the driver assumes it has "undershot" the target, and increments the shift. At the expense of absolute accuracy, this allows the next delta to encompass a larger range, approaching the assumed "far away" target value faster. If the actual value *was* 15, the next delta can still address it directly.

At the lower end of the range, (say, 1 or 2), the driver decrements the shift, reducing range and thus increasing accuracy, and so converges on the "correct" value faster.

At the *absolute* low end of the range, the driver assumes it has "overshot", and "downshifts" faster, reducing the shift value by 2 or more, allowing the "real" value to catch up.

So, for any given delta magnitude, we "upshift" with maximum value, "downshift" for low-end values, and downshift faster for zeros. The only tuning needed is to decide where, and by how much, to downshift, and what the initial shift value should be.
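To see the convergence in action, here's a toy simulation of both ends of the link for a single location axis. It's a sketch under loud assumptions: the starting shift of 4 and the "upshift at 15" rule come from the text, but the exact downshift thresholds (minus 1 for magnitudes 1 and 2, minus 2 for zero), the shift ceiling of 6, the truncating encoder and all the names are made up for illustration. The real tablet firmware may well do it differently.

```python
MAG_BITS = 4
MAG_MAX = (1 << MAG_BITS) - 1  # 15, the top of the 4-bit magnitude range
MAX_SHIFT = 6                  # hypothetical ceiling for the shift value


def next_shift(shift: int, magnitude: int) -> int:
    """Adjust the shift from the magnitude just sent (thresholds are guesses)."""
    if magnitude == MAG_MAX:
        shift += 1   # undershot: widen the range
    elif magnitude == 0:
        shift -= 2   # overshot or idle: narrow quickly
    elif magnitude <= 2:
        shift -= 1   # close: narrow the range for accuracy
    return max(0, min(MAX_SHIFT, shift))


def encode(target: int, estimate: int, shift: int) -> tuple:
    """Tablet side: sign bit plus truncated, saturated magnitude."""
    diff = target - estimate
    magnitude = min(MAG_MAX, abs(diff) >> shift)
    return (1 if diff < 0 else 0), magnitude


def apply(estimate: int, shift: int, sign: int, magnitude: int) -> tuple:
    """Driver side: accumulate the shifted delta, then adapt the shift."""
    delta = magnitude << shift
    estimate += -delta if sign else delta
    return estimate, next_shift(shift, magnitude)


def track(target: int, estimate: int = 0, shift: int = 4, steps: int = 10) -> list:
    """Chase a stationary target, returning (estimate, shift) after each packet."""
    history = []
    for _ in range(steps):
        sign, magnitude = encode(target, estimate, shift)
        estimate, shift = apply(estimate, shift, sign, magnitude)
        history.append((estimate, shift))
    return history
```

With these made-up thresholds, `track(1000)` runs 240, 720, 976, 976, 992, 1000: two upshifts to cover the distance, then a run of downshifts to land exactly on the target in six packets. Both ends run the same shift update on the same magnitudes, so they stay in step without the shift ever being transmitted.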

Like I said before - it's genius.
