Version 1, Tom Anderson <twic@urchin.earth.li>, 2005-01-23
Tencoding is a simple scheme for encoding structured data in a terse binary format. It is similar to, but much simpler and more practical than, ASN.1 BER. The name is a homage to bencoding.
The data model supported by tencoding is very simple. The universe consists of four kinds of objects:
Tencoding does not directly support booleans (use C-style integers, 0 for false and 1 for true), characters (use a length-1 string) or floating-point numbers (since portable encoding is a huge pain in the arse; use fixed-point or a pair (mantissa, exponent)).
However, there is slightly more to it than this: as well as a fundamental kind (blob, integer, string, composite), every object has a type. A type is simply a name, and indicates to applications what the semantics of an object, beyond those implied by its kind, are. For example, in describing customers, you might have types like Name and Address; you could then have a type Customer which was a list containing Name and Address objects. Note that types are derived from kinds; all objects of a given type are of a specific kind.
So, how is this universe encoded? Every object is encoded as a type - length - value (TLV) triple, consisting of the concatenation of the encodings of the object's type, the length of its encoded value, and its value. Types and lengths are handled as integers, and encoded using 'stretchy ints'; values are encoded according to specific rules for each kind.
Lengths are clearly already integers, but types are not; accordingly, for encoding, types must be represented as integers. To this end, all types are simply assigned a number. The least significant two bits of the number reflect the kind of the type: 00 for blobs, 01 for integers, 10 for strings and 11 for lists. The rest of the type number is up to the application. Type numbers, and indeed types generally, are not globally unique: every application is free to assign its own numbers. However, the type number 0 (which would otherwise be a blob) is always reserved (it is used as a marker for references in lists).
A stretchy int is a simple way of efficiently encoding unsigned integers of arbitary size. The integer is considered as a bitstring (with most significant bit first), and broken into groups of seven bits (if its length is not a multiple of seven, the string is left-padded with zeroes so that it is). Each group of bits is then written, preceded by a continuation bit, set to 0 if this is the last group in the sequence, and 1 if it is not. For example, 0 is encoded as 00000000, 1 as 00000001, 127 as 01111111, 128 as 10000001 00000000 and 316 as 10000010 00111100.
The values of objects are then encoded as follows:
I'm working on a tencoding library in Java; bug me if you want a copy.