Aha! Have finally understood this, thanks to Wikipedia. Readers might also be interested in the original paper

A Golomb code is variable-length code, a bit like Huffman; however, rather than being based on the data, like Huffman, it's based on a simple model of the probability of the values (which are explicitly dealt with as natural numbers, rather than being abstract symbols): small values are more likely than big ones. The precise relation between size and probability is captured in a parameter, the divisor.

To Golomb-code a number, find the quotient and remainder of division by the divisor. Write the quotient in unary notation, then the remainder in truncated binary notation. In practice, you need a stop bit after the quotient: if the quotient is written as a sequence of zeroes, the stop bit is a one (or vice versa - and people do seem to prefer to write their unary numbers with ones, which is Wrong). The length of the remainder can be determined from the divisor.

A Golomb-Rice code is a Golomb code where the divisor is a power of two, enabling an efficient implementation using shifts and masks rather than division and modulo.

For example, here's the Golomb-Rice code with divisor 4, for numbers up to 15:

Value | Quotient | Remainder | Code |
---|---|---|---|

0 | 0 | 0 | 1 00 |

1 | 0 | 1 | 1 01 |

2 | 0 | 2 | 1 10 |

3 | 0 | 3 | 1 11 |

4 | 1 | 0 | 0 1 00 |

5 | 1 | 1 | 0 1 01 |

6 | 1 | 2 | 0 1 10 |

7 | 1 | 3 | 0 1 11 |

8 | 2 | 0 | 00 1 00 |

9 | 2 | 1 | 00 1 01 |

10 | 2 | 2 | 00 1 10 |

11 | 2 | 3 | 00 1 11 |

12 | 3 | 0 | 000 1 00 |

13 | 3 | 1 | 000 1 01 |

14 | 3 | 2 | 000 1 10 |

15 | 3 | 3 | 000 1 11 |

If you want to encode signed numbers, i suppose you just add a sign bit somewhere.

There are also exponential Golomb codes; i think these are where the arithmetic operation at the heart of the code is the taking of a logarithm in some specified base, the two parts of the code being the exponent and a remainder, where the value encoded is ((base ** exponent) - 1) + remainder (the -1 is there so you can encode zero, since any base to the power of zero is one). Here, the length of the remainder will vary; it can be determined from the base and exponent, though: it's log2((base ** (exponent + 1)) - (base ** exponent)). I believe that in practice, only the code with base 2 is used.

Here's the exponential Golomb code with base 2, for numbers up to 15:

Value | Value + 1 in Binary | Exponent | Remainder | Code |
---|---|---|---|---|

0 | 1 | 0 | 0 | 1 |

1 | 10 | 1 | 0 | 0 1 0 |

2 | 11 | 1 | 1 | 0 1 1 |

3 | 100 | 2 | 0 | 00 1 00 |

4 | 101 | 2 | 1 | 00 1 01 |

5 | 110 | 2 | 2 | 00 1 10 |

6 | 111 | 2 | 3 | 00 1 11 |

7 | 1000 | 3 | 0 | 000 1 000 |

8 | 1001 | 3 | 1 | 000 1 001 |

9 | 1010 | 3 | 2 | 000 1 010 |

10 | 1011 | 3 | 3 | 000 1 011 |

11 | 1100 | 3 | 4 | 000 1 100 |

12 | 1101 | 3 | 5 | 000 1 101 |

13 | 1110 | 3 | 6 | 000 1 110 |

14 | 1111 | 3 | 7 | 000 1 111 |

15 | 10000 | 4 | 0 | 0000 1 0000 |

The relationship between the code and the value + 1 in binary will not be lost on the reader.

Kiely and Klimesh (in "Generalized Golomb codes and adaptive coding of wavelet-transformed image subbands") generalise the idea of a Golomb code, using the idea of an index function, which maps from integers to an index (which is also an integer). The integers can be partitioned into sets (called index sets) according to the index they map onto; integers can then be encoded as their index, plus their rank in the index set (an ordering for the index set is needed; K+K just use the natural ordering). The index is then encoded in unary, and the rank in truncated binary; the length of the rank can be determined from the size of the index set for the specified index. K+K specifically look at index functions which generate contiguous sets, and where the positions of the sets on the number line increase monotonically with index: in this case, rather than sets, we can think of index threshold, where threshold(i) is the smallest z such that index(z) == i; we can then compute rank by z - threshold(index(z)). Golomb-Rice coding is generalised Golomb coding with index(z) = z / d, where d is the divisor; exponential Golomb coding has index(z) = log((z + 1), d). K+K mention that some guy called Howard generalised Golomb codes further, into 'unary prefix/suffix' codes, but they think this is a bit silly.