The ZIFF File Format
ZIFF is a simple file format architecture, much like RIFF, EA IFF or
the architecture implicit in the PNG format. It differs slightly from
these in that it is explicitly oriented towards building archives of
directory trees (for some sorts of directory trees): a ZIFF file can be
unpacked into a tree of directories and files, and such a tree packed
into a ZIFF file, without losing any (or at least much) information.
Thus, ZIFF is a bit like a cross between RIFF and ZIP. Hence the
name.
To avoid reinventing the wheel completely, let's start with one of
the existing file architectures; the one in PNG (which I'll call PIFF)
is the most modern and probably the most well-thought-out, so let's make
it that. I do really like the EA IFF architecture, though - the idea of
FORMs is quite cool, and the LIST/PROP thing is really neat.
Briefly, a PNG file looks like:
- A signature; this is a very cleverly designed structure, whose
rationale is explained in some depth in the PNG spec, but which looks
like:
- A first magic number (0x89)
- A three-character format name (0x504e47, "PNG", in PNG), of which
the first character should form a unique two-byte combination with the
first magic number (unique in the sense that no other file format starts
with those two bytes)
- A second magic number (0x0d0a1a0a)
- A sequence of chunks, each consisting of:
- A 4-byte unsigned integer length field (covering the chunk data
only; its value must not exceed 2^31-1)
- A 4-byte chunk type code, which must be a four-letter ASCII string,
with four bits of information encoded in the capitalisation of the
letters (indicating whether the chunk is ancillary, private and safe to
copy; there is also a reserved bit)
- Some number of bytes of data
- A 4-byte CRC value
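The capitalisation trick in the type code can be sketched as follows. This is only an illustration: the function name chunk_properties and the dictionary keys are made up here, and the only thing taken from PNG is that each property is carried by bit 5 (value 0x20) of the corresponding letter, which is exactly the bit that distinguishes lowercase from uppercase in ASCII.

```python
def chunk_properties(type_code: bytes) -> dict:
    """Decode the four property bits from a PNG-style chunk type code."""
    if len(type_code) != 4 or not type_code.isalpha():
        raise ValueError("type code must be four ASCII letters")
    # Bit 5 (0x20) of a byte is set exactly when the ASCII letter is lowercase.
    lower = [bool(b & 0x20) for b in type_code]
    return {
        "ancillary": lower[0],     # first letter lowercase: not critical
        "private": lower[1],       # second letter lowercase: private chunk
        "reserved": lower[2],      # third letter: reserved in PNG
        "safe_to_copy": lower[3],  # fourth letter lowercase: safe to copy
    }
```

For example, PNG's tEXt chunk decodes as ancillary, public and safe to copy, while IHDR has none of the four bits set.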
ZIFF also complies with these rules. The format name embedded in the
signature should be "ZIF" (0x5a4946). ZIFF also inherits PNG's various
other rules, such as that numbers should be encoded in big-endian form.
The one substantive difference is in text encoding: PNG specifies that
all text should be in Latin-1 (aka 8859-1) encoding, but ZIFF will use
UTF-8. The other, trivial difference is that ZIFF files should use the
file extension zif rather than png.
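A minimal sketch of the signature and the chunk framing, assuming the PNG rules carry over unchanged: big-endian fields, and a CRC-32 computed over the type code and data but not the length, as in PNG. The names ZIFF_SIGNATURE, pack_chunk and unpack_chunk are hypothetical, and the chunk data is treated here as opaque bytes.

```python
import struct
import zlib

# 0x89, then "ZIF", then the second magic number CR LF ^Z LF.
ZIFF_SIGNATURE = b"\x89ZIF\r\n\x1a\n"

def pack_chunk(type_code: bytes, data: bytes) -> bytes:
    """Frame one chunk: length, type code, data, CRC (all big-endian)."""
    if len(type_code) != 4 or not type_code.isalpha():
        raise ValueError("type code must be four ASCII letters")
    # As in PNG, the CRC covers the type code and the data, not the length.
    crc = zlib.crc32(type_code + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + type_code + data + struct.pack(">I", crc)

def unpack_chunk(buf: bytes, offset: int = 0):
    """Return (type_code, data, next_offset) for the chunk at offset."""
    (length,) = struct.unpack_from(">I", buf, offset)
    type_code = buf[offset + 4 : offset + 8]
    data = buf[offset + 8 : offset + 8 + length]
    (crc,) = struct.unpack_from(">I", buf, offset + 8 + length)
    if crc != zlib.crc32(type_code + data) & 0xFFFFFFFF:
        raise ValueError("CRC mismatch")
    # 12 bytes of framing: 4 (length) + 4 (type code) + 4 (CRC).
    return type_code, data, offset + 12 + length
```

A reader would check that the file starts with ZIFF_SIGNATURE and then call unpack_chunk(buf, 8) to get at the first chunk.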
ZIFF then adds the following:
- The restriction that the file must contain exactly one top-level
chunk; as explained below, chunks can contain other chunks, so this
does not limit the total number of chunks in the file.
- A nuance to the interpretation of chunk type codes: codes are to be
interpreted in the context of the type of their enclosing chunk. Thus
(to use a hypothetical example), a DATA chunk inside an image chunk
might mean something quite different to one inside a sound chunk.
Top-level chunks do not have an enclosing chunk; their type codes are
thus interpreted in a global namespace.
- The idea that chunks have names. Not chunk types, individual chunks.
Very simply, this means that the first thing in any kind of chunk is a
name: this is encoded as UTF-8, and prefixed by the length of the
encoding in bytes (not the number of characters in the string), as a
single unsigned byte, so an encoded name is at most 255 bytes long.
Everything after that is genuine chunk data.
- A definition for the third information bit in chunk type codes, which
PNG leaves reserved and meaningless. ZIFF defines this as a 'composite
bit': if set, the content of the chunk (after the name) is a sequence of
other chunks.
- A chunk type, iNDX, which is an index chunk: it contains a list of
the chunks contained in the enclosing chunk. It consists of a sequence
of index entries, one for each chunk, each consisting of an offset
(measured in bytes, starting at the start of the enclosing container's
data area, and expressed as a 4-byte signed integer), the 4-byte type
code of the chunk, and its name, encoded as in the chunk itself (byte
count byte + encoded characters). The point of this chunk is to allow
fast random access to deeply nested chunks.
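The name prefix, the composite bit and the iNDX entry layout might look like this in code. It is a sketch under assumptions: all the function names are invented for illustration, the composite bit is taken to be bit 5 (0x20) of the third letter of the type code (the position of PNG's reserved bit), and the offset is packed as a signed big-endian integer because that is how it is described above.

```python
import struct

def encode_name(name: str) -> bytes:
    """Length-prefixed UTF-8 name; the prefix counts bytes, not characters."""
    raw = name.encode("utf-8")
    if len(raw) > 255:
        raise ValueError("encoded name must fit in one length byte")
    return bytes([len(raw)]) + raw

def decode_name(data: bytes, offset: int = 0):
    """Return (name, offset of the first byte after the name)."""
    end = offset + 1 + data[offset]
    return data[offset + 1 : end].decode("utf-8"), end

def is_composite(type_code: bytes) -> bool:
    """Composite bit: set when the third letter of the type code is lowercase."""
    return bool(type_code[2] & 0x20)

def encode_index_entry(offset: int, type_code: bytes, name: str) -> bytes:
    """One iNDX entry: 4-byte offset, 4-byte type code, prefixed name."""
    return struct.pack(">i", offset) + type_code + encode_name(name)

def decode_index(data: bytes):
    """Decode the entries of an iNDX chunk (the bytes after its own name)."""
    entries, pos = [], 0
    while pos < len(data):
        (offset,) = struct.unpack_from(">i", data, pos)
        type_code = data[pos + 4 : pos + 8]
        name, pos = decode_name(data, pos + 8)
        entries.append((offset, type_code, name))
    return entries
```

Note that iNDX itself has an uppercase third letter, so it is not composite: its content is index entries rather than chunks.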
The mapping between ZIFF files and directory trees is therefore quite
simple. A chunk becomes a file or directory (a directory if its
composite bit is set, a file otherwise), whose name is the chunk name,
whose type (Macintosh type code or file extension) is the chunk type
code, and whose content is the chunk content, be it bytes or chunks.
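To make the mapping concrete, here is a rough sketch of unpacking a run of chunks into a directory. Several things in it are assumptions rather than part of the format as described: the function name unpack_tree, the choice of the type code as a file extension (rather than a Macintosh type code), and the decision to skip CRC checking. It also does nothing to guard against names containing path separators, which a real unpacker would have to handle.

```python
import struct
from pathlib import Path

def unpack_tree(buf: bytes, start: int, end: int, parent: Path) -> None:
    """Write each chunk in buf[start:end] under parent as a file or directory."""
    pos = start
    while pos < end:
        (length,) = struct.unpack_from(">I", buf, pos)
        type_code = buf[pos + 4 : pos + 8].decode("ascii")
        data_start = pos + 8
        data_end = data_start + length
        # The chunk data opens with the name: one length byte, then UTF-8.
        name_len = buf[data_start]
        body_start = data_start + 1 + name_len
        name = buf[data_start + 1 : body_start].decode("utf-8")
        # Assumed convention: the type code becomes the file extension.
        target = parent / f"{name}.{type_code}"
        if ord(type_code[2]) & 0x20:
            # Composite bit set: the body is a sequence of chunks.
            target.mkdir(parents=True, exist_ok=True)
            unpack_tree(buf, body_start, data_end, target)
        else:
            target.write_bytes(buf[body_start:data_end])
        pos = data_end + 4  # step over the CRC, which this sketch ignores
```

For a whole ZIFF file, the call would be unpack_tree(buf, 8, len(buf), some_directory), where the 8 skips the signature.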
One thing which feels important is that a ZIFF file is just a wrapper
for a ZIFF chunk. I think this means you can have folder trees which are
partially packed - where bits are in ZIFFs and bits are real files.