Transport Layer Protocols

I collect transport-layer protocols. NONE MORE DORK.

See also under Low-Level Protocols.

For general notes on congestion control in IP networks (essential in large, heterogeneous networks, like the Internet), see RFC 2581, RFC 2309, RFC 3448, and, above all, RFC 2914.

Anyway, in order of increasing obscurity ...

Transmission Control Protocol (TCP)

Specification: RFC 793 (amended by RFC 1122)

The protocol which transports 99% of the world's network data (not counting phone calls).

There are a bunch of specifications extending RFC 793. Besides RFC 1122, the one which officially updates it is RFC 3168, which adds support for ECN (Explicit Congestion Notification), but there are also specs for high-performance options (RFC 1323), SACK (RFC 2018), DSACK (RFC 2883), and a bunch of other stuff.

The single big weakness of TCP, to my mind, is that it's a stream-oriented protocol, when almost all application protocols are message-oriented in some way (the only one i can think of that isn't is telnet). This means that every application-layer protocol has to provide its own messaging sublayer (usually an implicit one), which is a lot of wasted effort. Also, the invisibility of the message boundaries to the TCP layer means it can't use them to organise its transmissions, so you end up with hacks like Nagle's algorithm to make it work smoothly. Yes, being a stream fits naturally with the unix programming model, but then the unix programming model is cracked anyway.
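To make the "messaging sublayer" point concrete, about the simplest one an application can bolt onto a byte stream is a length prefix on every message. Here's a minimal Python sketch, assuming a 4-byte big-endian length field; `frame` and `Deframer` are made-up names, not part of any real protocol or library:

```python
import struct
from typing import List

# A 4-byte big-endian length prefix: an arbitrary choice for this sketch.
LENGTH_PREFIX = struct.Struct("!I")


def frame(message: bytes) -> bytes:
    """Prefix a message with its length so its boundary survives a byte stream."""
    return LENGTH_PREFIX.pack(len(message)) + message


class Deframer:
    """Rebuilds whole messages from however the stream happens to chop them up."""

    def __init__(self) -> None:
        self._buffer = bytearray()

    def feed(self, chunk: bytes) -> List[bytes]:
        """Add newly read bytes; return every message that is now complete."""
        self._buffer.extend(chunk)
        messages = []
        while len(self._buffer) >= LENGTH_PREFIX.size:
            (length,) = LENGTH_PREFIX.unpack_from(self._buffer)
            end = LENGTH_PREFIX.size + length
            if len(self._buffer) < end:
                break  # the rest of this message hasn't arrived yet
            messages.append(bytes(self._buffer[LENGTH_PREFIX.size:end]))
            del self._buffer[:end]
        return messages


# Usage: push three messages through in awkward 3-byte pieces, as a stream might.
stream = frame(b"hello") + frame(b"") + frame(b"world")
deframer = Deframer()
received = []
for i in range(0, len(stream), 3):
    received.extend(deframer.feed(stream[i:i + 3]))
# received == [b"hello", b"", b"world"]
```

The receiver has to buffer partial messages and re-scan them, and TCP itself never learns where the boundaries are; that's the duplicated effort in a nutshell.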

Another weakness of TCP is its setup overhead. TCP carries out an exchange of packets (the 'three-way handshake') before the endpoints get to exchange data. In addition, TCP's congestion control algorithm involves a 'slow start', where the sending rate starts low and ramps up towards the capacity of the route over time. Together, these mean that a TCP connection does not become efficient until quite a number of packets in; whilst this is not a problem for long-lived connections (as used by connection-oriented application layer protocols, or those making large transfers), it makes TCP very unwieldy for short-lived connections, as used by many service protocols (like DNS, SNMP, etc).
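To put rough numbers on it, here's a back-of-the-envelope Python sketch. It assumes an idealised slow start (the window starts at one segment and doubles every round trip, with no loss) and charges one round trip for the handshake; real stacks use bigger initial windows and messier growth, so treat the figures as illustrative only:

```python
def round_trips(segments: int, initial_window: int = 1, max_window: int = 0) -> int:
    """Round trips until `segments` segments have all been sent on a new connection.

    Idealised model: one round trip for the three-way handshake, then a
    congestion window that doubles every round trip (capped at max_window
    if non-zero), with no loss.
    """
    if initial_window < 1:
        raise ValueError("initial_window must be at least 1")
    rtts = 1                # the handshake, before any data moves
    sent = 0
    window = initial_window
    while sent < segments:
        sent += window      # one window's worth goes out this round trip
        rtts += 1
        window *= 2
        if max_window:
            window = min(window, max_window)
    return rtts


# Usage:
fresh_100 = round_trips(100)   # 8: handshake plus windows of 1, 2, 4, ..., 64
tiny = round_trips(1)          # 2: the handshake doubles the cost of a one-segment request
```

A connection that was already open with a large enough window could do either transfer in a single round trip, which is why short-lived connections suffer most.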

Transactional TCP (T/TCP)

RFC 1644 specifies a modification of TCP (which never really took off) which allows a TCP connection to start carrying data earlier, partially overcoming the setup overhead.

TCP With Sequenced Packets

The lack of message demarcation is addressed by my modest proposal for sequenced packets over ordinary TCP.

User Datagram Protocol (UDP)

Specification: RFC 768

The protocol which transports the other 1% of the world's traffic.

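UDP's killer problem is its unreliability; messages are delivered intact if at all (as far as its 16-bit checksum can tell, and assuming the checksum is used at all, which is optional over IPv4), but there's no guarantee that they'll actually be delivered. Other problems are lack of in-order delivery, lack of duplicate prevention, lack of connections, and the practical limitation of message size to the network-layer MTU (bigger datagrams get fragmented by IP, and the whole datagram is lost if any one fragment is). If you don't need those, though, UDP is boss.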

UDP Lite

RFC 3828 specifies UDP Lite, a minor modification of UDP which allows delivery of damaged messages. This may be useful for error-tolerant application layer protocols, such as streaming audio or video protocols.
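UDP Lite does this with a Checksum Coverage field, which takes the place of UDP's length field and says how many octets, counting from the start of the UDP-Lite header, the checksum protects (0 meaning all of them; otherwise at least the 8-byte header). Damage beyond that point goes unnoticed, which is the whole idea. A Python sketch of the check; the zeroed pseudo-header and header are placeholders rather than a real packet, and the checksum field is assumed to be zero while summing:

```python
def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum, as used by the Internet checksum."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def udplite_checksum(pseudo_header: bytes, datagram: bytes, coverage: int) -> int:
    """Checksum over the pseudo-header plus the first `coverage` octets of the datagram.

    Coverage 0 means the whole datagram; otherwise it must be at least 8, so
    the UDP-Lite header itself is always protected.
    """
    if coverage == 0:
        coverage = len(datagram)
    elif coverage < 8 or coverage > len(datagram):
        raise ValueError("invalid checksum coverage")
    return ~ones_complement_sum(pseudo_header + datagram[:coverage]) & 0xFFFF


# Usage, with placeholder bytes rather than a real packet:
pseudo = bytes(12)      # stand-in for an IPv4 pseudo-header
header = bytes(8)       # stand-in UDP-Lite header, checksum field zeroed
payload = b"precious header bits | noisy audio samples"
datagram = header + payload
COVERAGE = 8 + 20       # protect the UDP-Lite header and the first 20 payload bytes

ok = udplite_checksum(pseudo, datagram, COVERAGE)
damaged_outside = datagram[:40] + b"\xff" + datagram[41:]  # unprotected byte hit
damaged_inside = datagram[:10] + b"\xff" + datagram[11:]   # protected byte hit
# The first still matches `ok` (delivered, damage and all); the second doesn't.
```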

Stream Control Transmission Protocol (SCTP)

Specification: RFC 2960
Ordered? Y (optional)

Big, scary protocol with more options than you can shake a stick at. It was essentially designed as a successor to TCP, although it's not intended to replace it. The major changes, from the application point of view, are that it provides a message-oriented connection, and that messages can optionally be delivered out of order ('order of arrival') in a fairly flexible way. Other changes include multiplexing of several streams of messages within a connection, multihoming of connections (so connections can be spread over several network interfaces at either end), and bundling of multiple messages into a single network-layer packet. Internally, SCTP uses more complex mechanisms than TCP for flow control and validation (for instance, connection setup is a four-way handshake using a state cookie, so a server needn't keep any state for half-open connections).
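The stream multiplexing matters more than it sounds: each stream numbers its own messages, so a lost message only holds up its own stream rather than everything behind it, as it would in TCP. A toy receiver in Python showing just that; the class and names are invented, real SCTP also deals with fragmentation, acknowledgement and much else, and numbering each stream from zero without wraparound is an assumption of the sketch:

```python
from collections import defaultdict


class StreamReceiver:
    """Per-stream ordered delivery, in the spirit of SCTP stream sequence numbers."""

    def __init__(self):
        self.next_ssn = defaultdict(int)   # stream id -> next sequence number expected
        self.pending = defaultdict(dict)   # stream id -> {ssn: message} waiting on a gap

    def receive(self, stream, ssn, message, unordered=False):
        """Return the (stream, message) pairs that can now go to the application."""
        if unordered:
            return [(stream, message)]     # 'order of arrival': hand it straight over
        if ssn < self.next_ssn[stream]:
            return []                      # already delivered: a duplicate
        self.pending[stream][ssn] = message
        delivered = []
        while self.next_ssn[stream] in self.pending[stream]:
            delivered.append((stream, self.pending[stream].pop(self.next_ssn[stream])))
            self.next_ssn[stream] += 1
        return delivered


# Usage:
rx = StreamReceiver()
log = []
log += rx.receive(stream=1, ssn=1, message="b1")  # stream 1's message 0 is late...
log += rx.receive(stream=2, ssn=0, message="a0")  # ...but stream 2 isn't held up
log += rx.receive(stream=1, ssn=0, message="b0")  # gap filled: b0, then b1
log += rx.receive(stream=1, ssn=5, message="urgent", unordered=True)
# log == [(2, "a0"), (1, "b0"), (1, "b1"), (1, "urgent")]
```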

SCTP messages can be larger than the network layer MTU.

See also RFC 3286 for a gentle introduction to SCTP.

Internet Link (IL)

Specification: 'The IL Protocol' (Plan 9 Manual)

This is the transport-layer protocol used for RPC in the Plan 9 operating system. It's used to transport a reliable, duplicate-free, ordered stream of smallish (up to MTU-sized) messages from one host to another. IL doesn't really have any flow control, although a rudimentary form could probably be added, using the information gathered for reliable delivery.

IL packets sit inside IP packets (with protocol number 40 = 0x28), and look like this:

    0                   1                   2                   3   
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 
   |            Checksum           |         Packet Length         |
   |  Packet Type  |    Special    |          Source Port          |
   |        Destination Port       |             ?!?!?!
   |                     Sequence Identifier                       |
   |                   Sequence Acknowledgement                    |

Where the fields are:

Checksum (ilsum)
IP-style checksum (complement of one's-complement sum) over the entire packet, including the IP header, the IL header, and the payload, with the sum and special fields taken as zero. Fucking stupid definition if you ask me - firstly, it's time to lose the stupid IP style checksum, and use a proper CRC, secondly, it should use a TCP-style IP pseudo-header, not the actual header, and thirdly, it shouldn't take the special field as zero (unless there's a reason for that i don't know).
Packet Length (illen)
Total length of the packet, from the start of the IL header.
Packet Type (iltype)
Identifies the type of packet; one of sync (0), data (1), dataquery (2), ack (3), query (4), state (5) or close (6).
Special (ilspec)
Reserved for future use. The spec doesn't say what you should do with it now; internet tradition says set it to zero when sending, and ignore its value when receiving.
Source Port (ilsrc)
Source port number, much as in TCP.
Destination Port (ildst)
Destination port number, much as in TCP.
?!?!?!
Okay. The spec (ie the Plan 9 manual) defines the header in terms of a C struct. The struct is laid out in such a way that any C compiler known to man will insert two bytes of padding at this point, and if there isn't padding here, the next two four-byte quantities are not aligned on four-byte boundaries, which would be kind of unprecedented in an internet protocol. However, the spec says in a couple of places that the header is 18 bytes long, which implies that there's no padding. I assume, therefore, that there really is no padding (i guess Plan 9 was first implemented on 16-bit machines, where four-byte quantities are doublewords, and don't have to be naturally aligned). However, drawing the next two fields following on directly makes the diagram look gross, so i'm just leaving a hole here. Note that if there is padding here, tradition says you should set it to zero when sending (although if you leave it to the compiler, it could have any value; under the gcc i have, it'll be 0x0000 if it's allocated on the heap, and 0xffff if it's on the stack - unless you use alloca, in which case it's 0x000!), and ignore it when receiving.
Sequence Identifier (ilid)
The sequence number of the message.
Sequence Acknowledgement (ilack)
The sequence number of the last in-sequence message received by the sender.
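For the record, here's what the 18-byte, no-padding reading of the header comes to, as a Python sketch. It assumes network (big-endian) byte order, as in other internet protocols; the names are invented rather than Plan 9's, and the checksum is just passed in by the caller, since exactly what it covers is the murky bit discussed above:

```python
import struct
from enum import IntEnum
from typing import NamedTuple


class ILType(IntEnum):
    """Packet type codes, as listed in the Packet Type description above."""
    SYNC = 0
    DATA = 1
    DATAQUERY = 2
    ACK = 3
    QUERY = 4
    STATE = 5
    CLOSE = 6


# "!" = network byte order, standard sizes, no alignment padding:
# sum(2) len(2) type(1) spec(1) src(2) dst(2) id(4) ack(4) = 18 bytes.
IL_HEADER = struct.Struct("!HHBBHHII")


class ILHeader(NamedTuple):
    checksum: int
    length: int
    ptype: ILType
    special: int
    src: int
    dst: int
    seq_id: int
    seq_ack: int


def pack_header(ptype: ILType, src: int, dst: int, seq_id: int, seq_ack: int,
                payload_len: int = 0, checksum: int = 0) -> bytes:
    """Build an IL header; Special is sent as zero, Packet Length covers header + payload."""
    return IL_HEADER.pack(checksum, IL_HEADER.size + payload_len, ptype, 0,
                          src, dst, seq_id, seq_ack)


def unpack_header(packet: bytes) -> ILHeader:
    fields = ILHeader(*IL_HEADER.unpack_from(packet))
    return fields._replace(ptype=ILType(fields.ptype))


def carries_message(ptype: ILType) -> bool:
    """Only data and dataquery packets carry a message."""
    return ptype in (ILType.DATA, ILType.DATAQUERY)


# Usage, with made-up port and sequence numbers:
hdr = pack_header(ILType.DATA, src=5000, dst=6000, seq_id=1001, seq_ack=7, payload_len=5)
parsed = unpack_header(hdr + b"hello")
# len(hdr) == 18, and the Sequence Identifier occupies bytes 10..13
```

The `!` prefix tells `struct` to use standard sizes with no alignment, so the Sequence Identifier lands at byte offset 10, off any four-byte boundary, just as the 18-byte figure implies.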

If i were designing IL2, i'd reform the checksum (CRC16 over the whole IL packet, with only the checksum set to zero, plus a pseudo-header as in TCP), drop the packet length (it's available from the network layer, dammit!), and shuffle the fields to lose the padding. But i'm not.

Anyway, the packet structure and the meanings of the fields are all fairly straightforward (ie fairly similar to TCP!). There are four things to explain: the use of sequence numbers, the different types of packet, the handshake and closing exchanges, and the reliability mechanism.

Sequence numbers are easy: every message (not every byte, as in TCP) in a connection has a unique one (unique within each side of the connection, that is - the 5-tuple (source address, source port, destination address, destination port, sequence number) globally uniquely identifies a message), with the first message having an arbitrary number (not zero, please, to give some protection against packets from dead connections), and each subsequent message having a number one higher than the previous one. A packet carrying a message bears the sequence number of that message as an identifier; packets not bearing messages (for which, see below) use the next number due to be assigned to a message. Every packet (with the exception of an opening sync packet) also carries an acknowledgement, which is the sequence number of the last message successfully received by the sender, where 'successfully' means 'intact, and with all preceding messages also successfully received'. Sequence numbers are the basis of IL's reliability mechanism, for which, see below.
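A sketch of the receiving end of this in Python. Two assumptions that don't come from the spec: sequence numbers wrap at 2^32 (they're 32-bit fields), with anything up to half the number space ahead of the acknowledgement treated as new and anything else as old, in the style of serial number arithmetic; and early messages are held until the gap before them is filled, rather than dropped. `ILReceiver` is an invented name:

```python
from typing import Dict, List

SEQ_MOD = 2 ** 32  # Sequence Identifier/Acknowledgement are 32-bit fields


class ILReceiver:
    """Receiving side of one direction of an IL connection (a sketch).

    `ack` is the id of the last message received intact with all of its
    predecessors: the value to put in our Sequence Acknowledgement field.
    """

    def __init__(self, first_id: int) -> None:
        # Nothing received yet, so acknowledge "the one before" the first message.
        self.ack = (first_id - 1) % SEQ_MOD
        self.held: Dict[int, bytes] = {}  # early arrivals, keyed by sequence id

    def on_message(self, seq_id: int, message: bytes) -> List[bytes]:
        """Accept one data message; return whatever can now be delivered, in order."""
        ahead = (seq_id - self.ack) % SEQ_MOD
        if ahead == 0 or ahead >= SEQ_MOD // 2:
            return []  # already acknowledged: a duplicate or a stale retransmission
        self.held[seq_id] = message
        delivered = []
        nxt = (self.ack + 1) % SEQ_MOD
        while nxt in self.held:
            delivered.append(self.held.pop(nxt))
            self.ack = nxt
            nxt = (nxt + 1) % SEQ_MOD
        return delivered


# Usage, starting right at the wrap-around point to show the modular arithmetic:
rx = ILReceiver(first_id=SEQ_MOD - 1)
early = rx.on_message(0, b"second")           # [] : held back, ack unchanged
both = rx.on_message(SEQ_MOD - 1, b"first")   # [b"first", b"second"], ack is now 0
dup = rx.on_message(0, b"second")             # [] : duplicate ignored
```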

There are seven packet types: sync, data, dataquery, ack, query, state and close.

sync
Opens a connection.
data
Transmits a message.
dataquery
Transmits a message and queries the state of the receiver.
ack
Acknowledges receipt of a message.
query
Queries the state of the receiver.
state
Indicates the state of the receiver.
close
Closes a connection.

Only data and dataquery packets carry messages; the other types of packets do not.

The opening handshake for IL is as follows:

  1. Host A picks an initial sequence number (ISN) and sends a sync packet to host B; the sequence identifier is set to the ISN, and the sequence acknowledgement to zero.
  2. Host B receives the sync, picks an ISN of its own and replies with another sync, with the sequence identifier set to its ISN and the sequence acknowledgement set to host A's ISN (ooh - shouldn't it really be one less than A's ISN?).
  3. Both hosts are now clear to send.
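The happy path of that handshake, as a Python sketch with no retransmission or loss handling (which, as noted below, the spec is hazy on). `Host` and `Sync` are made-up names, and telling a fresh sync from a reply by its zero acknowledgement only works because ISNs are never zero:

```python
import random
from typing import NamedTuple, Optional


class Sync(NamedTuple):
    seq_id: int
    seq_ack: int


def pick_isn() -> int:
    """A random nonzero 32-bit ISN (zero is avoided, as suggested above)."""
    return random.randint(1, 2 ** 32 - 1)


class Host:
    def __init__(self, isn: int) -> None:
        self.isn = isn
        self.peer_isn: Optional[int] = None
        self.established = False

    def connect(self) -> Sync:
        """Step 1: sync with our ISN as identifier and a zero acknowledgement."""
        return Sync(seq_id=self.isn, seq_ack=0)

    def on_sync(self, pkt: Sync) -> Optional[Sync]:
        """Handle an incoming sync; return the sync to send back, if any."""
        if pkt.seq_ack == 0:
            # Step 2: someone is opening a connection to us. Reply with our
            # own ISN, acknowledging theirs.
            self.peer_isn = pkt.seq_id
            self.established = True
            return Sync(seq_id=self.isn, seq_ack=pkt.seq_id)
        if pkt.seq_ack == self.isn:
            # The reply to our own sync (step 3): clear to send.
            self.peer_isn = pkt.seq_id
            self.established = True
        return None  # anything else is ignored in this sketch


# Usage:
a, b = Host(pick_isn()), Host(pick_isn())
reply = b.on_sync(a.connect())
a.on_sync(reply)
```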

The spec is hazy on what to do if packets get lost. I am by no means a transport-layer protocol expert, but my thinking is:

AIUI, sync packets don't carry messages. I don't see why they couldn't, though, and this would allow a fast T/TCP-style setup.

Closing a connection is as follows:

  1. Host X sends a close packet to host Y, with the sequence identifier and acknowledgement set as usual.
  2. Host Y receives the close, and sends a close packet back, again with the sequence numbers set as usual.
  3. The connection is closed.
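The same kind of happy-path sketch for closing, again in Python and again ignoring lost packets; `Endpoint`, `Close` and the state names are invented for illustration, and the sequence numbers are just plausible values for a connection in mid-flight:

```python
from enum import Enum
from typing import NamedTuple, Optional


class Close(NamedTuple):
    seq_id: int   # next id due to be assigned, since a close carries no message
    seq_ack: int  # last in-sequence message received


class State(Enum):
    ESTABLISHED = "established"
    CLOSING = "closing"  # we sent a close and are waiting for the peer's
    CLOSED = "closed"


class Endpoint:
    def __init__(self, next_id: int, ack: int) -> None:
        self.state = State.ESTABLISHED
        self.next_id = next_id
        self.ack = ack

    def close(self) -> Close:
        """Step 1: send a close, with the sequence numbers set as usual."""
        self.state = State.CLOSING
        return Close(self.next_id, self.ack)

    def on_close(self, pkt: Close) -> Optional[Close]:
        """Steps 2 and 3: answer a peer's close with our own; either way we end up closed.

        (pkt's sequence numbers aren't checked in this sketch.)
        """
        was_established = self.state is State.ESTABLISHED
        self.state = State.CLOSED
        return Close(self.next_id, self.ack) if was_established else None


# Usage:
x = Endpoint(next_id=1042, ack=77)    # x has sent messages up to 1041, seen y's up to 77
y = Endpoint(next_id=78, ack=1041)
answer = y.on_close(x.close())        # Close(seq_id=78, seq_ack=1041)
x.on_close(answer)                    # None: x's close has been answered
```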

Again, packet loss must be considered:

Finally, reliability. The key thing is that each end of a connection keeps track of which messages the other end has received, by maintaining an awareness of the acknowledged sequence number. During normal, rapid two-way traffic, this occurs simply through the exchange of data packets, which carry a sequence acknowledgement. If only one end is actively sending, then the other end should periodically send an ack packet (using an ack timeout, reset on sending of any kind of packet), purely to communicate its sequence acknowledgement. This is highly straightforward.
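The ack timeout can be sketched as a little Python class driven by an explicit clock, so it's easy to test. `AckTimer` is a made-up name, and it adds one assumption beyond the description above: a bare ack is only worth sending when something new has arrived since we last sent anything:

```python
class AckTimer:
    """When should an end that's mostly receiving send a bare ack packet?

    The timer restarts whenever we send any packet, since every packet
    carries our Sequence Acknowledgement anyway.
    """

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_sent = 0.0    # treat time 0 as the last time we sent anything
        self.unacked = False    # received anything we haven't acknowledged yet?

    def on_receive(self) -> None:
        self.unacked = True

    def on_send(self, now: float) -> None:
        """Call for every packet sent, of any type: each one carries our ack."""
        self.last_sent = now
        self.unacked = False

    def should_send_ack(self, now: float) -> bool:
        return self.unacked and now - self.last_sent >= self.timeout


# Usage, with made-up times in seconds:
timer = AckTimer(timeout=0.2)
timer.on_receive()
too_soon = timer.should_send_ack(0.1)   # False: under 0.2s since we last sent
due = timer.should_send_ack(0.25)       # True: send a bare ack packet
timer.on_send(0.25)                     # sending it (or anything else) resets the timer
idle = timer.should_send_ack(0.5)       # False: nothing new to acknowledge
```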

It's when packets go missing that things get interesting. Two mechanisms come into play:

As an optimisation, a host can send a dataquery packet; this is simply a packet which is both a data and a query - it carries a message, and asks for a state packet to be sent back.

I don't understand why ack and state are separate. Maybe it's so the querier can know that the packet is a response to its query, and not just a delayed acknowledgement.

There is probably a hell of a lot more information about the state of the network and the peer that can be wrung out of these exchanges by a clever implementation. Suggestions on a postcard to Bell Labs, please!

Realtime Transport Protocol (RTP)

Specification: RFC 1889
Ordered? Y (sort of)

Is realtime. For media stuff.

Runs on top of UDP, or another protocol; sort of a transport decorator. It claims to be "a new style of protocol following the principles of application level framing and integrated layer processing proposed by Clark and Tennenhouse". Make of that what you will.
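For the curious, the fixed part of the RTP header is 12 bytes: a 2-bit version (2), padding and extension flags, a CSRC count, a marker bit, a 7-bit payload type saying what kind of media follows, a 16-bit sequence number, a 32-bit media timestamp and a 32-bit source identifier (SSRC). A Python sketch of packing and unpacking it, leaving out the optional CSRC list and header extension; the function names and example values are made up:

```python
import struct

RTP_VERSION = 2
RTP_FIXED = struct.Struct("!BBHII")  # 1 + 1 + 2 + 4 + 4 = 12 bytes


def pack_rtp_header(payload_type: int, seq: int, timestamp: int, ssrc: int,
                    marker: bool = False) -> bytes:
    """Pack the fixed RTP header with no padding, no extension and no CSRCs."""
    byte0 = RTP_VERSION << 6                      # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return RTP_FIXED.pack(byte0, byte1, seq & 0xFFFF,
                          timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def unpack_rtp_header(data: bytes) -> dict:
    byte0, byte1, seq, timestamp, ssrc = RTP_FIXED.unpack_from(data)
    return {
        "version": byte0 >> 6,
        "padding": bool(byte0 & 0x20),
        "extension": bool(byte0 & 0x10),
        "csrc_count": byte0 & 0x0F,
        "marker": bool(byte1 & 0x80),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Usage: payload type 96 is from the dynamically assigned range.
rtp_hdr = pack_rtp_header(96, seq=65535, timestamp=160, ssrc=0x1234ABCD, marker=True)
fields = unpack_rtp_header(rtp_hdr)
```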