Distributed Objects in Java

This is a tutorial on how to do distributed objects in Java (hence the name, izerntit). There are probably many such things on the web, but this is the best, as (a) my kung fu is the strongest, (b) it treats no fewer than three different ways of doing it, and (c) it has the raw socket style, which is unstoppable.

I'm going to start with the analysis of the application, then show how to implement it with sockets, then with RMI, then with CORBA. Yes, i am using a waterfall process - don't try this at home, kids!

Analysis

First up, you have to decide what objects your system is made of. I'm going to assume you can already do this for normal applications, so i'll mostly talk about how it's different for distributed applications.

The key thing is that in a distributed system, there are two different kinds of objects: reference objects and value objects. Reference objects are ones where identity matters, like Customer, Account, Employee, etc; such objects typically have mutable state and non-trivial methods. Value objects are those where only the value matters, like Address, Sort Code, etc; being immutable and having no real methods is a dead giveaway for these objects. If you come from a C# background, you'll recognise this as the class/struct distinction, or if you know C++, as the pass-by-reference/pass-by-value distinction.

So, an example. Let's build a little social networking application, like a baby Orkut or lobotomised LiveJournal or something. The primary class is clearly going to be Person; each Person will have an id (login name, screen name, whatever), a real name, a set of interests and a location. The id and name can just be strings, the interests can be a set of strings, and let's make location a Location. We can use Location to represent locations hierarchically (eg Europe > UK > London > North-East London > Hackney), so it'll just contain a sequence of strings. Each person will also have a set of friends, who are also Persons.

Operations on a Person will be the usual getters and setters (with appropriate set-like methods for the interests and friends - list them, test whether something is one, add one, remove one). Let's also have a method to find the shortest route to another Person - that does some heavy lifting. There aren't really any operations on a Location, apart from getters (it's going to be immutable, so no setters).

So, our types look like this:
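As a minimal sketch in Java, the two types might be declared as below. Only findRouteTo is named in this tutorial; every other method name is an assumption derived from the description above, and Location is given a single getter on the assumption that it holds nothing but its sequence of strings.

```java
import java.util.List;
import java.util.Set;

/** A hierarchical location, eg Europe > UK > London. */
interface Location {
    List<String> getPath();
}

/** A member of the network. Method names other than findRouteTo are assumed. */
interface Person {
    String getId();

    String getName();
    void setName(String name);

    Location getLocation();
    void setLocation(Location location);

    // Set-like access to interests: list, test, add, remove.
    Set<String> getInterests();
    boolean hasInterest(String interest);
    void addInterest(String interest);
    void removeInterest(String interest);

    // Set-like access to friends: list, test, add, remove.
    Set<Person> getFriends();
    boolean isFriend(Person other);
    void addFriend(Person friend);
    void removeFriend(Person friend);

    /** The heavy lifter: the shortest chain of friends from this person to the target. */
    List<Person> findRouteTo(Person target);
}
```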

So, what kinds of objects are these? Well, a Person has mutable state, heavyweight methods (well, one heavyweight method) and identity matters - two Persons with the same name are not the same Person! Therefore, it's a reference object. Location, however, has no real methods, is immutable, and no identity (two distinct objects with the same value represent the same location); thus, it's a value object. Hey, one of each type - how convenient!
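To make the distinction concrete, here is a sketch of how the two kinds of object usually differ in Java code: the value object is final, immutable and compares by content, while the reference object is mutable and keeps the default identity-based equals. The class bodies are illustrative assumptions, not the tutorial's actual source, and List.copyOf needs Java 10 or later.

```java
import java.util.List;

/** Value object: immutable, and equal to any other Location with the same path. */
final class Location {
    private final List<String> path;

    Location(List<String> path) {
        this.path = List.copyOf(path); // unmodifiable copy, so callers can't mutate us later
    }

    List<String> getPath() { return path; }

    @Override public boolean equals(Object o) {
        return o instanceof Location && ((Location) o).path.equals(path);
    }

    @Override public int hashCode() { return path.hashCode(); }

    @Override public String toString() { return String.join(" > ", path); }
}

/** Reference object: mutable, and equal only to itself (Object's default equals). */
class Person {
    private String name;

    Person(String name) { this.name = name; }

    String getName() { return name; }
    void setName(String name) { this.name = name; }
}
```

Two Locations built from the same strings are interchangeable, so a copy can safely be shipped across the network; two Persons called "Alice" stay distinct, so a remote party needs some way to refer to the original.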

Now, we actually need one more thing - a way of finding people in the first place. Resist the temptation to put a static getPerson method on Person; that's bad practice, and, moreover, won't work with distributed objects. Instead, let's have a Society object, which looks like this (and this is very bare-bones):

  • Society
  • This is a reference object. Note that an instance of Society is going to be the way in to the application; every implementation of this model must provide some way for a naive client to get hold of one, so that it can get hold of Persons.
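A bare-bones Society could be sketched like this. The method names getPerson and register are assumptions (the text only says that a Society maps ids to Persons and is used to register new ones), and LocalSociety is a hypothetical map-backed implementation for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Cut down to the one method this sketch needs. */
interface Person {
    String getId();
}

/** The way in to the application: maps ids to Persons. Method names are assumed. */
interface Society {
    /** Returns the Person with this id, or null if there is none. */
    Person getPerson(String id);

    /** Makes a Person findable by its id. */
    void register(Person person);
}

/** Hypothetical in-memory implementation. */
class LocalSociety implements Society {
    private final Map<String, Person> people = new HashMap<>();

    @Override public Person getPerson(String id) {
        return people.get(id);
    }

    @Override public void register(Person person) {
        // putIfAbsent returns the existing entry, if any, and leaves it in place
        if (people.putIfAbsent(person.getId(), person) != null) {
            throw new IllegalArgumentException("id already taken: " + person.getId());
        }
    }
}
```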

    Okay, let's do some implementation.

    Local Implementation

    [Source code]

    As a warmup, here's the implementation done with no distribution - it's purely local. The code is very, very simple (with the possible exception of the implementation of findRouteTo, but you can ignore that); it should be pretty much self-explanatory, so ch-check it out.
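If you don't want to ignore findRouteTo, one standard way to find a shortest route through an unweighted friendship graph is a breadth-first search. The sketch below is that technique under stated assumptions, not necessarily what the real source does: friendship is treated as one-directional (addFriend only records the link on one side), and an empty list means there is no route.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

class Person {
    private final String id;
    private final Set<Person> friends = new LinkedHashSet<>();

    Person(String id) { this.id = id; }

    String getId() { return id; }
    Set<Person> getFriends() { return Collections.unmodifiableSet(friends); }
    void addFriend(Person friend) { friends.add(friend); }

    /** Returns the shortest chain from this person to target, both inclusive, or an empty list. */
    List<Person> findRouteTo(Person target) {
        Map<Person, Person> cameFrom = new HashMap<>(); // visited person -> who we reached them from
        Queue<Person> frontier = new ArrayDeque<>();
        cameFrom.put(this, null); // the start has no predecessor
        frontier.add(this);
        while (!frontier.isEmpty()) {
            Person current = frontier.remove();
            if (current == target) {
                // Walk the predecessor links back to the start, building the route front-first.
                LinkedList<Person> route = new LinkedList<>();
                for (Person p = current; p != null; p = cameFrom.get(p)) {
                    route.addFirst(p);
                }
                return route;
            }
            for (Person friend : current.friends) {
                if (!cameFrom.containsKey(friend)) { // first visit is via a shortest path
                    cameFrom.put(friend, current);
                    frontier.add(friend);
                }
            }
        }
        return Collections.emptyList();
    }
}
```

Note that this walks the friends sets directly. Once the Persons are remote, every step of that walk could become a network roundtrip, which is one reason this method is worth running on the server side.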

    Socket Implementation

    [Source code]

    So now we make a version of the local implementation which uses sockets to do remote doings.

    The first step is to define a protocol.

    We'll follow the internet tradition, and define a simple, text-based request-response protocol. We'll avoid the complex metastructure of HTTP and stick to a more SMTP-like system; the client sends a one-line request, and the server replies with a one-line response. Requests start with a command, followed by a tab-separated sequence of parameters (tab rather than space, to simplify handling of parameters which contain spaces). Responses start with a status code, which (and i'm aping POP here) is either "ok", if the command could be executed, or "no", if there was a problem. If it's an ok, then a tab-separated series of return values follows. If it's a no, then there's an error string instead.
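The line format can be captured in a few lines of Java. This is a sketch under assumptions: the class name Wire and its method names are made up, the error string is assumed to be separated from "no" by a tab, and since the text doesn't say how a tab or newline inside a value would be escaped, this version simply refuses such values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for building and splitting protocol lines. */
final class Wire {
    private Wire() {}

    /** Builds a request line: the command, then each parameter, separated by tabs. */
    static String request(String command, List<String> params) {
        return join(command, params);
    }

    /** Builds a success response: "ok", then each return value. */
    static String ok(List<String> values) {
        return join("ok", values);
    }

    /** Builds a failure response: "no", then the error string. */
    static String no(String error) {
        return join("no", List.of(error));
    }

    /** Splits a line into fields; the -1 limit keeps trailing empty fields. */
    static List<String> fields(String line) {
        return Arrays.asList(line.split("\t", -1));
    }

    private static String join(String head, List<String> rest) {
        List<String> all = new ArrayList<>();
        all.add(head);
        all.addAll(rest);
        for (String field : all) {
            // No escaping scheme is defined, so separators inside a field are an error.
            if (field.indexOf('\t') >= 0 || field.indexOf('\n') >= 0 || field.indexOf('\r') >= 0) {
                throw new IllegalArgumentException("field contains a tab or line break");
            }
        }
        return String.join("\t", all);
    }
}
```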

    So, what are the commands? Well, here we can be quite boring, and just map each method call on the reference objects to a command. With one quirk - we're going to pretend Society doesn't exist; i'll explain in a minute.

    The Person calls are easy to map - the first parameter is the id of the receiver, and the other parameters are the parameters of the call; if there's a list of things (or a Location, which we'll flatten to a list of strings), we'll write each one as a parameter (note that this means we can only have one list per call, and it has to come last in the parameter list). Where a person is a parameter, we can just send the id. Responses work in much the same way.

    So, why no Society? Well, think about what it does; it's a way of mapping an id to a Person object, but on the wire, there are no Persons - we just use the ids to refer to them - so it's redundant (note that this also means there's no command for getId(), although we will have one called 'exists', so a client can see if there's a person with a given id). It's also used to register new Persons, but since there's not really such a thing as a Person, we can just fold that into the new Person operation (so when you request that a person be created, it's automatically registered). The Society object is thus sort of delocalised; it doesn't exist, but its spirit permeates the whole protocol session. Spooky.
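Here is a sketch of that mapping rule: receiver id first, then any scalar parameters, then at most one list, flattened onto the end of the line. The class and the command names used with it (setLocation, isFriend) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical encoder for the "one trailing list" convention. */
final class CommandMapping {
    private CommandMapping() {}

    /** Builds: command TAB receiverId TAB scalar... TAB listItem... */
    static String encode(String command, String receiverId,
                         List<String> scalars, List<String> trailingList) {
        List<String> fields = new ArrayList<>();
        fields.add(command);
        fields.add(receiverId);
        fields.addAll(scalars);
        fields.addAll(trailingList); // the list must be last: nothing marks where it ends
        return String.join("\t", fields);
    }

    /**
     * Recovers the trailing list from a request line. The receiver of a command has to
     * know how many scalars it takes; everything after them belongs to the list.
     */
    static List<String> decodeTrailingList(String line, int scalarCount) {
        List<String> fields = Arrays.asList(line.split("\t", -1));
        return new ArrayList<>(fields.subList(2 + scalarCount, fields.size()));
    }
}
```

So a Location of Europe > UK > London for the person tom would travel as setLocation, tom, Europe, UK, London, all tab-separated, and a person parameter such as the argument to isFriend is just one more scalar holding an id.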

    That gives us the following list of commands (written command PARAMETER PARAMETER LIST-OF-PARAMETERS* -> RESULT RESULT LIST-OF-RESULTS*; booleans have names ending in P, like FOOP, and are encoded on the wire as the strings "true" or "false"):

    We'll implement this protocol with a pair of classes (well, a trio, but one is puny and doesn't count). First, though, we'll capture it in an interface; this interface will define one method for each of the commands, with the appropriate parameters and return value. Now, you may be thinking "this is madness - the protocol is already a mapping of an interface!", but there is method here - we need an interface for a single object which can handle operations on many Persons. Trust me. Oh, and all the methods must be declared to throw IOException.
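A sketch of what such an interface could look like follows. The interface name and most method names are assumptions (only exists and findRouteTo come from the text, plus the idea that creating a person registers it); the shape is the point: every method takes the receiver's id first, Persons appear only as ids, Locations as lists of strings, and everything throws IOException.

```java
import java.io.IOException;
import java.util.List;

/** One object handling operations on many Persons, one method per command. */
interface SocietyProtocol {
    boolean exists(String id) throws IOException;

    /** Creates a person and registers it in one step. */
    void create(String id, String name) throws IOException;

    String getName(String id) throws IOException;
    void setName(String id, String name) throws IOException;

    /** A Location is flattened to its sequence of strings. */
    List<String> getLocation(String id) throws IOException;
    void setLocation(String id, List<String> location) throws IOException;

    List<String> getInterests(String id) throws IOException;
    boolean hasInterest(String id, String interest) throws IOException;
    void addInterest(String id, String interest) throws IOException;
    void removeInterest(String id, String interest) throws IOException;

    /** Friends are returned as ids, not as Person objects. */
    List<String> getFriends(String id) throws IOException;
    boolean isFriend(String id, String otherId) throws IOException;
    void addFriend(String id, String friendId) throws IOException;
    void removeFriend(String id, String friendId) throws IOException;

    /** The route is a list of ids, from the receiver to the target. */
    List<String> findRouteTo(String id, String targetId) throws IOException;
}
```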

    The first implementing class is a client; this implements the interface and sits on a socket, translating calls to the methods to requests sent down the socket. It has a constructor taking an address and a port, so you can attach it to whatever server you like.
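A cut-down client might look like the sketch below. It covers just two commands, and the names Protocol, ProtocolClient and getName are assumptions. The extra constructor taking a Reader and a Writer is an addition of mine so the translation logic can be exercised without a live server, and UTF-8 is an assumed encoding.

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Two-method stand-in for the full protocol interface. */
interface Protocol {
    boolean exists(String id) throws IOException;
    String getName(String id) throws IOException;
}

class ProtocolClient implements Protocol, Closeable {
    private final BufferedReader in;
    private final Writer out;

    /** Attaches the client to whatever server you like. */
    ProtocolClient(String host, int port) throws IOException {
        this(new Socket(host, port));
    }

    private ProtocolClient(Socket socket) throws IOException {
        this(new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8),
             new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8));
    }

    /** For tests: talk over arbitrary streams instead of a socket. */
    ProtocolClient(Reader in, Writer out) {
        this.in = new BufferedReader(in);
        this.out = out;
    }

    /** Sends one request line and returns the values following "ok"; a "no" becomes an exception. */
    private String[] call(String... fields) throws IOException {
        out.write(String.join("\t", fields));
        out.write('\n');
        out.flush();
        String line = in.readLine();
        if (line == null) throw new EOFException("server closed the connection");
        String[] reply = line.split("\t", -1);
        if (!reply[0].equals("ok")) {
            throw new IOException(reply.length > 1 ? reply[1] : "request failed");
        }
        return Arrays.copyOfRange(reply, 1, reply.length);
    }

    private String single(String[] values) throws IOException {
        if (values.length != 1) throw new IOException("expected exactly one return value");
        return values[0];
    }

    @Override public boolean exists(String id) throws IOException {
        return Boolean.parseBoolean(single(call("exists", id)));
    }

    @Override public String getName(String id) throws IOException {
        return single(call("getName", id));
    }

    /** Closing a socket's streams also closes the socket. */
    @Override public void close() throws IOException {
        out.close();
        in.close();
    }
}
```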

    The second implementing class is a server-side handler; this doesn't implement the interface, but knows an object which does: it sits on a socket, reading requests and translating them into calls to the implementing object, then translating the returned objects into responses on the wire. This class has a helper (the actual server class) which sits on a server socket and matches sockets up with handlers.
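The server side could be sketched along these lines, again with only two commands. Handler, Server, Protocol and getName are assumed names, the thread-per-connection model is one simple choice among several, and the handler is written against a Reader and a Writer so it can be tried out without opening a port.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Two-method stand-in for the full protocol interface. */
interface Protocol {
    boolean exists(String id) throws IOException;
    String getName(String id) throws IOException;
}

/** Reads requests from one connection and turns them into calls on the target. */
class Handler implements Runnable {
    private final BufferedReader in;
    private final Writer out;
    private final Protocol target;

    Handler(Reader in, Writer out, Protocol target) {
        this.in = new BufferedReader(in);
        this.out = out;
        this.target = target;
    }

    /** Turns one request line into one response line. */
    String handle(String line) {
        String[] f = line.split("\t", -1);
        try {
            switch (f[0]) {
                case "exists":  return "ok\t" + target.exists(f[1]);
                case "getName": return "ok\t" + target.getName(f[1]);
                default:        return "no\tunknown command " + f[0];
            }
        } catch (Exception e) { // failures in the target, or a missing parameter
            return "no\t" + e.getMessage();
        }
    }

    @Override public void run() {
        try {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(handle(line));
                out.write('\n');
                out.flush();
            }
        } catch (IOException e) {
            // The connection dropped; there is nobody left to tell.
        } finally {
            try { out.close(); } catch (IOException ignored) { }
        }
    }
}

/** The puny helper: accepts connections and gives each one its own handler thread. */
class Server {
    static void serve(int port, Protocol target) throws IOException {
        try (ServerSocket listener = new ServerSocket(port)) {
            while (true) {
                Socket s = listener.accept();
                new Thread(new Handler(
                        new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8),
                        new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8),
                        target)).start();
            }
        }
    }
}
```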

    Then there are the interfaces for the application classes. Then there are implementation classes which sit on the server side - basically, they're the same as the purely local classes. Then there are the stub classes, which sit on the client side and turn method calls into calls on a protocol object. Then there are the tie classes, which go between the protocol interface and the implementation classes. Could do with a diagram here. Not really complicated - D, send me an email.
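In the absence of a diagram, here is a sketch of one stub and one tie, trimmed to two operations. All names are assumptions, as are two design choices: the stub wraps IOException in UncheckedIOException (so the application interface stays free of checked exceptions), and stubs compare equal by id, since two stubs with the same id stand for the same remote Person. A plain map stands in for the server-side Society.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Cut-down protocol interface: ids in, ids and strings out. */
interface Protocol {
    String getName(String id) throws IOException;
    List<String> getFriends(String id) throws IOException;
}

/** Cut-down application interface, shared by client and server. */
interface Person {
    String getId();
    String getName();
    List<Person> getFriends();
}

/** Client side: looks like a Person, but every call goes to the protocol object. */
class PersonStub implements Person {
    private final String id;
    private final Protocol protocol;

    PersonStub(String id, Protocol protocol) {
        this.id = id;
        this.protocol = protocol;
    }

    @Override public String getId() { return id; } // no roundtrip: the stub already knows it

    @Override public String getName() {
        try { return protocol.getName(id); }
        catch (IOException e) { throw new UncheckedIOException(e); }
    }

    @Override public List<Person> getFriends() {
        try {
            List<Person> friends = new ArrayList<>();
            for (String friendId : protocol.getFriends(id)) {
                friends.add(new PersonStub(friendId, protocol)); // ids become stubs again
            }
            return friends;
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }

    @Override public boolean equals(Object o) {
        return o instanceof PersonStub && ((PersonStub) o).id.equals(id);
    }

    @Override public int hashCode() { return id.hashCode(); }
}

/** Server side: implements the protocol by finding the real Person and calling it. */
class ProtocolTie implements Protocol {
    private final Map<String, Person> people; // stands in for the Society

    ProtocolTie(Map<String, Person> people) { this.people = people; }

    private Person find(String id) throws IOException {
        Person p = people.get(id);
        if (p == null) throw new IOException("no such person: " + id);
        return p;
    }

    @Override public String getName(String id) throws IOException {
        return find(id).getName();
    }

    @Override public List<String> getFriends(String id) throws IOException {
        List<String> ids = new ArrayList<>();
        for (Person friend : find(id).getFriends()) ids.add(friend.getId()); // Persons become ids
        return ids;
    }
}
```

Handing a ProtocolTie straight to a PersonStub gives the whole chain minus the network; in the real thing, the client and the handler sit between the two.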


    Under construction!

    Remember to cover Info objects (aggregate state holders which minimise roundtrips) in a supplementary section.