Experience and Good Taste in Software/Systems Design

May 10th, 2012 | Categories: Condor, Guest posts, programming

A guest post by Ian Cottam (@iandavidcottam).

I have been a programmer for 40 years this month. I thought I would write a short essay about experiences from that time that shaped the design of a relatively recent, small program: DropAndCompute. (The purpose and general evolution of that program were described in a blog entry here. If you are new to DropAndCompute, please just watch the video there.)

Once I had the idea for DropAndCompute (inspired equally by the complexity of Condor and the simplicity of Dropbox), I coded and tested it in about two days. (Typically, one bug survived until it had a real user, but it was no big deal.) My colleague, Mark Whidby, later refactored and recoded it so that it scaled better as it grew in popularity here at The University of Manchester. I expect Mark spent about two days on it too. The user interface and basic design did not change. (As the evolution blog discusses, we eventually made the use of Dropbox optional, but that does not detract from this tale.)

Physically dropping programs and their data:
In the early to mid 1970s, as well as doing my own programming work, I helped scientists in Liverpool run their code. One way we made their programs run faster was to drop the deck of cards into the input well of a card reader that was remotely connected to the regional supercomputer centre at Manchester. (I never knew what the communication mechanism was; probably a wet string, given the technology of the time.) A nearby line printer was similarly connected, and our results could be picked up, usually the next day. DropAndCompute is a 21st century version of this activity, without the legwork and the humping of large boxes of cards about.

That this approach was worth the effort became obvious with one of the first card decks I ever submitted. We had been running the code on an ICL 1903A computer in Liverpool; Manchester had a CDC 6600 (I hope my memory has not let me down; it did become a CDC 7600 at some stage). Running the code locally in Liverpool, with typical data, took around 55 CPU minutes. Dropping it into that card reader so that it ran automatically in Manchester resulted in the jaw-dropping time of 4 CPU seconds, a speed-up of more than 800 times. (I still had to wait overnight to pick up the results, something that resonates with today’s DropAndCompute users and Manchester’s Condor Pool, which is only large and powerful overnight.)

Capabilities:
Later, but still in the mid 1970s, I worked for Plessey on their System 250, a high-reliability multiprocessor system. It was the first commercial example of a capability architecture. In such an architecture there is no supervisor state, no privileged code rings or anything similar. If you held the capability to do something (e.g. read, write or enter another code context), you could do it. If you didn’t hold the appropriate capability, you could not. The only special code in the System 250 was the tiny section that generated capabilities. Nothing else could make them.

The server side of DropAndCompute generates capabilities and hands them to the user’s client side. They are implemented as zero-length files whose information content lies entirely in their names. For job 3159, you get 3159.kill, 3159.vacate and 3159.debug generated*. Dragging and dropping one or more of these zero-length files (capabilities) into the dropbox automatically executes the corresponding lower-level Condor command on the remote side. [* You could try to make your own capability, such as 9513.kill, but it won’t work.]
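To make the capability idea concrete, here is a minimal Python sketch. It assumes, hypothetically, that the server simply remembers the names it has issued and ignores any dropped file whose name it did not generate; the class `CapabilityServer`, its methods and the issued-name registry are illustrative assumptions, not how DropAndCompute actually validates capabilities. In the real system a valid drop runs a Condor command; here `handle_drop` only returns the job number and action it would act on.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# Actions taken from the example in the text (3159.kill, .vacate, .debug).
ACTIONS = ("kill", "vacate", "debug")


class CapabilityServer:
    """Hypothetical server side: issues capability files and honours only those."""

    def __init__(self, outbox: Path) -> None:
        self.outbox = outbox
        # Names this server has generated; only these count as capabilities.
        self.issued: set[str] = set()

    def issue(self, job_id: int) -> list[Path]:
        """Create one zero-length file per action; all information is in the name."""
        paths = []
        for action in ACTIONS:
            name = f"{job_id}.{action}"
            path = self.outbox / name
            path.touch()  # empty file: the name is the whole capability
            self.issued.add(name)
            paths.append(path)
        return paths

    def handle_drop(self, dropped: Path) -> tuple[int, str] | None:
        """Return (job_id, action) for a genuine capability, None for a forged one."""
        name = dropped.name
        if name not in self.issued:
            return None  # e.g. a hand-made 9513.kill: never issued, so ignored
        job, action = name.split(".", 1)
        # A real server would now run the matching Condor command for this job.
        return int(job), action


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        server = CapabilityServer(Path(d))
        server.issue(3159)
        print(server.handle_drop(Path(d) / "3159.kill"))  # (3159, 'kill')
        print(server.handle_drop(Path(d) / "9513.kill"))  # None
```

As on the System 250, only one piece of code (here, `issue`) can create a capability; anything else dropped into the folder is just an unrecognised file name.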

UNIX and Shell Glue Code:
My initial exposure to the UNIX tools philosophy in the late 1970s profoundly influenced me (and still does). In essence, it says that one should build new software by inventing ‘glue’ to stick existing components together. The UNIX Shell is often the language of choice for this, and was for me. DropAndCompute is a good example of a little bit of glue producing what I hope is an impressive synergy.
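The glue idea can be sketched in a few lines. This is a hypothetical illustration in Python rather than the shell DropAndCompute was actually written in, and it assumes that jobs arrive in a drop folder as Condor submit description files ending in `.sub`; the function name `submit_new_jobs` and that file layout are assumptions. The point is that the glue adds no logic of its own: it only notices new files and hands each one to an existing tool (`condor_submit` by default).

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def submit_new_jobs(drop_dir, seen, submit_cmd=("condor_submit",)):
    """Hand each newly dropped submit file to an existing tool.

    The glue does no scheduling itself: it notices new *.sub files
    and delegates the real work to submit_cmd.
    """
    submitted = []
    for sub_file in sorted(Path(drop_dir).glob("*.sub")):
        if sub_file.name in seen:
            continue  # already handed off on an earlier pass
        subprocess.run([*submit_cmd, str(sub_file)], check=True)
        seen.add(sub_file.name)
        submitted.append(sub_file.name)
    return submitted


if __name__ == "__main__":
    # Stand-in for condor_submit so the sketch runs without Condor installed.
    fake_submit = (sys.executable, "-c", "import sys; print('submitted', sys.argv[1])")
    with tempfile.TemporaryDirectory() as d:
        (Path(d) / "a.sub").write_text("")
        seen = set()
        print(submit_new_jobs(d, seen, fake_submit))  # ['a.sub']
        print(submit_new_jobs(d, seen, fake_submit))  # []
```

Passing the submit command in as a parameter keeps the glue testable without Condor installed, as the `__main__` block does with a stand-in command.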

The Internet not The Web:
DropAndCompute uses the Internet (clearly). It is not a Web application. I mention this only because some younger programmers, who have grown up with the Web always there, seem to think the best (or only) architecture for a software system is one implemented through a web browser using web services. I am grateful to be able to remember pre-Web days, as much as I love what Tim Berners-Lee created for us.

Client-Server:
I’m not sure when I first became aware of client-server architecture. I do know that a famous computer scientist (the late David Wheeler*) once described it as simply the obvious way to implement software systems. For my part, I believe that the less code the user needs to install on the client side, the better: there is less to go wrong in strange environments one has no control over. In the case of DropAndCompute, a user who already had Dropbox had nothing to install, and one who didn’t only had to download Dropbox.
[* As well as being a co-inventor of the subroutine, David Wheeler led the team that designed the first operational capability-based computer: the Cambridge University CAP.]

Rosetta – Software as Magic:
Around a decade ago I worked for Transitive, a University of Manchester spin-out and the company that produced Rosetta for Apple. With apologies to Arthur C Clarke: all great software appears to be magic to the user. Often, the simpler the user interface, the more complex the underlying system that implements the magic. This is true, for example, of Apple iOS and OS X, and of Dropbox (simpler than its many competitors, and yet, I would bet, more complex internally). One small part of OS X I helped with is Rosetta (or was, as Apple dropped it from the Lion 10.7 release of OS X). Rosetta dynamically (i.e. on the fly) translates PowerPC applications into Intel x86 code. There is no noticeable user interface: you double-click your desired application, like any other, and, if needed, Rosetta is invoked to do its job in the background.

I have read many interesting web-based discussions about Rosetta; several say, or imply, that it is a relatively simple piece of software. Nothing could be further from the truth. How it works is likely still a commercial secret, but if it were simple, Apple’s transition to Intel would likely have been a disaster. It took a lot of smart people a long time to make the magic appear that simple.

I tried to keep DropAndCompute’s interface to the user as simple as possible, even where that added some complexity to its implementation. The National Grid Service in the UK did their own version of DropAndCompute but, to my taste, added too many bells and whistles.

In Conclusion:
I hope this brief essay has been of interest and has given some insight into how many years of software and system design experience come to be applied, consciously and subconsciously, even to small software systems, and always, hopefully, with good taste. Hide complexity, keep it simple for the user, make them think it is simply magic!