Wednesday, 15 September 2010

On Babies, Bathwater, and Databases

My friend Ed has written a good, provocatively-titled piece in which he explains the relational-vs-noSQL debate in terms of the vested interests of the respective parties. I think that's a valid line of thought - fanatical views are almost always manifestations of strong incentives. But, like the flame wars that inspired it, Ed's article perpetuates the premise that one has to choose either old-fashioned, slow, 20th Century SQL databases or newfangled, 21st Century, fast-and-loose 'Web-scale' databases. I don't think that's true.

It's worth thinking at a slightly more fine-grained level about the properties of older technologies, so as to be able to understand whether they should be abandoned completely. For the sake of argument (and because I'm marginally familiar with it) take PostgreSQL as an example of an old-school relational database. Use it, and out of the box you get (among other things):

  • Familiar SQL syntax, understood by generations of programmers and software tools;
  • Schema management commands;
  • Data management commands;
  • Hard guarantees about data integrity, through transactions;
  • A sophisticated query planner with a choice of algorithms.
Now, the argument from the "Web programmer" camp, that these days it's expedient to forsake integrity in the name of scalability, is a valid one. But that only concerns one of the items listed above. Is it really worth discarding over thirty years of accumulated expertise in the light of that one observation? I'd say not - and good relational database systems can be configured to be more relaxed about traditional ACID properties, precisely so as to be able to gain performance.

Similarly, it's true that an out-of-the-box relational database doesn't suit vast quantities of data distributed over large numbers of machines. Its query planner doesn't understand that situation. But a query planner is a mechanical embodiment of a task that used to be done by humans. To discard relational databases entirely is to shift that burden back onto those humans. I've seen this first-hand again and again - you wouldn't believe how much time Google engineers spend thinking about the most efficient way to query the data, and then writing fiddly bits of code to represent their conclusions. Far better is to upgrade the query planner to embody our understanding of the layout of data in the 21st Century.

The benefits of adapting and tuning existing systems are things like continuity and interoperability. Google's staggeringly huge repositories of data are carefully designed from the ground up to suit the queries they need to support. That's an affordable and necessary activity when you're shunting petabytes and more around. But it means that each repository comes with its own equally peculiar tools for querying, modifying and managing the data and the schemata. And there is certainly a high cost to that, as soon as any activity needs to span projects.

In short, I want a pony. I want the same pony everyone else wants (although some people haven't realised it yet). I want a pony that I can throw concise, high-level queries at, but that will scale. It doesn't have to be always consistent; it's okay for it to be eventually consistent. But I want it to take expressive queries in a well-known syntax - I shouldn't have to learn a new UI or API just to be able to investigate that one record which my current bug refers to. I want it to make plans for executing those queries, explain to me how much time and energy they're going to cost, and run them. All of which, I suspect, is best accomplished by surgery to tools that already exist.

Thursday, 3 June 2010

How to Stop SSH from Hanging on OS X

Run ssh with -vvv to get plenty of debug output. If it hangs immediately after emitting the line:
debug3: Trying to reverse map address ...
then probably you have Kerberos credentials on the client machine, but the host is not part of the Kerberos setup. The problem can be solved by destroying the Kerberos credentials with:
kdestroy

Thursday, 28 May 2009

Installing GHC binaries on Linux (or: umask is evil)

  1. Go to the dist list.
  2. Choose the directory for the version you want.
  3. Make sure to download the file for the right architecture.
  4. Very carefully untar the downloaded file, using tar xjfp. The p option to tar is particularly important. The options are:
    • x extract
    • j through bzip
    • f a file
    • p and do not filter its permissions through umask.
    Without that last option, your umask might remove world-readable permissions from all the installed files, in which case they will be inaccessible to non-root users after they're installed.
  5. cd extracted-dir
  6. ./configure
  7. sudo make install

Wednesday, 20 May 2009

How to set the Window Manager under Gnome 2

Gnome 2 is strongly inclined to use Metacity as its window manager. In theory one is able to set the WINDOW_MANAGER environment variable in order to specify a different window manager. But I had no luck doing that in way that would work with NX, since NX runs gnome-session directly (rather than through GDM) and so there's no way to get the environment set up first. For example, setting this in ~/.gnomerc had no effect. It turns out that a nice non-intrusive way to set the window manager is via a largely undocumented Gnome setting:
gconftool-2 -s /desktop/gnome/applications/window_manager/default -t string ${MY_WM}
gconftool-2 -s /desktop/gnome/applications/window_manager/current -t string ${MY_WM}

Wednesday, 13 May 2009

GWT Hosted Mode Classpath Woe (and Solution)

Working in Eclipse, I added a new dependency to my GWT project, and included the corresponding jar in the project setup (as a user library). After clicking "Restart Server" in the hosted mode control window, I was nonetheless confronted with:
Unable to find <new dependency.xml> - could be a typo, or maybe you forgot to include a classpath entry for source
I checked everything, and the classpath was certainly set up correctly. It turned out that the "Restart Server" button is not enough if your classpath has changed - you need to stop the server altogether, then restart it from within Eclipse.

Wednesday, 22 April 2009

GWT/Eclipse UnsatisfiedLinkError (and solution)

My first steps using the GWT plugin for Eclipse weren't as smooth as I'd have liked. Everything installed fine and I was working through the Quick Start guide happily until I tried to launch my virgin autogenerated GWT project and was confronted with:

.../eclipse34/stable/plugins/com.google.gwt.eclipse.sdkbundle.linux_1.6.4.v200904062334/gwt-linux-1.6.4/libswt-pi-gtk-3235.so: /usr/grte/v1/lib/libc.so.6: version `GLIBC_2.4' not found (required by /usr/lib32/libgtk-x11-2.0.so.0)

A little rooting around on Google internal mailing lists turned up an incompatibility with OpenJDK. Switching the project JRE from OpenJDK 1.6 to Sun JRE 1.6 solved the problem. So, if you ever see the above error message, maybe you'll find the same action helps.

Sunday, 29 March 2009

Installing PostgreSQL on OS X: How to Make it Difficult for Yourself

I installed PostgreSQL the easy way using the excellent installer from enterprisedb.com.

Unfortunately the installer was (quite reasonably) not prepared to cope with the fact that there was already a user called postgres and that user's shell was set to /usr/bin/false. That was because I'd long ago begun to install PostgreSQL in a more manual way, but given up.

The installer puts a script in /Library/StartupItems that is supposed to start the database server when the machine bootstraps. The script didn't work because it uses su to run the server as postgres, but su starts the impersonated user's shell. I tweaked some lines in the script to use sudo instead, and now it does work. The specific changes were in StartService():

#       su - postgres -c "/Library/PostgreSQL/8.3/bin/pg_ctl -w start -D \"/var/PostgreSQL/8.3/data\" -l \"/var/PostgreSQL/8.3/data/pg_log/startup.log\""
       sudo -u postgres /Library/PostgreSQL/8.3/bin/pg_ctl -w start -D "/var/PostgreSQL/8.3/data" -l "/var/PostgreSQL/8.3/data/pg_log/startup.log"

and in StopService():

#       su - postgres -c "/Library/PostgreSQL/8.3/bin/pg_ctl stop -m fast -w -D \"/var/PostgreSQL/8.3/data\""
       sudo -u postgres /Library/PostgreSQL/8.3/bin/pg_ctl stop -m fast -w -D "/var/PostgreSQL/8.3/data"

Additionally, the psql command doesn't pick up the effective user when executed under sudo (even under sudo -s). Consequently, to get a psql command prompt as user postgres, the following is needed:

        sudo -u postgres psql -U postgres