If you suddenly needed a cronnable Postgresql database update command for SQL text files, you would probably just type:

cat /path/to/some/dir/*.sql | psql -U postgres someDatabase

So I have been asking myself: have I created something pointless?

As it turns out:

  • pgBee keeps track of the update process. If a pgBee instance is killed, the next invocation carries on from where the previous one stopped. And if it finds SQL errors, it reports how far it got in the input files before quitting.
  • pgBee is actually faster than psql when executing SQL statements from a text file. psql took 112m (with one transaction per statement) and psql -1 took 97m (with one transaction for the entire file), but pgBee finished in 21m (with one transaction per batch)! That's a whopping 898 operations per second. All tests were run against the same database server (localhost), with pgBee batching 100 statements at a time, on a real data file containing 1131753 SQL statements in total (including 511335 DELETEs and 567577 INSERTs).
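pgBee's own bookkeeping is in the downloadable tarball. The Java sketch below only illustrates the resume idea and is not pgBee's code. It assumes a hypothetical progress file that holds the number of input lines already committed, so a restarted run skips that many lines. The class name ResumeCheckpoint and the file layout are made up for this example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Hypothetical resume helper (illustration only, not pgBee's implementation).
final class ResumeCheckpoint {
    private final Path progressFile;

    ResumeCheckpoint(Path progressFile) {
        this.progressFile = progressFile;
    }

    // Number of input lines already committed; 0 if no checkpoint exists yet.
    long committedLines() throws IOException {
        if (!Files.exists(progressFile)) {
            return 0L;
        }
        String text = new String(Files.readAllBytes(progressFile), StandardCharsets.UTF_8).trim();
        return text.isEmpty() ? 0L : Long.parseLong(text);
    }

    // Call only AFTER a batch has been committed: a crash between commit and
    // save means at most one batch gets executed again on the next run.
    void save(long linesCommitted) throws IOException {
        Path tmp = progressFile.resolveSibling(progressFile.getFileName() + ".tmp");
        Files.write(tmp, Long.toString(linesCommitted).getBytes(StandardCharsets.UTF_8));
        // Swap the new checkpoint in with a single rename so it is never half-written.
        Files.move(tmp, progressFile, StandardCopyOption.REPLACE_EXISTING,
                StandardCopyOption.ATOMIC_MOVE);
    }

    // The lines that still need to be executed, given the whole input file.
    List<String> remaining(List<String> allLines) throws IOException {
        long done = Math.min(committedLines(), allLines.size());
        return allLines.subList((int) done, allLines.size());
    }
}
```

The ordering is the design choice that matters. save() runs only after the commit, so the worst case after a crash is replaying one batch, never skipping one.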

In a previous post, I promised some examples/tutorials on load-testing Postgresql servers with Tsung. Well, I have tried to develop a database performance testing methodology that is (a) application-specific and (b) easily applied to different servers and configurations, so that their relative performance can be assessed.

Tsung is ideally suited for application-specific Postgresql testing, as it supports a “proxy mode” to record SQL sessions, which are then turned into a scenario file and replayed any number of times. It also supports including alternative sessions in the same scenario file, so that each simulated new user may send a different set of SQL statements, according to the probability assigned to each session.
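Tsung expresses these choices through a probability attribute on each session in its XML scenario file. The Java sketch below shows the underlying idea: each new simulated user picks one session by walking the cumulative probabilities. This is not Tsung's code, and the session names and weights in the usage note are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Picks a session for each new simulated user, weighted by its probability (in percent).
final class SessionPicker {
    private final Map<String, Double> sessions = new LinkedHashMap<>();
    private double total;

    SessionPicker add(String name, double probabilityPercent) {
        sessions.put(name, probabilityPercent);
        total += probabilityPercent;
        return this;
    }

    // roll is uniform in [0, 1); return the first session whose cumulative weight exceeds it.
    String pick(double roll) {
        double target = roll * total;
        double cumulative = 0.0;
        String last = null;
        for (Map.Entry<String, Double> e : sessions.entrySet()) {
            cumulative += e.getValue();
            last = e.getKey();
            if (target < cumulative) {
                return last;
            }
        }
        return last; // guards against floating-point rounding at the top end
    }

    String pick(Random random) {
        return pick(random.nextDouble());
    }
}
```

For example, `new SessionPicker().add("search", 20).add("page_load", 70).add("login", 10)` sends about 70% of simulated users through the page_load session.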

Different parts of a session may be grouped into transactions (Tsung-speak — nothing to do with your normal database transactions) for statistical monitoring of SQL groups. Transactions are characterised by their name, and names may be shared across sessions. This way, there are tremendous reporting possibilities, as all sessions may have a “connection” transaction offering global connection statistics, while transactions with unique names produce statistics on a specific use-case basis (e.g. complex data search, typical page load etc.).
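Statistics are keyed by transaction name rather than by session. A "connection" transaction recorded in every session therefore rolls up into a single global figure, while uniquely named transactions stay per use case. The Java sketch below shows only that aggregation rule. Tsung produces its own reports, and the class, session and transaction names here are illustrative.

```java
import java.util.Map;
import java.util.TreeMap;

// Aggregates response times by Tsung-style transaction name, across all sessions.
final class TransactionStats {
    private static final class Acc {
        long count;
        double totalMs;
        double maxMs;
    }

    private final Map<String, Acc> byName = new TreeMap<>();

    void record(String session, String transaction, double elapsedMs) {
        // The session is deliberately ignored: equal transaction names merge.
        Acc acc = byName.computeIfAbsent(transaction, k -> new Acc());
        acc.count++;
        acc.totalMs += elapsedMs;
        acc.maxMs = Math.max(acc.maxMs, elapsedMs);
    }

    long count(String transaction) {
        Acc acc = byName.get(transaction);
        return acc == null ? 0 : acc.count;
    }

    double meanMs(String transaction) {
        Acc acc = byName.get(transaction);
        return acc == null ? 0.0 : acc.totalMs / acc.count;
    }

    double maxMs(String transaction) {
        Acc acc = byName.get(transaction);
        return acc == null ? 0.0 : acc.maxMs;
    }
}
```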

I’d say there are two main preparation stages for meaningful Postgresql load-testing with Tsung:

Each of these stages will be analyzed in its own post. It turns out that capturing SQL statements and turning them into a Tsung scenario file was not as easy as I thought.

a Postgresql Bulk Updater in Java

pgBee is a set of Java classes I wrote for automating bulk updates of Postgresql databases on Linux servers. It requires Java (doh!) and Ant (as a build/execute front-end). It is cronnable and performs very well, especially in multi-threaded mode, which takes full advantage of the multi-core CPUs in modern servers. The source of inspiration for pgBee has been previously described.

This code is released under a GNU General Public License (GPL).

Ant sometimes refuses to run in the background, so the best way to make pgBee work as a cron job is probably to call a simple shell script from cron, like the one below:

#!/bin/sh
# Start pgBee via Ant in the background, with stdin detached (safe to call from cron)
export JAVA_HOME=/usr
export ANT_HOME=/usr/local/ant
/usr/local/bin/ant -f /path/to/build.xml run </dev/null &

All configuration is done in the settings.xml file, but some options may be set through the command line, e.g.

ant -f /path/to/build.xml -Dlock=yes -Dthreads=8 -Dparallel=yes run
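The source does not say how pgBee spreads work across threads. The Java sketch below shows one plausible shape and makes two assumptions: each input file is an independent unit of work, and a setting like threads sizes a fixed thread pool. The processFile callback is a hypothetical stand-in for the real per-file update logic.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

// Runs one task per input file on a fixed-size pool; each task reports success (true) or failure.
final class ParallelRunner {
    static List<Boolean> runAll(List<Path> files, int threads, Predicate<Path> processFile)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> futures = new ArrayList<>();
            for (Path file : files) {
                // Transactions are per JDBC Connection, so a real task would open
                // its own connection rather than share one between threads.
                futures.add(pool.submit(() -> processFile.test(file)));
            }
            List<Boolean> results = new ArrayList<>();
            for (Future<Boolean> f : futures) {
                results.add(f.get()); // results come back in input-file order
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```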

pgBee processes all files it finds in a particular input (in) directory and moves each one to a done directory or, if there were SQL errors, to a rejects directory. You'll need to create the right directory structure and configure pgBee's settings before starting. The pgBee process catches SIGTERM, SIGHUP and similar signals and exits gracefully, ready to resume from where it stopped the next time it is run. So it should be quite reliable, in the absence of hard resets and kill -9. Having said that, I offer no guarantee of fitness for any purpose whatsoever :) Please use at your own risk.
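The Java sketch below shows the file routing and graceful exit described above, under assumptions; it is not pgBee's code. The in/done/rejects directories come from the text. The stopRequested flag and shutdown hook are one common JVM pattern: the JVM runs shutdown hooks on SIGTERM, SIGHUP and SIGINT, but nothing runs on kill -9.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch: route processed files to done/ or rejects/, and stop cleanly on signals.
final class FileRouter {
    // The worker loop should check this flag between batches and stop when it is set.
    static final AtomicBoolean stopRequested = new AtomicBoolean(false);

    // The hook asks the worker to stop, then waits for it to finish its current batch,
    // because the JVM exits as soon as all shutdown hooks have returned.
    static void installShutdownHook(Thread worker) {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            stopRequested.set(true);
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }));
    }

    // Move a fully processed file to done/ or, if it produced SQL errors, to rejects/.
    static Path route(Path file, boolean hadErrors, Path doneDir, Path rejectsDir)
            throws IOException {
        Path target = (hadErrors ? rejectsDir : doneDir).resolve(file.getFileName());
        return Files.move(file, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```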

If you need to make sure a particular set of statements is processed in the same transaction, simply put all of them on the same line of an input file, separated by semicolons. There's no limit to how many SQL statements you may include in a single line. More information about the input file format, usage and configuration may be found in the downloadable tarball.
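The Java sketch below shows the "one line, one transaction" rule with plain JDBC. It assumes a naive split on semicolons, which would break on semicolons inside string literals or function bodies, and an already-open Connection. The class and method names are hypothetical.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

final class LineTransaction {
    // Naive split: assumes no ';' inside string literals, comments or function bodies.
    static List<String> splitStatements(String line) {
        List<String> statements = new ArrayList<>();
        for (String part : line.split(";")) {
            String sql = part.trim();
            if (!sql.isEmpty()) {
                statements.add(sql);
            }
        }
        return statements;
    }

    // Executes every statement on the line atomically: either all commit or none do.
    static void executeLine(Connection conn, String line) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (Statement st = conn.createStatement()) {
            for (String sql : splitStatements(line)) {
                st.addBatch(sql);
            }
            st.executeBatch();
            conn.commit();
        } catch (SQLException e) {
            conn.rollback(); // undo any statement from this line that already ran
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```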

Data models are good and they are clear, if you’re the person writing the application and devising the model. Hell, sometimes, they are not clear even then! So, imagine what happens when you get someone from the street to connect to your database and read your schema in order to understand it. No chance!

Now, this is not about some poor wardriver who doesn't know how to read the implicit relationships between tables in your model: they had it coming! But what about your legit users, working on a particular aspect of your infrastructure or application, such as developers, DBAs etc.? How on earth do they make sense of it all when they first start?

Yes, yes, in an ideal world everything's properly documented, but when was the last time you saw that in a real-life situation? Real IT people don't write helpful comments when they create their tables, views, functions etc. Referential integrity? Don't make me laugh! Most developers avoid database constraints, both to keep the application portable across database systems and to keep database error messages to a bare minimum. Integrity rules are usually enforced at the application level. From a DBA's perspective, most enterprise-level databases are big collections of seemingly unrelated tables, with no business logic in the DB system itself.

But don’t despair! Help is at hand. Enter Schema Spy:

You download the jar file and then run something along the lines of

java -jar schemaSpy_3.1.1.jar -t pgsql -cp /path/to/jdbc.jar \
                              -u user -p password -s schema \
                              -db dbname -host localhost:5432 \
                              -o output-dir

After a while, you have a look in output-dir, and the reports are really nice.

Schema Spy even deduces table relationships from field names and types. And it seems to support several different database systems, including Oracle and MySQL. Hurrah!
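Schema Spy's exact rules are its own. The Java sketch below is only a rough illustration of the idea: it flags a non-key column whose name and type match another table's single-column primary key. The tiny schema model, the class names and the matching rule are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal, hypothetical schema model for the illustration.
final class Column {
    final String name;
    final String type;
    Column(String name, String type) { this.name = name; this.type = type; }
}

final class Table {
    final String name;
    final Column primaryKey; // single-column primary key, or null
    final List<Column> columns;
    Table(String name, Column primaryKey, List<Column> columns) {
        this.name = name; this.primaryKey = primaryKey; this.columns = columns;
    }
}

final class ImpliedRelationships {
    // Reports "child.column -> parent.pk" when a non-PK column has the same
    // name and type as another table's primary key.
    static List<String> find(List<Table> tables) {
        List<String> found = new ArrayList<>();
        for (Table child : tables) {
            for (Column col : child.columns) {
                if (col == child.primaryKey) continue;
                for (Table parent : tables) {
                    Column pk = parent.primaryKey;
                    if (parent == child || pk == null) continue;
                    if (pk.name.equalsIgnoreCase(col.name) && pk.type.equalsIgnoreCase(col.type)) {
                        found.add(child.name + "." + col.name + " -> " + parent.name + "." + pk.name);
                    }
                }
            }
        }
        return found;
    }
}
```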

My new work computer is a Dell Vostro 1310 laptop. I am most chuffed with this new machine, as it is my first modern, up-to-date programming notebook in a long time (some people think it's boxy, but all I want is a no-nonsense machine). And it runs Debian Lenny, which marks a change from my old Ubuntu and Slackware days. So, this is me showing off a new laptop and sharing some issues for anyone wanting to install Debian Linux on a Vostro 1310.

Now, Vostro laptops may be customized considerably prior to ordering, so hardware specs vary. Mine is an Intel Core2 Duo T9300 @ 2.50GHz (6MB L2 cache, 800MHz FSB) box with 4GB RAM, probably near the top of the range. For WiFi, it's got the Dell 1505 miniPCI card (a Broadcom BCM4328 according to lspci, capable of 802.11n) and for Bluetooth the standard Dell Wireless 360 Bluetooth Module. It's got a 13.3 inch WXGA screen and a 128MB NVIDIA GeForce 8400M GS (64 bit) video card.

Here’s the output from lspci:

00:00.0 Host bridge: Intel Corporation Mobile PM965/GM965/GL960 Memory Controller Hub (rev 0c)
00:01.0 PCI bridge: Intel Corporation Mobile PM965/GM965/GL960 PCI Express Root Port (rev 0c)
00:1a.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #4 (rev 03)
00:1a.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #5 (rev 03)
00:1a.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #2 (rev 03)
00:1c.0 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 1 (rev 03)
00:1c.1 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 2 (rev 03)
00:1c.3 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 4 (rev 03)
00:1c.4 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 5 (rev 03)
00:1d.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #1 (rev 03)
00:1d.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #2 (rev 03)
00:1d.2 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #3 (rev 03)
00:1d.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #1 (rev 03)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev f3)
00:1f.0 ISA bridge: Intel Corporation 82801HEM (ICH8M) LPC Interface Controller (rev 03)
00:1f.1 IDE interface: Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E) IDE Controller (rev 03)
00:1f.2 SATA controller: Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E) SATA AHCI Controller (rev 03)
00:1f.3 SMBus: Intel Corporation 82801H (ICH8 Family) SMBus Controller (rev 03)
01:00.0 VGA compatible controller: nVidia Corporation GeForce 8400M GS (rev a1)
06:00.0 Network controller: Broadcom Corporation BCM4328 802.11a/b/g/n (rev 03)
07:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)
08:05.0 FireWire (IEEE 1394): O2 Micro, Inc. Firewire (IEEE 1394) (rev 02)
08:05.2 SD Host controller: O2 Micro, Inc. Integrated MMC/SD Controller (rev 02)
08:05.3 Mass storage controller: O2 Micro, Inc. Integrated MS/xD Controller (rev 01)

Installing Debian Lenny 64-bit (amd64)

For the record, my first attempts at installing Linux on this box were very frustrating, as both the Ubuntu 8.04 and 8.10 64-bit versions wouldn't correctly recognise the Ethernet card (Realtek 8168), which is the last thing I'd expect not to work. The same thing happened with 64-bit Debian Sarge. I was getting frustrated by the time I tried 64-bit Debian Lenny, but then things suddenly worked out of the box and installation was a breeze (using the netinst CD).

I decided to go for the easy option and install Windows drivers for the WiFi card through ndiswrapper. The process is relatively straightforward:

Well, all you need to do (as root) is rmmod ssb ; rmmod ndiswrapper ; modprobe ndiswrapper ; modprobe ssb

You should now have a wlan0 interface to configure for WiFi connections (you might also want to install wifi-radar). The rmmod ssb etc. commands need to run every time the system boots, so I have written a simple initialization script that does this.

Now, I thought I was having a Bluetooth problem, until I noticed I had switched off WiFi and Bluetooth using the little switch at the right side of the laptop, next to the DVD drive slot. As it happens, Bluetooth worked out of the box, but please have a look at this if you have Vista pre-installed:

I had opted for Windows XP pre-installed with Vista installation media, so I didn't experience any problems. In fact, I routinely use Bluetooth to connect to my 3 Skypephone mobile and use it as a 3G modem. Please have a look at this, if you are interested.

I have also installed NVIDIA drivers for the video card (here’s one of many tutorials) and Compiz-Fusion, which looks quite nice! Here’s a brief video:

Screen capture (with recordmydesktop) was a bit flickery, sorry, but I was stressing the machine: I was using loads of Compiz-Fusion eye-candy and installing Vista as a virtual machine through VirtualBox at the same time.

Suspend and Hibernate work out-of-the-box. All-in-all, this laptop gives me everything I need for heavy development work – power management, connectivity, performance (and eye candy to impress co-workers). I don’t know if the fingerprint scanner works, I haven’t even thought about using it yet.

My only real complaint so far is audio :( This is an interesting story, actually, because I had sound when I first installed Lenny about a month ago (albeit without headphone jack sense), and then a kernel update broke it! The sound device now doesn't even show up in the operating system, so there's no point recompiling ALSA (which I have done, just in case). Lenny has not officially been branded a "stable" release yet; this is supposed to happen in 1-2 months, so here's hoping that one of these days I do a system update and suddenly everything works (again). But, as I said earlier, I am using this laptop as a development box, so the lack of sound doesn't really affect me. It'd be nice, however, to be able to listen to some mp3s while at work, which I currently do on my n800 (as a quick fix).

Update (2008-11-21): A prerelease version of Adobe Flash player 10 has just been released for Linux 64-bit systems. You may find it here. I installed it by extracting and copying to /usr/local/lib and updating the /etc/alternatives/ symbolic link to point to /usr/local/lib/

Update (2009-02-09): The real problem with sound on this laptop is that the operating system does not even recognise there is a sound card in the system (there is no audio controller in the lspci output). A few days ago, I decided to update my kernel to 2.6.26-1-amd64 using apt-get, just in case it would make a difference. Well, it doesn't :( I have downloaded my kernel's headers and recompiled the latest version of ALSA (1.0.19), but the audio controller just doesn't show up. So, I've bought myself a cheap (10 EUR) C-Media based USB sound card, which works fine (mic too).

Linux On Laptops
TuxMobil - Linux on Laptops, Notebooks, PDAs and Mobile Phones

Over the last few years, I have very consciously shifted away from web interfaces into what most people call the back-end: systems, databases etc. I had grown tired of browser incompatibilities and unpredictability! You may find trouble in server-side land, too, but, in most cases, you've done something wrong and there is a perfectly reasonable, rational (and sometimes obscure) explanation waiting to be discovered.

So, I am completely out of touch with recent UI advances, as simple CSS + JavaScript pages and extensive Perl-Tk and pygtk coding probably don't count as UI advances for most people. And I tend to code things the hard way: ViM.

Now, Java is trying to re-enter the RIA arena with JavaFX and reclaim some rich UI tech real-estate from frameworks such as Adobe Flex and Microsoft Silverlight. And thanks to Peter Pilgrim and Skills Matter, I have just had a very interesting introduction to the history, concepts and potential of JavaFX.

Now, Chris Oliver, the guy who started F3, which has since developed into JavaFX, sounds exactly like my kind of guy: Form-Follows-Function, list comprehensions, SQL influences, declarative stuff. Pre-compiled UI definitions running on top of the JVM… Imagine, if you like, writing something like JavaScript, with syntactical goodies derived from Lisp and SQL, including UI element binds and definable triggers, which may execute any arbitrary Java code. For me, this puts the fun back into computing (and into Java, which is often the C++ of our times). It's not quite stable yet, but a Preview release should be coming out from Sun Microsystems at the end of this week.

Now, I don’t think I am making this appealing to the majority of RIA developers. They usually think: timelines, SceneGraphs, rich media, GUIs, WYSIWYG, ActionScript etc. JavaFX, of course, has been developed to handle such concerns.

But to me, it seems like JavaFX might turn out to be the framework of choice for people who hate coding for UIs (but love coding in general), people who want them to be easy and fun, yet very powerful. I am, probably, one of the very few people who write Java in vi. Projects like Ant have made this possible.

If JavaFX matures in the current direction, it looks like I might be able to get away with designing rich UIs in vi, too. And before any of you start screaming "we don't care about writing Java UIs using vi", may I simply point out: isn't this the most convincing argument for the promised power of JavaFX? They must be doing something right!

Yet another work-related post. I have been asked to write a better automatic database update system and against my natural tendencies toward Perl and Python I have opted to do it in Java. Now, previous attempts in Java had been abandoned because they were not performing very well, but I wanted to build something with potential for integration with the company’s infrastructure, so I rolled up my sleeves and decided to investigate.

A quick Google search produced some interesting discussions (please see the Interesting Links below). In summary, the official Postgresql JDBC driver does not support COPY operations, and people complain that it's slow for bulk updates. However, our update SQL files are not very structured and may, in fact, contain any valid SQL code (different each time), so COPY is not what I'd use anyway.

Some hope for reasonable performance appeared in the form of the driver's batch mode. So, I wrote some Java classes which read multiple lines of SQL statements from an SQL text file into a String buffer of configurable size. When this size is reached, the buffered SQL statements are added to a reused Statement object with addBatch() and executed in their own transaction (with auto-commit set to off) through executeBatch().
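The Java sketch below shows a batching loop of this kind with plain JDBC. It is not the original classes and differs in several assumed ways: it groups by statement count rather than buffer size, assumes one SQL statement per non-empty line, and reads the whole file into memory (a streaming reader would be kinder to large files). The class name BatchLoader is hypothetical. The pure group() method can be checked without a database.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

final class BatchLoader {
    // Split statements into consecutive groups of at most batchSize.
    static List<List<String>> group(List<String> lines, int batchSize) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += batchSize) {
            batches.add(lines.subList(i, Math.min(i + batchSize, lines.size())));
        }
        return batches;
    }

    // Execute each group in its own transaction via addBatch()/executeBatch().
    static long load(Connection conn, Path sqlFile, int batchSize)
            throws IOException, SQLException {
        List<String> lines = new ArrayList<>();
        for (String line : Files.readAllLines(sqlFile, StandardCharsets.UTF_8)) {
            if (!line.trim().isEmpty()) lines.add(line);
        }
        conn.setAutoCommit(false);
        long executed = 0;
        try (Statement st = conn.createStatement()) { // one Statement, reused for every batch
            for (List<String> batch : group(lines, batchSize)) {
                for (String sql : batch) st.addBatch(sql);
                try {
                    st.executeBatch();
                    conn.commit();
                } catch (SQLException e) {
                    conn.rollback(); // the whole batch is undone
                    throw e;
                } finally {
                    st.clearBatch();
                }
                executed += batch.size();
            }
        }
        return executed;
    }
}
```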

Now, I have tried inserting one million rows into a table using a different buffer size each time, i.e. grouping SQL statements in batches of one, ten, a hundred and a thousand statements per transaction. The results are quite promising, don't you think? (low-spec machine, btw)

  • batches of 1 –> 49m 55s
  • batches of 10 –> 15m 04s
  • batches of 100 –> 08m 21s
  • batches of 1000 –> 33m 12s
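Converting those timings into rows per second makes the sweet spot easier to see. The small Java helper below uses only the figures listed above.

```java
// Rows per second for the batch-size experiment above (1,000,000 INSERTs).
final class Throughput {
    static double rowsPerSecond(long rows, int minutes, int seconds) {
        return rows / (double) (minutes * 60 + seconds);
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;
        // {batch size, minutes, seconds}, copied from the list above
        int[][] timings = { {1, 49, 55}, {10, 15, 4}, {100, 8, 21}, {1000, 33, 12} };
        for (int[] t : timings) {
            System.out.printf("batch %4d: %7.1f rows/s%n", t[0], rowsPerSecond(rows, t[1], t[2]));
        }
    }
}
```

This prints about 333.9, 1106.2, 1996.0 and 502.0 rows/s, so batches of 100 are roughly six times faster than one statement per transaction on this machine.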

Interesting links (References):

  • multi-statement JDBC updates in batch mode
  • making batch updates in JDBC applications
  • no COPY from Postgres JDBC
  • COPY for PostgreSQL 8.x JDBC Driver

One of the nicest things I have done recently was to attend the First International Erlang eXchange in London. It was jam-packed with exciting information on a variety of topics, and I expect a lot of it will be popping up over the next few weeks one way or another. One of the things I discovered at the eXchange was Tsung, a distributed performance/load/stress testing tool for HTTP and Postgresql servers, written in Erlang, with loads of scaling-up potential. This happens to be an important part of my new job, so please expect more on the topic very soon (real examples, tutorials etc.).

Apparently, there are subtle differences between the terms performance, load and stress testing; you may read one opinion here:

UPDATE: If you’ve come this far, you may also have a look at the following posts (tutorial):

pgTsung: app-specific testing methodology

Have you ever wondered how and why things are organised in Linux filesystems? Do /opt, /var, /home, /usr baffle and confuse you? Now, I have been a Linux user for several years and, having used several different Linux distributions, I have a pretty good idea where things usually reside. But, somehow, I hadn't come across this before, and it's such a useful thing to have read, especially if you are a Linux newbie! Thanks, Lance.

The pdf link at the bottom of the page is an interesting read.

So, you want to print from Java applications in Linux (through CUPS, which is the default printing system in Ubuntu and other distributions)…

Turns out it's easy: select a particular Page Orientation in your printer's Job Options panel. Thank you, techexplorer…
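The tip above works through the printer's job options. If you control the Java code, a comparable step (my assumption, not something the original tip tested) is to request an explicit orientation with the standard javax.print attributes. The Java sketch below does that. The class name OrientedPrint is hypothetical. The Printable or Pageable is assumed to be set on the PrinterJob elsewhere.

```java
import java.awt.print.PrinterException;
import java.awt.print.PrinterJob;
import javax.print.attribute.HashPrintRequestAttributeSet;
import javax.print.attribute.PrintRequestAttributeSet;
import javax.print.attribute.standard.OrientationRequested;

final class OrientedPrint {
    // Build a request that names the orientation explicitly instead of relying on printer defaults.
    static PrintRequestAttributeSet attributes(boolean landscape) {
        PrintRequestAttributeSet attrs = new HashPrintRequestAttributeSet();
        attrs.add(landscape ? OrientationRequested.LANDSCAPE : OrientationRequested.PORTRAIT);
        return attrs;
    }

    // Print whatever Printable/Pageable is already set on the job, with that orientation.
    static void print(PrinterJob job, boolean landscape) throws PrinterException {
        job.print(attributes(landscape));
    }
}
```

Typical use would be `PrinterJob job = PrinterJob.getPrinterJob();`, then `job.setPrintable(...)`, then `OrientedPrint.print(job, false)`.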