Skip navigation

Yet another work-related post. I have been asked to write a better automatic database update system and against my natural tendencies toward Perl and Python I have opted to do it in Java. Now, previous attempts in Java had been abandoned because they were not performing very well, but I wanted to build something with potential for integration with the company’s infrastructure, so I rolled up my sleeves and decided to investigate.

A quick Google search produced some interesting discussions (please see the Interesting Links below). In summary, the official JDBC Postgresql driver does not support COPY operations and people complain that it’s slow for bulk updates, however, our update sql files are not very structured and, in fact, may contain any (as in different each time) valid SQL code. So, COPY is not what I’d use, anyway.

Some hope for reasonable performance appeared in the form of the driver’s batch mode. So, I wrote some Java classes which read multiple lines of sql statements from an sql text file into a String buffer of configurable size. When this size is reached, these sql statements are added to the reused Statement object with addBatch() and are executed in their own transaction (I have set auto-commit to off) through executeBatch().

Now, I have tried inserting one million rows into a table using a different buffer size each time, i.e. grouping sql statements in batches of one, ten, hundred and thousand statements per transaction. The results are quite promising, don’t you think? (low spec machine, btw)

  • batches of 1 –> 49m 55s
  • batches of 10 –> 15m 04s
  • batches of 100 –> 08m 21s
  • batches of 1000 –> 33m 12s

Interesting links (References):

multi-statetement JDBC updates in batch mode: http://archives.postgresql.org/pgsql-jdbc/2007-04/msg00076.php

making batch updates in JDBC applications: http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.udb.doc/ad/tjvbtupd.htm

no copy from postgres JDBC: http://archives.postgresql.org/pgsql-jdbc/2004-06/msg00027.php

copy for PostgreSQL 8.x JDBC Driver: http://kato.iki.fi/sw/db/postgresql/jdbc/copy/

Advertisements

One Trackback/Pingback

  1. […] pgBee is a set of Java classes I wrote for automating bulk updates of Postgresql databases on Linux servers. It requires Java (doh!) and Ant (as a build/execute front-end), it is cronnable and performs very well, especially in multi-threaded mode, which takes full advantage of multi-core CPUs in modern servers. The source of inspiration for pgBee has been previously described. […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: