
I stopped posting to this blog a long time ago. Please update your links and follow http://pgdba.net/blog instead. Thanks, Michael

Two books I am currently reading are the following:

The IOD book is a bit more abstract, but the information within may be applied in a variety of contexts. The RESTful Web Services book, naturally, is about web services. Both are great and I highly recommend them. Please click on the covers to find out more or buy.

I was reading an essay the other day about the conceptual differences between “open source” and “free” software, in the FSF meaning of the word (you may read this essay here).

I have been a GNU/Linux user for many, many years now, and I owe a great deal to all the people who have contributed through the years. Because of “free” and “open source” software, I have learned so much that I now have a career and job satisfaction. For a lot of software packages out there, you could probably freely exchange the two terms, but, to me, it was “free” software that taught me and “open source” software that gave me some good jobs.

Reading about the subtle but important differences between these approaches, I suddenly remembered what my early GNU years felt like (I like to think I’m in my GNU teens now — there are GNU grand-daddies out there 🙂). It was “open source” that penetrated the market, but it was “free” software that gave us the tools and got me to start reading. Motivation sometimes is paramount.

Nowadays, everyone’s at least heard of “open source” and a lot of companies are actually using it. So, the outlook’s peachy. What’s all this fuss about “freedom”? Are these FSF people irrelevant, at last? Could we simply dismiss them as ramblers from a bygone era and focus on practicalities instead?

I am getting all philosophical here, but it’s a holiday and people have more time to stop and think and get abstract on holidays, after a nice, relaxed meal — so please forgive these musings.

“Open source” is a cultural product, be it a methodology or a set of computer instructions. “Open source” has been written by people from many different backgrounds who share certain ideas — it’s not just technical, it’s also a cultural thing made possible by technology, globalization, the economy, our times in general.

In that sense, all this libre software out there is the result of a culture, in the same way that the Parthenon is the result of the dominant technological and philosophical ideas in ancient Athens, the Pyramids the result of such ideas in ancient Egypt, and so on… You cannot understand the monuments if you don’t study the culture. And you cannot build this stuff unless you live the culture.

So, does the “freedom” approach belong to history books? I think this will be decided by the number of people writing and using “free”, as opposed to “open source”, software. Let’s keep the culture alive.

Happy New Year, everyone!

What a great idea/implementation! A single html file with some Javascript code and you’ve got a self-contained wiki system (data + code) you may carry with you on a USB stick, post online or send via email — your choice! No server or software requirements, except maybe for a modern Javascript-enabled browser (preferably Firefox). Thank you Jeremy!

TiddlyWiki – http://www.tiddlywiki.com/

TiddlyWiki Markup – http://tiddlywiki.org/wiki/TiddlyWiki_Markup

TiddlyWiki (wikipedia) – http://en.wikipedia.org/wiki/TiddlyWiki

You may also be interested in Wiki on a Stick – http://stickwiki.sourceforge.net/

By the way, TiddlyWiki almost works on my Nokia n800. I can read and edit files properly through the built-in Opera browser, but I cannot save them. There is an Opera work-around, TiddlySaver, but it relies on a .jar file and the n800 has no Java support. Pity!

[Screenshot: TiddlyWiki on the Nokia n800]

UPDATE: The http://www.checkettsweb.com/tw/gtd_tiddlywiki.htm + n800 temptation made me fiddle a bit more, and saving now works (after confirming the unsafe operation 4 times 🙂). All it takes is commenting out or deleting line 549 of the gtd_tiddlywiki.htm file — the one with saveTest() — before your first use. And if you have basic Unix command line skills, here is a quick and dirty script for launching your little agenda from the command line, showing the day’s entries (the lines are long; copy-paste to see them in full):

#!/bin/sh
# point the DefaultTiddlers entry at today's date (DD/MM/YYYY),
# so the wiki opens showing today's agenda entry
sed -i '/<div tiddler\="DefaultTiddlers"/s/\[\[.*\]\]/\[\['$(date +%d\\\/%m\\\/%Y)'\]\]/' /media/mmc1/gtd/gtd_tiddlywiki.htm
# then ask the n800's built-in browser, via D-Bus, to open the file
dbus-send --session --print-reply /com/nokia/osso_browser \
--dest=com.nokia.osso_browser com.nokia.osso_browser.open_new_window \
string:file:///media/mmc1/gtd/gtd_tiddlywiki.htm

… assuming the file resides in /media/mmc1/gtd. Backups will be placed in /media/mmc1/gtd/twBackups.
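If you would rather not hunt for line 549 by number, a sed one-liner along these lines should comment the call out instead. This is just a sketch, and it assumes saveTest() appears on that line only:

# comment out the line containing the saveTest() call
sed -i '/saveTest()/s/^/\/\/ /' gtd_tiddlywiki.htm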

Ok, here’s a quick game! Let’s spot software releases which happened in the first days of December, 2008.

1. WordPress 2.7 – http://en.blog.wordpress.com/2008/12/05/new-dashboard-design/

2. Python 3.0 – http://www.python.org/download/releases/3.0/

3. JavaFX 1.0 – http://www.sun.com/aboutsun/pr/2008-12/sunflash.20081204.1.xml

The biggest hurdle I have had to overcome in order to use Tsung for load-testing PostgreSQL servers has been a conceptual mismatch between Tsung and what I wanted to do. Tsung’s model probably originates in the load-testing of web servers: everything is described in terms of user arrival rates, hits, pages, transactions and thinktimes. Database usage may not be readily described in these terms.

Before going any further, I should probably make clear that I didn’t need Tsung to do performance testing. Performance testing may easily be done by throwing a specific set of SQL queries at the database server (under controlled conditions) and checking/timing the results (this could be a separate tutorial 🙂). Tsung gives you the tools to model proper user interaction and real-life usage, and what I have been trying to determine is a server’s load capacity.

In other words, how many multiples of our typical or target load can a particular server/set-up handle?

And this load had to be expressed in a Tsung-compatible xml file describing mainly:

  • alternative user sessions (with associated probabilities)
  • user arrival rate

Here’s a quick reminder of what Tsung transactions mean:

Different parts of a session may be grouped into transactions (Tsung-speak — nothing to do with your normal database transactions) for statistical monitoring of SQL groups. Transactions are characterised by their name, and names may be shared across sessions. This way, there are tremendous reporting possibilities, as all sessions may have a “connection” transaction offering global connection statistics, while transactions with unique names produce statistics on a specific use-case basis (e.g. complex data search, typical page load etc.).

For simplicity, I have opted to include only two “transactions” in each alternative user “session” (use-case):

  • a connection transaction (identified as “connection” in all “sessions”)
  • a SQL block transaction (with a unique, “session”-specific name)

Know your (target) usage

Here comes the obvious but important bit: you need to know your real-life or target usage to proceed! Expressing this usage in Tsung values is the only thing that ties your experiment to real life and allows conclusions to be drawn from the tests.

The defined “sessions” should, of course, reflect your usage profile. This boils down to including a representative variety of use-cases, with the right probability factor assigned to each case.

But you also need to express the number of new “sessions” per second Tsung initiates against your system, i.e. the Tsung user arrival rate.

Adapting the scenario file

This is a quick summary of what you should edit in your Tsung scenario file to specify the desired load:

  • allocate different probabilities to your alternative “sessions” (do they add up to 100?)
  • make sure you wrap the important bits of each session into unique “transactions”
  • define appropriate user arrival rates in your “load phases”

Load phases are defined in this section of the Tsung scenario file:

   <load>
      <arrivalphase phase="1" duration="1800" unit="second">
         <users interarrival="10" unit="second"></users>
      </arrivalphase>
      <arrivalphase phase="2" duration="1800" unit="second">
         <users interarrival="6" unit="second"></users>
      </arrivalphase>
      ... and so on...
    </load>
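If it helps, the interarrival value is simply the inverse of the user arrival rate, so translating a phase into “new users per hour” is quick arithmetic (using bc, assuming it is installed):

# phase 1: one new user every 10 seconds
echo '3600 / 10' | bc    # 360 new users per hour
# phase 2: one new user every 6 seconds
echo '3600 / 6' | bc     # 600 new users per hour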

Analyzing the results

Assuming you have managed to run your tests, now comes the tricky part of interpreting the results. The Tsung helper Perl script generates a multitude of graphs, but here’s a quick shortcut. The files which have been most useful to me are the following:

  • report.html
  • images/graphes-Transactions-max_sample.png
  • images/graphes-Transactions-mean.png
  • images/graphes-Users-simultaneous.png
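These files are produced by Tsung’s report-generating Perl script (tsung_stats.pl), which you run from inside the directory of a finished test run. The exact paths vary between installations, so treat the ones below as assumptions:

# generate report.html and the images/ graphs from the raw test logs
cd ~/.tsung/log/20081201-1530    # hypothetical test-run directory
/usr/lib/tsung/bin/tsung_stats.pl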

When looking at these graphs, the two most important things to remember are the length (in seconds) of each load phase and what each phase represents. For example, the following graph (manually colored for convenience) may be divided into four sections, each representing a particular load phase (each phase lasted 1800 seconds, i.e. half an hour). This graph basically tells us things start to fall apart at 8x our target load.

[Graph: simultaneous DB users]

This graph is easy to interpret because we are not using any loops inside the user “sessions”. Each Tsung “user” simply connects, sends a particular SQL block to the server, receives some results and exits. The user arrival rate stays constant throughout a particular load phase. Statistically speaking, if the server is responding properly, the number of new users entering the system is always matched by the number of users exiting. Therefore, you only see simultaneous Tsung users accumulate when things start going wrong, i.e. when the server’s response times are increasing. And when you see the green and red lines splitting, things have gotten out of hand: Tsung is introducing new users which are not even able to connect!

We should always, of course, check if the server’s performance was acceptable while it was “coping” with our load. In addition to the numbers in report.html, you can get the big picture by simply looking at images/graphes-Transactions-max_sample.png. The horizontal line for each “session” corresponds to the longest response time ever recorded for that particular use-case.

[Graph: max elapsed time per transaction]

Armed with this knowledge, you may start experimenting further. Does your server recover from brief spikes of activity (e.g. long 4x phase, brief 16x phase, 4x phase etc.)? What effect do particular server configuration changes have on load capacity? And so on… This could easily turn into a full-time job🙂

…what Lisp is to Emacs. Or so it seems!

http://www.tummy.com/Community/Presentations/vimpython-20070225/vim.html

This is SO not documented outside Vim. A general-purpose programming language instead of Vim script!

Your best source of documentation may be running the command

:help python

in Vim
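For a quick taste, you can even drive the embedded Python from the shell. This is a sketch, assuming a Vim built with the +python feature (check with vim --version) and a hypothetical somefile.txt:

# have Vim's embedded Python append a line to a file, non-interactively
vim -e -s -c 'python import vim; vim.current.buffer.append("hello from Python")' -c 'wq' somefile.txt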

Update:

OMG, quick Vim research revealed it also supports macros in Tcl, Perl, Ruby and MzScheme! Now, I’m spoiled for choice 🙂

With modern servers, a lot of people are migrating to 64-bit architectures. Apparently, there are performance considerations with regard to using a 64-bit JVM. If you are using or considering a 64-bit JVM, you might want to read up on compressed oops (ordinary object pointers) — don’t worry, we are only talking about JVM command-line options which affect performance. Please visit the links below:

http://www.lowtek.ca/roo/2008/java-performance-in-64bit-land/
http://blog.juma.me.uk/2008/10/14/32-bit-or-64-bit-jvm-how-about-a-hybrid/
http://publib.boulder.ibm.com/infocenter/javasdk/v6r0/index.jsp?topic=/com.ibm.java.doc.user.lnx.60/user/garbage_compressed_refs.html
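For reference, this is the kind of JVM invocation the articles discuss. Treat it as a sketch, since the flag is only recognised by 64-bit HotSpot builds that support it (myapp.jar is a hypothetical application):

# run a 64-bit HotSpot JVM with compressed ordinary object pointers
java -XX:+UseCompressedOops -Xmx4g -jar myapp.jar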

Apparently, there has been a severe security breach at Fedora. They had to rebuild their repositories and change their signing keys, and it may well be that they have only rebuilt the repositories for Fedora 8 and 9. Which might just explain why I have been unable to use yum to install software on a Fedora Core 5 box for several weeks now! And, yes, people, I know FC5 is no longer officially supported, but the mirrors were there and I was still using them not long ago. So, attention Fedora users! If you are running a Fedora release below 8, you should probably consider installing a recent release or risk being stuck with a system that gets no software updates and no packages.

Please have a look at this: http://www.redhat.com/archives/fedora-announce-list/2008-September/msg00007.html

Tsung has a “proxy mode” which records SQL statements and produces an appropriate Tsung scenario file. What could be simpler? I shall just point my web application to speak to the Tsung proxy instead of the database and I will use it to generate “typical usage” cases.

Unfortunately, this is not an option if, say, your application uses a web framework which maintains several open connections to the database server. The Tsung proxy can only handle one connection at a time, so your application does not function properly and cannot be used to generate the “typical usage” scenarios.

Then there is pgFouine, a PostgreSQL log analyzer which shows some promise and can produce Tsung-compatible output on demand. But pgFouine principally analyzes log files to group and rank statements according to how well they perform in the database, and this approach has spilled over into Tsung scenario file generation: the order of the SQL statements is not preserved! This, by itself, would perhaps not be a problem, but I often record multiple use-cases in one go and pgFouine mixes them up.

The best way to create our test cases, therefore, is to use the log files of an otherwise idle PostgreSQL server, after enabling the logging of all SQL statements on the server. I have written a few scripts which help with the process, but they assume the server’s logging format has already been changed to what pgFouine requires (syslog). Thus, the PostgreSQL server needs to log in this particular style:

Sep  1 16:21:19 pgtest postgres[4359]: [136-1] LOG:  statement: SELECT rolname FROM pg_roles ORDER BY 1
Sep  1 16:21:19 pgtest postgres[4359]: [137-1] LOG:  duration: 0.178 ms

To make sure this is the case, you probably need to edit your postgresql.conf file and set the following values:

log_destination = 'syslog'          # send all log output to syslog
redirect_stderr = off               # no separate stderr log files
silent_mode = on                    # detach from the terminal; rely on syslog
log_min_duration_statement = 0      # log every statement, with its duration
log_duration = off
log_statement = 'none'              # statements are already logged by the setting above
log_line_prefix = 'user=%u,db=%d,host=%h '
syslog_facility = 'LOCAL0'          # must match the /etc/syslog.conf entry below
syslog_ident = 'postgres'
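Once the server has been restarted (see below), you can double-check that the settings have taken effect from the command line, in the same spirit as the marker trick further down:

echo 'SHOW log_destination; SHOW log_min_duration_statement;' | psql -U postgres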

Then, you need to edit /etc/syslog.conf to set up a PostgreSQL facility and exclude it from the default log file:

local0.*   -/home/postgres/logs/postgresql.log
*.info;mail.none;authpriv.none;cron.none;local0.none   /var/log/messages

For the changes to take effect, you need to restart the syslog service (/etc/init.d/syslog restart) and PostgreSQL.

You are now ready to start capturing SQL statements in the PostgreSQL log file. To make sure you will be able to filter the log file into separate use-cases, you should choose a unique string identifier (e.g. ‘complex search 001’) to throw at the database server at the beginning and end of a particular use-case. You may do this by connecting to the server via ssh and typing:

echo "SELECT 'complex search 001';" | psql -U postgres

… before using your web application (which must be configured to talk to this particular PostgreSQL server). At the end of this use-case (‘complex search 001’) all you need to do is repeat the line above.
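Since these markers get typed a lot, a tiny helper script keeps them consistent (the name mark is a hypothetical choice of mine):

#!/bin/sh
# mark: send a unique marker statement to the server so it shows up in the log
# usage: ./mark 'complex search 001'
echo "SELECT '$1';" | psql -U postgres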

When you have finished recording all batches (use-cases) of SQL statements, you need to locate the PostgreSQL log file (e.g. /var/log/postgresql/postgresql.log) and use it as input for syslog-filter, a simple Perl script I have written. You may run it from the command line, like so:

./syslog-filter postgresql.log  'complex search 001' > complex-search-001.log

… assuming the script is executable and located in the same directory as the postgresql.log file. This command creates complex-search-001.log, which contains only those SQL statements that belong to this use-case.

Here is the code for syslog-filter:

#!/usr/bin/perl -w
# syslog-filter: print only the log lines found between two occurrences of a token
if(scalar(@ARGV) < 2) {
   print "Usage: ./syslog-filter <file> <token>\ne.g. ./syslog-filter scenario.log 'Quoted companies'\n"; exit(1);
}
open(MYFILE, '<'.$ARGV[0]) or die "Can't open ".$ARGV[0]." for reading...\n";
my $switch = 0; my $line = "";
while($line = <MYFILE>) {
    # every sighting of the token flips the switch on/off
    if($line =~ /$ARGV[1]/) { &toggle_switch(); }
    # lines are printed only while the switch is on
    print $line if $switch;
}
close(MYFILE);

sub toggle_switch { if($switch) { $switch=0; } else { $switch=1; } }
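If you have recorded several use-cases in one log file, a small loop saves some typing (the marker tokens below are examples):

#!/bin/sh
# split one big log into per-use-case logs, one per marker token
for token in 'complex search 001' 'complex search 002'; do
   ./syslog-filter postgresql.log "$token" > "$(echo $token | tr ' ' '-').log"
done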

For the next step, you may want to use the following script, syslog-to-tsung-xml:

#!/usr/bin/perl -w
use Parse::Syslog;
if(scalar(@ARGV) < 1) {
   print "Usage: ./syslog-to-tsung-xml <logfile>\ne.g. ./syslog-to-tsung-xml my-scenario.log\n"; exit(1);
}
my $parser = Parse::Syslog->new( $ARGV[0] ); $s = 0; # $s is just a switch whether we should record/not
READINGLOOP: while(my $sl = $parser->next) {
   $line = $sl->{text}; # i don't want to write $sl->{text} all the time🙂
   if ($line =~ /LOG:  execute/ or $line =~ /LOG:  statement/) { # a new statement starts on this line...
      # but if the recording switch is already on, we first save the recorded statement into @selects
      if($s and $st ne "") { push @selects, $st; $s = 0; $st = ""; $g = undef; }
      # in other words, a new 'LOG:' statement line also means the previous recording should end
      if($line =~ /\[(.+)-.+(SELECT .+)$/) { $s = 1; $g = $1; $st = $2; } # regular expression heaven
      # if this is a SELECT statement it is put in $st, $s is set to 1, $g holds the id used to filter the following lines
      next READINGLOOP; # ok, let's proceed with the next line - don't execute the rest...
   }
   if ($s and $line =~ /\[(.+)-.+\] (.+)$/ and $g == $1) { $st .= $2; } # recording subsequent lines - concat
}
# just to be sure, we save whatever is inside $st once we reach the end of the file - no more 'LOG:  execute's
if($st ne "") { push @selects, $st; $s = 0; $st = ""; $g = undef; }
# now, we should scan the results for 'DETAIL:  parameters:' and perform all the described substitutions
my $array; my $hash; my $key; my $val; my $var; my $target; my $subs;
for($i=0;$i<scalar(@selects);$i++) {
   if ($selects[$i] =~ /^(.+)DETAIL:  parameters: (.+)$/) {
      # reading parameters, splitting them into key,val pairs for subsequent search and replace
      $array = (); $hash = {}; $subs = "";
      $target = $1;
      @$array = split ',' , $2;
      # print "\nBefore: ----------------------------------------------------------------------------------\n";
      # print $target, "\n";
      # print "------------------------------------------------------------------------------------------\n";
      foreach $var (@$array) {
         ($key,$val) = split '=', $var;
         $key =~ s/^ *(.+) +$/$1/;
         $val =~ s/^ *'(.+)' *$/$1/;
         $hash->{$key} = $val;
         # print $key, "\t", $val, "\n";
         $subs = "\\".$key;
         $target =~ s/$subs\:\:/\'$val\'::/g;
      }
      # print "After: ----------------------------------------------------------------------------------\n";
      # print $target, "\n";
      # print "------------------------------------------------------------------------------------------\n";
      $selects[$i] = $target;
   }
}
# and on to outputting our results...
# pure sql output if there is a second argument in the command line
if($ARGV[1]) { for($i=0;$i<scalar(@selects);$i++) { print $selects[$i],";\n"; } }
else {
# tsung compatible output
print <<STARTOFSESSION;
    <session name="$ARGV[0]" probability="100" type="ts_pgsql">
        <transaction name="connection">
            <request>
                <pgsql type="connect" database="mydatabase" username="myusername" />
            </request>
            <request>
                <pgsql type="authenticate" password="mypassword"/>
            </request>
        </transaction>
        <thinktime value="5"/>
            <transaction name="requests"> <!-- start of requests -->
STARTOFSESSION
for($i=0;$i<scalar(@selects);$i++) {
   print "\t\t\t\t<request><pgsql type=\"sql\"><![CDATA["; print $selects[$i],"\n"; print "]]></pgsql></request>\n"
}
print <<ENDOFSESSION;
            </transaction> <!-- end of requests -->
            <thinktime value="5"/> <!-- delay between scenario re-play -->
        <request><pgsql type="close"></pgsql></request>
    </session>
ENDOFSESSION
}

This is how you would run the above script:

./syslog-to-tsung-xml complex-search-001.log > complex-search-001.xml

This generates a partial Tsung file in the proper format. This process needs to be repeated for every use-case we would like to include. The resulting xml files may be concatenated into a single file, like so:

cat *.xml > my-tsung-scenario.xml
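One caveat: the combined file itself ends in .xml, so running the command a second time would concatenate the previous result into the new one. Removing it first avoids the problem (the glob is expanded after the rm and before the redirection creates the new file):

rm -f my-tsung-scenario.xml
cat *.xml > my-tsung-scenario.xml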

The resulting file (my-tsung-scenario.xml) will be completed into a full, valid Tsung scenario file in section 2.4. In order to run the above scripts, you obviously need a working Perl environment and the Parse::Syslog Perl module, which may be installed by typing (as root):

cpan Parse::Syslog

Before proceeding any further, you may want to manually edit all occurrences of

<transaction name="requests">

…in my-tsung-scenario.xml, changing the name each time to reflect the use-case which follows. E.g.

<transaction name="complexSearch1">

Another required manual edit concerns the probability factors assigned to each use-case (session). You therefore need to adjust the probability settings in all such occurrences:

 <session name="complex-search-001.log" probability="100" type="ts_pgsql">

… to reflect the desired frequency of each use-case in the tests. Changing 100 to 25 in the above line will force 1 in 4 users during the Tsung tests to replay the ‘complex-search-001’ scenario.
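A quick way to eyeball the probabilities across all sessions (do they add up to 100?) is a grep over the combined file; the -o flag assumes GNU grep:

grep -o 'probability="[0-9]*"' my-tsung-scenario.xml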

To turn a series of sessions described in the file my-tsung-scenario.xml into a full, valid scenario we need to type:

echo '<!DOCTYPE tsung SYSTEM "/usr/local/share/tsung/tsung-1.0.dtd" [] >

<tsung>
<!-- <tsung loglevel="debug" dumptraffic="true"> --> <!-- useful sometimes -->
   <clients>
      <client host="myclient" weight="1" cpu="2"></client>
   </clients>

   <servers>
      <server host="myserver" port="5432" type="tcp"/>
   </servers>

   <monitoring>
      <monitor host="myserver" type="erlang"></monitor> <!- postgresql server ->
      <monitor host="myclient" type="erlang"></monitor>
   </monitoring>

   <load>
      <arrivalphase phase="1" duration="1800" unit="second">
         <users interarrival="4" unit="second"></users>
      </arrivalphase>
      <arrivalphase phase="2" duration="1800" unit="second">
         <users interarrival="2" unit="second"></users>
      </arrivalphase>
   </load>

   <sessions>

' >  head-tsung-scenario.xml

… to get a head-tsung-scenario.xml file, which we can then edit according to our needs. If we keep the existing settings, Tsung will attempt to load-test a server called myserver (the names need to be resolvable; please check your DNS service and/or your /etc/hosts file) from a single client, myclient, while trying to monitor hardware load on both machines. In the load section, two load phases have been defined, starting at “new user every 4 seconds” and then doubling the rate. Each of these phases is meant to last half an hour (1800s), but once the server reaches its breaking point, user sessions no longer terminate properly and the current load phase is extended, as Tsung waits for all users to finish before proceeding to the next one. Once you have changed head-tsung-scenario.xml according to your needs, you may complete the generation of the scenario file by typing:

 cat head-tsung-scenario.xml my-tsung-scenario.xml > full-tsung-scenario.xml; echo '
    </sessions>
</tsung>
' >> full-tsung-scenario.xml

This file (full-tsung-scenario.xml) is a full, valid scenario file which may be used for testing. But you will probably want to tweak one or two things to make this testing relevant to your system, which is what we shall discuss in the next installment of this tutorial.
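Once you are happy with the file, starting a test run is a one-liner (assuming Tsung is installed on the client machine; by default the results land under ~/.tsung/log/):

tsung -f full-tsung-scenario.xml start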
