Skip navigation

Tag Archives: system engineering

The biggest hurdle I have had to overcome in order to use Tsung for load-testing Postgresql servers has been a conceptual mismatch between Tsung and what I wanted to do. Tsung’s model probably originates in the load-testing of web servers: everything is described in terms of user arrival rate, hits, pages, transactions, thinktimes. Database usage may not be readily described in these terms.

Before going any further, I should probably make clear I didn’t need Tsung to do performance testing. Performance testing may be easily done by throwing a specific set of SQL queries to the database server (in controlled conditions) and checking/timing the results (this could be a separate tutorial :-). Tsung gives you the tools to model proper user interaction and real-life usage and I have been trying to determine a server’s load capacity.

In other words, how many times our typical or target load could a particular server/set-up handle?

And this load had to be expressed in a Tsung-compatible xml file describing mainly:

  • alternative user sessions (with associated probabilities)
  • user arrival rate

Here’s a quick reminder of what Tsung transactions mean:

Different parts of a session may be grouped into transactions (Tsung-speak — nothing to do with your normal database transactions) for statistical monitoring of SQL groups. Transactions are characterised by their name, and names may be shared across sessions. This way, there are tremendous reporting possibilities, as all sessions may have a “connection” transaction offering global connection statistics, while transactions with unique names produce statistics on a specific use-case basis (e.g. complex data search, typical page load etc.).

For simplicity, I have opted to include only two “transactions” in each alternative user “session” (use-case):

  • a connection transaction (identified as “connection” in all “sessions”)
  • a SQL block transaction (with a unique, “session”-specific name)

Know your (target) usage

Here comes the obvious but imporant bit: you need to know your real-life or your target usage to proceed! Expressing your (target) usage into Tsung values is the only thing that binds your experiment to real-life and allows some conclusions to be drawn from the tests.

The defined “sessions”, should, of course, reflect your usage profile. This boils down to including a representative variety of use-cases, with the right probability factor assigned to each case.

But you also need to express the number of new “sessions” per second Tsung initiates against your system, i.e. the Tsung user arrival rate.

Adapting the scenario file

This is a quick summary of what you should edit in your Tsung scenario file to specify the desired load:

  • allocate different probabilities to your alternative “sessions” (do they add up to 100?)
  • make sure you wrap the important bits of each session into unique “transactions”
  • define appropriate user arrival rates in your “load phases”

Load phases are defined in this section of the Tsung scenario file:

      <arrivalphase phase="1" duration="1800" unit="second">
         <users interarrival="10" unit="second"></users>
      <arrivalphase phase="2" duration="1800" unit="second">
         <users interarrival="6" unit="second"></users>
      ... and so on...

Analyzing the results

Assuming have managed to run your tests, now comes the tricky part of interpreting your results. The Tsung helper perl script generates a multitude of graphs, but here’s a quick shortcut. The files which have been most useful to me are the following:

  • report.html
  • images/graphes-Transactions-max_sample.png
  • images/graphes-Transactions-mean.png
  • images/graphes-Users-simultaneous.png

When looking at these graphs, the two most important things to remember are the length (in seconds) of each load phase and what each phase represents. For example, the following graph (manually colored for convenience), may be divided into four sections, each representing a particular load phase (each phase lasted 1800 seconds, i.e. half an hour). This graph basically tells us things start to fall apart at 8x our target load.

simultaneous DB users

The reason the interpretation of this graph is easy is that we are not using any loops in each user “session”. Each Tsung “user” simply connects, sends a particular SQL block to the server, receives some results and exits. The user arrival rate stays constant throughout a particular load phase. Statistically speaking, if the server is responding properly, the number of new users in the system is always matched by the number of users exiting. Therefore, you only get simultaneous Tsung users if things start going wrong, when the server’s response times are increasing. And when you see the green and red lines splitting, things have gotten out of hand: Tsung is introducing new users which are not even able to connect!

We should always, of course, check, if the server’s performance was acceptable while it was “coping” with our load. In addition to the numbers in report.html, you could get the big picture by simply looking at images/graphes-Transactions-max_sample.png. The horizontal line for each “session” corresponds to the longest response time ever recorded for a particular use-case.


Armed with this knowledge, you may start experimenting further. Does your server recover from brief spikes of activity (e.g. long 4x phase, brief 16x phase, 4x phase etc.)? What effect do particular server configuration changes have on load capacity? And so on… This could easily turn into a full-time job 🙂