Re: improving concurrent transaction commit rate

From: Sam Mason <sam(at)samason(dot)me(dot)uk>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: improving concurrent transaction commit rate
Date: 2009-03-25 15:58:06
Message-ID: 20090325155805.GC12225@frubble.xen.chris-lamb.co.uk
Lists: pgsql-hackers

On Wed, Mar 25, 2009 at 02:38:45PM +0000, Greg Stark wrote:
> Sam Mason <sam(at)samason(dot)me(dot)uk> writes:
> > Why does it top out so much though? It goes up nicely to around ten
> > clients (I tested with 8 and 12) and then tops out and levels off. The
> > log is chugging along at around 2MB/s which is well above where they
> > are for a single client, but it still seems as though things could go
> > further.
>
> Well 2MB/s sounds about right actually:
>
> You have: 8kB / ( 1|7200|2min)
> You want: MB/s
> * 1.92
> / 0.52083333

I'd need more explanation (or other pointers) to follow what you mean
there. I've actually got a 15k disk, but it shouldn't matter much.
2MB/s seems to be consistent across any number of clients (specifically
1 to 48 here).
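
That said, if I redo the calculation for my disk, assuming one 8kB WAL
page gets flushed per full rotation (a guess on my part as to what your
figures assume), units does give the 2MB/s I'm seeing:

You have: 8kB / (1|15000min)
You want: MB/s
* 2
/ 0.5

i.e. 250 rotations/s * 8kB = 2MB/s.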

> Heikki looked at this a while back and we concluded that the existing
> algorithm will only get 1/2 the optimal rate unless you have twice as many
> sessions as you ought to need to saturate the log i/o.

I'm writing to a 15k rpm disk, which gives me 250 rotations per second.
With a single client I'm getting about 220 transactions per second,
which seems reasonable. With two clients it stays at about 220
transactions per second; also reasonable, as they end up serialising
behind each other.

With three clients I get about 320 tps. That's consistent with 1.5*220
and would imply that there's always a "spare" client queued behind the
lock that gets committed for free. With four clients I get 430 tps,
which suggests the queueing is working as it should.

Below I've calculated the mean transactions per second over a series
of runs, the value I'd expect to get (i.e. clients/2 * 220, the
single-client rate), and the ratio of the two.

clients      tps     calc  ratio
      1    221.5
      2    223.8    220.0   102%
      3    323.5    330.0    98%
      4    427.7    440.0    97%
      6    647.4    660.0    98%
      8    799.7    880.0    91%
     12    946.0   1320.0    72%
     18   1020.6   1980.0    52%
     24   1089.2   2640.0    41%
     32   1116.6   3520.0    32%
     48   1141.8   5280.0    22%

As you can see, the ratio between the tps I'm seeing and the tps I'd
expect drops off significantly after 18 clients, with the trend
starting somewhere around seven clients. I don't understand why this
would be happening.

My highly involved and complicated benchmark is a shell script that
does the following:

#!/bin/bash
nclients=$1   # number of concurrent psql sessions
ittrs=$2      # iterations per session

# generate the SQL stream for one client: insert a row, update it
# repeatedly, then delete it again
function gensql {
    echo "INSERT INTO bm (c,v) VALUES ('$1','0');"
    for (( i = 1; i < $ittrs; i++ )); do
        echo "UPDATE bm SET v = '$i' WHERE c = '$1';"
    done
    echo "DELETE FROM bm WHERE c = '$1';"
}

# kick off one psql session per client, each running its own stream
for (( c = 0; c < $nclients; c++ )); do
    gensql $c | psql -Xq -f - &
done

# wait for all of the background sessions to finish
for (( c = 0; c < $nclients; c++ )); do
    wait
done
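
For reference, the bm table isn't created by the script; something
along the lines of the following is assumed to exist already (the
exact column definitions here are illustrative):

# assumed table setup, not part of the benchmark script above
psql -Xq -c "CREATE TABLE bm (c text PRIMARY KEY, v text);"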

I'm running "time test.sh 8 1000" and recording the elapsed wall-clock
time; tps = nclients * ittrs / time. I'm repeating each measurement
four times, and the "error bars" in the SVG I posted before were the
standard deviation across the runs.
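
For anyone wanting to reproduce the numbers, a sketch of that
measurement (GNU date and bc assumed; test.sh is the script above):

#!/bin/bash
# time the benchmark run and derive transactions per second
nclients=8
ittrs=1000
start=$(date +%s.%N)    # %N needs GNU date
./test.sh $nclients $ittrs
end=$(date +%s.%N)
echo "scale=1; $nclients * $ittrs / ($end - $start)" | bc -l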

Something (the HOT code?) keeps the number of dead tuples consistent,
so I don't think that's confounding things; but suggestions for
improvement would be appreciated.

--
Sam http://samason.me.uk/
