From: "Tomas Vondra" <tv(at)fuzzy(dot)cz>
To: "CSS" <css(at)morefoo(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Benchmarking tools, methods
Date: 2011-11-18 11:59:02
Message-ID: dc7717a5b6903e520d70efd73a498e4a.squirrel@sq.gransy.com
Lists: pgsql-performance
On 18 November 2011, 10:55, CSS wrote:
> Hello,
>
> I'm going to be testing some new hardware (see
> http://archives.postgresql.org/pgsql-performance/2011-11/msg00230.php) and
> while I've done some very rudimentary before/after tests with pgbench, I'm
> looking to pull more info than I have in the past, and I'd really like to
> automate things further.
>
> I'll be starting with basic disk benchmarks (bonnie++ and iozone) and then
> moving on to pgbench.
>
> I'm running FreeBSD and I'm interested in getting some baseline info on
> UFS2 single disk (SATA 7200/WD RE4), gmirror, zfs mirror, zfs raidz1, zfs
> set of two mirrors (ie: two mirrored vdevs in a mirror). Then I'm
> repeating that with the 4 Intel 320 SSDs, and just to satisfy my
> curiosity, a zfs mirror with two of the SSDs mirrored as the ZIL.
>
> Once that's narrowed down to a few practical choices, I'm moving on to
> pgbench. I've found some good info here regarding pgbench that is
> unfortunately a bit dated:
> http://www.westnet.com/~gsmith/content/postgresql/
>
> A few questions:
>
> -Any favorite automation or graphing tools beyond what's on Greg's site?
There are talks not listed on that westnet page - for example the recent
"Bottom-up Database Benchmarking" talk, available here:
http://pgbr.postgresql.org.br/2011/palestras.php?id=60
It probably contains more recent information about benchmarking tools and
testing new hardware.
> -Any detailed information on creating "custom" pgbench tests?
The technical info at
http://www.postgresql.org/docs/9.1/interactive/pgbench.html should be
sufficient, I guess - it's fairly simple. The most difficult thing is
determining what the script should do - which queries to execute etc. - and
that depends on the application.
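Just to illustrate (the file name, database name and numbers are made up),
a trivial read-only custom script using the 9.1 "\setrandom" syntax might
look like this:

  $ cat > random_reads.sql <<'EOF'
  \setrandom aid 1 1000000
  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
  EOF

  # -n skips vacuuming, -c/-j set clients/threads, -T is the run time in
  # seconds, -f points pgbench at the custom script instead of the built-in one
  $ pgbench -n -c 8 -j 2 -T 300 -f random_reads.sql pgbench

The upper bound for "aid" has to match the scale factor you initialized the
database with (pgbench -i -s ...), otherwise you'll be hitting rows that
don't exist.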
> -Any other postgres benchmarking tools?
Not really. pgbench is a nice stress-testing tool and the scripting is
quite flexible. I've done some TPC-H-like testing recently, but that's
rather a bunch of scripts executed manually.
> I'm also curious about benchmarking using my own data. I tried something
> long ago that at least gave the illusion of working, but didn't seem quite
> right to me. I enabled basic query logging on one of our busier servers,
> dumped the db, and let it run for 24 hours. That gave me the normal
> random data from users throughout the day as well as our batch jobs that
> run overnight. I had to grep out and reformat the actual queries from the
> logfile, but that was not difficult. I then loaded the dump into the
> test server and basically fed the saved queries into it and timed the
> result. I also hacked together a script to sample cpu and disk stats
> every 2S and had that feeding into an rrd database so I could see how
> "busy" things were.
>
> In theory, this sounded good (to me), but I'm not sure I trust the
> results. Any suggestions on the general concept? Is it sound? Is there
> a better way to do it? I really like the idea of using (our) real data.
It's definitely a step in the right direction. An application-specific
benchmark is usually much more useful than a generic stress test. It
simply tells you more about your workload, and you can use it to
assess the capacity more precisely.
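As a rough sketch of that replay (the file and database names are made up,
and this assumes you already have the extracted queries in replay.sql),
the simplest thing is to push the file through a single psql session and
time it:

  $ time psql -X -q -d testdb -f replay.sql > /dev/null

  # run several captured streams in parallel to get at least some concurrency
  $ for f in replay.client-*.sql; do
      psql -X -q -d testdb -f "$f" > /dev/null &
    done; wait

Keep in mind this squeezes out the original "think time" between the
queries, which is closely related to the locking issue below.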
There are some issues though - mostly related to transactions and locking.
For example, if a client starts a transaction, locks a bunch of records and
then performs a time-consuming processing task outside the database, the
other clients may be blocked. You won't see this when replaying the logged
queries, because in reality it looks like this:
1) A: BEGIN
2) A: LOCK (table, row, ...)
3) A: perform something expensive
4) B: attempt to LOCK the same resource (blocks)
5) A: release the LOCK
6) B: obtains the LOCK and continues
but when replaying the workload, you'll see this:
1) A: BEGIN
2) A: LOCK (table, row, ...)
3) B: attempt to LOCK the same resource (blocks)
4) A: release the LOCK
5) B: obtains the LOCK and continues
so B waits for a very short period of time (or not at all).
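In SQL terms (the table and column names are made up), the pattern is
something like this:

  -- session A
  BEGIN;
  SELECT * FROM orders WHERE id = 1 FOR UPDATE;  -- takes a row lock
  -- ... application does expensive work outside the database ...
  COMMIT;

  -- session B, started while A still holds the lock
  UPDATE orders SET status = 'done' WHERE id = 1;  -- blocks until A commits

and a plain replay of the query log can't reproduce the "expensive work"
part, so it underestimates how long B really waited.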
To identify this problem, you'd have to actually behave like the client.
For example with a web application, you could use apache bench
(https://httpd.apache.org/docs/2.0/programs/ab.html) or something like
that.
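For example (URL and numbers are made up):

  # 10000 requests total, 50 concurrent clients
  $ ab -n 10000 -c 50 http://www.example.com/some/page/

That exercises the whole stack, including the application-side processing
that a query-log replay leaves out.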
> Lastly, any general suggestions on tools to collect system data during
> tests and graph it are more than welcome. I can homebrew, but I'm sure
> I'd be reinventing the wheel.
System stats or database stats? There are plenty of tools for system stats
(e.g. sar). For database stats it's a bit more difficult - there's
pgwatch, pgstatspack and maybe some other tools (I've written pg_monitor).
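E.g. with sysstat on Linux - on FreeBSD you'd use iostat/vmstat/gstat
instead - a rough sketch of sampling every 2 seconds into files you can
graph later:

  $ sar -u 2 > cpu.log &     # CPU utilization
  $ sar -d 2 > disk.log &    # per-device I/O
  $ vmstat 2 > vmstat.log &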
Tomas