Re: checkpoint patches

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: checkpoint patches
Date: 2012-04-04 04:30:42
Message-ID: 4F7BCE72.5020801@2ndQuadrant.com
Lists: pgsql-hackers

On 03/25/2012 04:29 PM, Jim Nasby wrote:
> Another $0.02: I don't recall the community using pgbench much at all
> to measure latency... I believe it's something fairly new. I point
> this out because I believe there are differences in analysis that you
> need to do for TPS vs latency. I think Robert's graphs support my
> argument; the numeric X-percentile data might not look terribly good,
> but reducing peak latency from 100ms to 60ms could be a really big
> deal on a lot of systems. My intuition is that one or both of these
> patches actually would be valuable in the real world; it would be a
> shame to throw them out because we're not sure how to performance test
> them...

One of these patches is already valuable in the real world. There it
will stay, while we continue mining it for nuggets of deeper insight
into the problem that can lead to a better test case.

Starting with pgbench latency worked out fairly well for some things.
Last year around this time I published some results, summarized at
http://blog.2ndquadrant.com/en/gregs-planetpostgresql/2011/02/ , which
included things like worst-case latency dropping from <=34 seconds on
ext3 to <=5 seconds on xfs.
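Jim's point, quoted above, is that percentile figures and worst-case figures can tell different stories. Below is a minimal sketch of summarizing a pgbench per-transaction log both ways. It assumes the `pgbench -l` line format "client_id transaction_no time file_no time_epoch time_us", with the third field being the transaction's elapsed time in microseconds. The function names are hypothetical helpers, not part of pgbench.

```python
import math


def parse_latencies_ms(lines):
    """Return per-transaction latencies in milliseconds.

    Assumes the pgbench -l format
    'client_id transaction_no time file_no time_epoch time_us',
    where the third field is elapsed transaction time in microseconds.
    """
    latencies = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        latencies.append(int(fields[2]) / 1000.0)
    return latencies


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already-sorted list, pct in (0, 100]."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = max(1, math.ceil(pct / 100.0 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(latencies_ms):
    """Average, 90th/99th percentile, and peak latency in milliseconds."""
    s = sorted(latencies_ms)
    return {
        "avg": sum(s) / len(s),
        "p90": percentile(s, 90),
        "p99": percentile(s, 99),
        "max": s[-1],
    }
```

For example, two made-up runs of 100 transactions, each with 99 transactions at 5 ms and a single outlier at 100 ms or 60 ms, report the same 99th percentile of 5 ms. Only the max column shows the 100 ms to 60 ms improvement, which is why peak latency has to be examined separately from percentile data.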

The problem I keep hitting now is that 2 to 5 second latencies on Linux
are extremely hard to get rid of if you overwhelm storage, any storage.
That's where the wall is: if you try to drive latency below that, you
pay some hard trade-off penalties, if it works at all.

Take a look at the graph I've attached. That's a slow drive stalling
because it can't keep up with lots of random writes, right? No. It's a
Fusion-io card that will do 600MB/s of random I/O. But flood it with an
endless stream of pgbench writes, never pausing to let it catch up, and
I can get Linux to stall it for many seconds whenever I set it loose.
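One hypothetical way to put a number on stalls like the ones in that graph is to count how many transactions complete in each second and then find the longest run of seconds in which none complete. This sketch makes the same pgbench per-transaction log assumption as above, treating the fifth field (time_epoch) as the whole-second Unix time at which each transaction finished. The helper names and the "zero completions" stall definition are illustrative choices, not an established metric.

```python
from collections import Counter


def completions_per_second(lines):
    """Count transactions finishing in each whole second.

    Assumes the fifth field of each pgbench -l line (time_epoch) is the
    Unix time, in whole seconds, at which the transaction completed.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        counts[int(fields[4])] += 1
    return counts


def longest_stall(counts):
    """Return (start_second, length) of the longest run of seconds with
    zero completed transactions between the first and last busy second."""
    if not counts:
        return (None, 0)
    best_start, best_len = None, 0
    run_start, run_len = None, 0
    for sec in range(min(counts), max(counts) + 1):
        if counts[sec] == 0:  # Counter yields 0 for missing keys
            if run_len == 0:
                run_start = sec
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    return (best_start, best_len)
```

With transactions completing at seconds 100 to 102 and again at 107 to 108, `longest_stall` reports a 4-second stall starting at second 103. Throughput dropping to zero for a whole second is a coarse signal, but it is easy to read off a log and matches the "stalled for many seconds" behavior described above.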

This test workload is so unrepresentative of the real world that I
don't think we should be committing things justified by it, unless they
are uncontested wins. And those aren't so easy to find on the write
side of things.

Thanks to Robert for shaking up my poorly submitted patch and seeing
what happened. I threw mine out in hopes that some larger checkpoint
patch shoot-out might find it useful. That didn't happen; sorry I didn't
get to look more at the other horses. I do have some more neat
benchmarks to share, though.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

Attachment: image/png, 10.6 KB
