From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: some longer, larger pgbench tests with various performance-related patches
Date: 2012-02-06 14:38:23
Message-ID: CA+Tgmob1L+1ROcUX46us9mFcvBuT58UxDq0NZ3+HQWk=QGr-6A@mail.gmail.com
Lists: pgsql-hackers
On Sat, Feb 4, 2012 at 2:13 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> We really need to nail that down. Could you post the scripts (on the
> wiki) you use for running the benchmark and making the graph? I'd
> like to see how much work it would be for me to change it to detect
> checkpoints and do something like color the markers blue during
> checkpoints and red elsewhen.
They're pretty crude - I've attached them here.
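[For reference, a minimal sketch of the kind of post-processing Jeff describes. The file names and log formats here are assumptions, not what the attached runtest/makeplot scripts actually emit; checkpoint windows could be scraped from the server log with log_checkpoints = on.]

```python
# Hypothetical post-processing sketch (not the attached scripts): color
# per-second throughput samples by whether a checkpoint was in progress.
# Assumes a TPS log with "epoch_seconds tps" lines and a checkpoint log
# with "start_epoch end_epoch" lines.
import matplotlib.pyplot as plt

def load_pairs(path):
    with open(path) as f:
        return [tuple(map(float, line.split()[:2])) for line in f if line.strip()]

samples = load_pairs('tps.log')              # (second, tps)
checkpoints = load_pairs('checkpoints.log')  # (start, end)

def in_checkpoint(t):
    return any(start <= t <= end for start, end in checkpoints)

colors = ['blue' if in_checkpoint(t) else 'red' for t, _ in samples]
plt.scatter([t for t, _ in samples], [tps for _, tps in samples],
            c=colors, s=4, linewidths=0)
plt.xlabel('elapsed seconds')
plt.ylabel('TPS')
plt.savefig('tps-checkpoints.png')
```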
> Also, I'm not sure how bad that graph really is. The overall
> throughput is more variable, and there are a few latency spikes but
> they are few. The dominant feature is simply that the long-term
> average is less than the initial burst. Of course the goal is to have
> a high level of throughput with a smooth latency under sustained
> conditions. But to expect that that long-sustained smooth level of
> throughput be identical to the "initial burst throughput" sounds like
> more of a fantasy than a goal.
That's probably true, but the drop-off is currently quite extreme.
The fact that disabling full_page_writes causes throughput to increase
by >4x is dismaying, at least to me.
> If we want to accept the lowered
> throughput and work on what variability/spikes are there, I think
> a good approach would be to take the long term TPS average, and dial
> the number of clients back until the initial burst TPS matches that
> long term average. Then see if the spikes still exist over the long
> term using that dialed back number of clients.
Hmm, I might be able to do that.
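[A rough sketch of that dial-back procedure might look like the following; the pgbench invocation, database name, client-count ladder, and 5% tolerance are all placeholder assumptions:]

```python
# Sketch of the dial-back idea: step the client count down until the
# initial burst TPS roughly matches the long-term average measured in
# the full-length run. All specifics here are assumptions.
import re
import subprocess

LONG_TERM_TPS = 2000.0  # hypothetical long-run average from the earlier test

def burst_tps(clients, seconds=60):
    out = subprocess.run(
        ['pgbench', '-c', str(clients), '-j', str(clients),
         '-T', str(seconds), 'pgbench'],
        capture_output=True, text=True, check=True).stdout
    return float(re.search(r'tps = ([\d.]+)', out).group(1))

for clients in (32, 24, 16, 12, 8, 4):
    tps = burst_tps(clients)
    print(clients, tps)
    if tps <= LONG_TERM_TPS * 1.05:  # within ~5% of the long-term average
        print('candidate client count:', clients)
        break
```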
> I don't think the full-page-writes are leading to WALInsert
> contention, for example, because that would probably lead to smooth
> throughput decline, but not those latency spikes in which entire
> seconds go by without transactions.
Right.
> I doubt it is leading to general
> IO contention, as the IO at that point should be pretty much
> sequential (the checkpoint has not yet reached the sync stage, and the
> WAL is sequential). So I bet that it is caused by fsyncs occurring at
> xlog segment switches, and the locking that entails.
That's definitely possible.
> If I
> recall, we can have a segment which is completely written to the OS and in
> the process of being fsynced, and we can have another segment which is
> partly in wal_buffers and partly written out to the OS
> cache, but we can't start reusing the wal_buffers that were
> already written to the OS for that segment (and therefore are
> theoretically available for reuse by the upcoming 3rd segment) until
> the previous segment's fsync has completed. So all WALInserts freeze.
> Or something like that. This code has changed a bit since last time
> I studied it.
Yeah, I need to better-characterize where the pauses are coming from,
but I'm reluctant to invest too much effort in it until Heikki's xlog
scaling patch goes in, because I think that's going to change things
enough that any work done now will mostly be wasted.
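[As a toy illustration of the stall Jeff hypothesizes, not PostgreSQL code, just a model of the claimed dependency: an inserter that could otherwise recycle a buffer page already written to the OS still parks behind the previous segment's in-flight fsync:]

```python
# Toy model only: would-be WAL inserters are assumed to block until the
# previous segment's fsync completes, even though the pages they want
# to recycle are already in the OS cache.
import threading
import time

fsync_done = threading.Event()

def fsync_previous_segment():
    time.sleep(0.5)  # pretend the fsync takes 500 ms
    fsync_done.set()

threading.Thread(target=fsync_previous_segment).start()

t0 = time.time()
fsync_done.wait()  # every would-be WAL inserter parks here
print('insert stalled %.2f s behind the segment fsync' % (time.time() - t0))
```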
It might be worth trying a run with wal_buffers=32MB or something like
that, just to see whether that mitigates any of the locking pile-ups.
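[For anyone reproducing that run, a sketch of the relevant settings; the values are the ones under discussion here, not recommendations, and note that changing wal_buffers takes effect only after a server restart:]

```
# postgresql.conf -- settings for the experiment discussed above
wal_buffers = 32MB     # default (-1) auto-tunes to at most 16MB; restart required
log_checkpoints = on   # so checkpoint start/end times can be scraped from the log
```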
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment | Content-Type | Size
---|---|---
runtestl | application/octet-stream | 1.1 KB
makeplot | application/octet-stream | 1.6 KB