Re: Limit of bgwriter_lru_maxpages of max. 1000?

From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: Gerhard Wiesinger <lists(at)wiesinger(dot)com>
Cc: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Limit of bgwriter_lru_maxpages of max. 1000?
Date: 2009-10-05 20:40:44
Message-ID: alpine.GSO.2.01.0910051500490.9269@westnet.com
Lists: pgsql-general

On Mon, 5 Oct 2009, Gerhard Wiesinger wrote:

> I think the problem is that it is done at checkpoint time (whether
> spread or not). It should have already been done by the bgwriter.

This is pretty simple: if you write things before checkpoint time, you'll
end up re-writing a percentage of the blocks if they're re-dirtied before
the checkpoint actually happens. The checkpoint itself is always the most
efficient time to write something out. People think that the background
writer should do more, but it can't without generating more writes than if
you instead focused on spreading the checkpoints out. This is why
the only work the BGW does try to do is writing out blocks that it's
pretty sure are going to be evicted very soon (in the next 200ms, or
whatever its cycle time is set to), to minimize the potential for
mistakes. The design errs a bit on the side of doing too little because
it is paranoid about not doing wasted work, and that implementation always
beat a more aggressive background writer in benchmarks.
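To make the trade-off concrete, here is a toy Python simulation (not
PostgreSQL code; all names and the re-dirty rate are made-up assumptions)
of one checkpoint interval. Writing every dirty buffer eagerly means any
buffer that gets dirtied again before the checkpoint must be written twice,
while deferring to the checkpoint writes each one exactly once:

```python
import random

def simulate(n_buffers=1000, dirty_frac=0.5, redirty_frac=0.3,
             eager_writes=True, seed=42):
    """Count physical writes over one checkpoint interval.

    eager_writes=True: every dirty buffer is written as soon as it is
    dirtied; buffers re-dirtied before the checkpoint are written again.
    eager_writes=False: all writes are deferred to the checkpoint, so
    each dirty buffer is written exactly once.
    """
    rng = random.Random(seed)
    # which buffers get dirtied at some point during the interval
    dirty = [rng.random() < dirty_frac for _ in range(n_buffers)]
    writes = sum(dirty)  # each dirty buffer is written at least once
    if eager_writes:
        # some eagerly-written buffers get dirtied again before the
        # checkpoint and must be written a second time
        redirtied = [d and rng.random() < redirty_frac for d in dirty]
        writes += sum(redirtied)
    return writes
```

With the defaults, the eager strategy performs roughly 30% more physical
writes than deferring everything to the checkpoint; the spread-checkpoint
approach keeps the single-write property while smoothing the I/O out over
time.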

This is hard for people to accept, but there were three of us running
independent tests to improve things here by the end of 8.3 development, and
everybody saw similar results showing the checkpoint spreading approach
was the right one. At the time the patch was labeled "load distributed
checkpoint", and if I had more time today I'd try to find the more
interesting parts of that discussion to highlight them.

> BTW: Is it possible to get everything in pg_class over all databases as
> admin?

Scott's message at
http://archives.postgresql.org/pgsql-general/2009-09/msg00986.php
summarizes the problem nicely, and I suggested my workaround for it at
http://archives.postgresql.org/pgsql-general/2009-09/msg00984.php

>>> Bug2: Double iteration of buffers
>>> As you can seen in the calling tree below there is double iteration with
>>> buffers involved. This might be a major performance bottleneck.
>>
>> Hmmm, this might be a real bug causing scans through the buffer cache to go
>> twice as fast as intended.
>
> That's not twice, O(2*n)=O(n); that's a factor of n*n (outer and inner
> loop iteration), which means overall it's O(n^2), which is IMHO too much.

I follow what you mean; I hadn't noticed that. SyncOneBuffer isn't an O(n)
operation; it's O(1). So I'd think the potential bug here turns into
an O(n) issue, given that it's the routine being called n times.
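The distinction can be sketched with a toy Python model (a stand-in, not
the actual bufmgr.c code; `sync_one_buffer`, `scan_buffers`, and the
`buggy` flag are all hypothetical names for illustration). Calling a
constant-time routine once per buffer is O(n); only if the scan restarted
an inner pass over all buffers on every outer iteration would it become
O(n^2):

```python
def sync_one_buffer(buf_id, counter):
    # stand-in for SyncOneBuffer: constant work per call, O(1)
    counter[0] += 1

def scan_buffers(n, buggy=False):
    """Count work units for one scan over n buffers.

    Correct behaviour: one O(1) call per buffer, O(n) total.
    Hypothetical double-iteration bug: an inner re-scan of all n
    buffers inside the outer loop, O(n^2) total.
    """
    counter = [0]
    for i in range(n):
        if buggy:
            for j in range(n):          # inner re-scan: the suspected bug
                sync_one_buffer(j, counter)
        else:
            sync_one_buffer(i, counter)
    return counter[0]
```

For n=100 the correct scan does 100 units of work and the buggy variant
10,000, which is why the question of whether the iteration is really
doubled matters so much at larger buffer counts.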

This seems like a job for "dump things to the log file" style debugging.
If I can reproduce an actual bug here it sounds like a topic for the
hackers list outside of this discussion.

> The problem might be hidden for the following reasons:
> 1.) Buffer values are so low that even n^2 is low for today's machines
> 2.) Code is not often called in that way
> 3.) Backend writes out pages so that the code is never executed

(2) was the reason I figured it might have escaped notice. It's really
not called that often in a way that would run into the problem you think
is there.

> Do you have an idea where one should set tracepoints inside and outside
> PostgreSQL?

I think you'd want to instrument BufferAlloc inside bufmgr.c to measure
what you're after.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD
