From: | Simon Riggs <simon(at)2ndquadrant(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Greg Smith <gsmith(at)gregsmith(dot)com>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-patches(at)postgresql(dot)org |
Subject: | Re: Sorting writes during checkpoint |
Date: | 2008-07-04 08:37:10 |
Message-ID: | 1215160630.4051.19.camel@ebony.site |
Lists: | pgsql-hackers pgsql-patches |
On Mon, 2008-05-05 at 00:23 -0400, Tom Lane wrote:
> Greg Smith <gsmith(at)gregsmith(dot)com> writes:
> > On Sun, 4 May 2008, Tom Lane wrote:
> >> Well, I tried a pgbench test similar to that one --- on smaller hardware
> >> than was reported, so it was a bit smaller test case, but it should have
> >> given similar results.
>
> > ... If
> > you're not offloading to another device like that, the OS-level elevator
> > sorting will handle sorting for you close enough to optimally that I doubt
> > this will help much (and in fact may just get in the way).
>
> Yeah. It bothers me a bit that the patch forces writes to be done "all
> of file A in order, then all of file B in order, etc". We don't know
> enough about the disk layout of the files to be sure that that's good.
> (This might also mean that whether there is a win is going to be
> platform and filesystem dependent ...)

I've seen no action on this since the last commitfest, but I think we
should do something with it rather than just ignore it.

I agree with all of the comments, so my proposed solution is to implement
this as an I/O elevator hook. The standard elevator would issue writes in
the order they arrive; an additional elevator in contrib would sort them
by file and block. That will make testing easier and will also give
Itagaki his benefit, while allowing ongoing research. If this solution's
good enough for Linux it ought to be good enough for us.

Note that if we do this for checkpoint we should also do this for
FlushRelationBuffers(), used during heap_sync(), for exactly the same
reasons.

I'd suggest calling it bulk_io_hook() or similar.
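
To make the hook idea concrete, here is a minimal sketch of how such a hook could look. Only the name bulk_io_hook comes from the suggestion above; the BulkWriteItem struct, the bulk_write() caller, and the write callback are hypothetical stand-ins invented for illustration and are not existing PostgreSQL code. With no hook installed, writes go out in arrival order; a contrib-style module could install the file/block sort.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical write request; a stand-in, not PostgreSQL's BufferTag. */
typedef struct BulkWriteItem
{
    unsigned int relfile;   /* hypothetical file identifier */
    unsigned int block;     /* block number within that file */
    int          buf_id;    /* buffer slot to be written */
} BulkWriteItem;

/* A hook may reorder items[0..n-1] in place before they are written. */
typedef void (*bulk_io_hook_type) (BulkWriteItem *items, size_t n);

/* NULL means the standard elevator: issue writes in the order they come. */
static bulk_io_hook_type bulk_io_hook = NULL;

/* Order by file first, then by block within the file. */
static int
cmp_file_block(const void *a, const void *b)
{
    const BulkWriteItem *x = a;
    const BulkWriteItem *y = b;

    if (x->relfile != y->relfile)
        return (x->relfile < y->relfile) ? -1 : 1;
    if (x->block != y->block)
        return (x->block < y->block) ? -1 : 1;
    return 0;
}

/* What a contrib module might install: the file/block sorted elevator. */
static void
sorted_bulk_io(BulkWriteItem *items, size_t n)
{
    if (n > 1)
        qsort(items, n, sizeof(BulkWriteItem), cmp_file_block);
}

/* Hypothetical common caller for the checkpoint and relation-flush paths. */
static void
bulk_write(BulkWriteItem *items, size_t n,
           void (*write_one) (const BulkWriteItem *item))
{
    assert(items != NULL || n == 0);

    if (bulk_io_hook != NULL)
        bulk_io_hook(items, n);     /* let the plugin choose the order */

    for (size_t i = 0; i < n; i++)
        write_one(&items[i]);
}

/* Demo writer that only reports what would be written. */
static void
print_write(const BulkWriteItem *item)
{
    printf("write file %u block %u (buffer %d)\n",
           item->relfile, item->block, item->buf_id);
}
```

A module would opt in with `bulk_io_hook = sorted_bulk_io;` and the same bulk_write() call then issues the writes grouped by file. Keeping the default as a NULL hook means the unsorted behaviour costs nothing extra when no plugin is loaded.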

A further observation: if there is an effect, it would be at the
block-device level, i.e. the tablespace. Sorting the writes so that we
issue one tablespace at a time might at least help the I/O elevators and
disk caches work with the whole problem at once. We might see a benefit
on one tablespace but not on another.

Sorting by file might have inadvertently shown a benefit at the
tablespace level on a larger server with spread-out data, whereas on
Tom's test system I would guess that just a single tablespace was used.

Anyway, I note that we don't have an easy way of sorting by tablespace,
but I'm sure it would be possible to look up the tablespace for a file
within a plugin.
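
As a rough illustration of tablespace-first ordering, the sketch below has a plugin fill in each write's tablespace through a lookup callback and then sort by tablespace, file, and block. Everything here is an assumption made for the example: TsWriteItem, tablespace_lookup_fn, sort_by_tablespace(), and the toy demo_lookup() mapping are hypothetical, and a real plugin would need an actual file-to-tablespace lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical write request; tablespace is filled in by the plugin. */
typedef struct TsWriteItem
{
    unsigned int tablespace;    /* resolved via the lookup callback */
    unsigned int relfile;       /* hypothetical file identifier */
    unsigned int block;         /* block number within that file */
} TsWriteItem;

/* Hypothetical lookup: maps a file identifier to its tablespace. */
typedef unsigned int (*tablespace_lookup_fn) (unsigned int relfile);

/* Order by tablespace, then file, then block. */
static int
cmp_ts_file_block(const void *a, const void *b)
{
    const TsWriteItem *x = a;
    const TsWriteItem *y = b;

    if (x->tablespace != y->tablespace)
        return (x->tablespace < y->tablespace) ? -1 : 1;
    if (x->relfile != y->relfile)
        return (x->relfile < y->relfile) ? -1 : 1;
    if (x->block != y->block)
        return (x->block < y->block) ? -1 : 1;
    return 0;
}

/* Resolve each item's tablespace, then issue one tablespace at a time. */
static void
sort_by_tablespace(TsWriteItem *items, size_t n, tablespace_lookup_fn lookup)
{
    assert(lookup != NULL);

    for (size_t i = 0; i < n; i++)
        items[i].tablespace = lookup(items[i].relfile);

    if (n > 1)
        qsort(items, n, sizeof(TsWriteItem), cmp_ts_file_block);
}

/* Toy mapping for the demo only: even files on tablespace 1, odd on 2. */
static unsigned int
demo_lookup(unsigned int relfile)
{
    return (relfile % 2 == 0) ? 1 : 2;
}
```

Resolving the tablespace once per item before sorting keeps the comparator cheap, since qsort() may call it many times per element.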

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support