Re: Sorted writes in checkpoint

From: "Gregory Maxwell" <gmaxwell(at)gmail(dot)com>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, "Greg Smith" <gsmith(at)gregsmith(dot)com>, "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Subject: Re: Sorted writes in checkpoint
Date: 2007-06-15 02:37:14
Message-ID: e692861c0706141937p47212c5y4ac6177ecd086430@mail.gmail.com
Lists: pgsql-hackers pgsql-patches

On 6/14/07, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Thu, 2007-06-14 at 16:39 +0900, ITAGAKI Takahiro wrote:
> > Greg Smith <gsmith(at)gregsmith(dot)com> wrote:
> >
> > > On Mon, 11 Jun 2007, ITAGAKI Takahiro wrote:
> > > > If the kernel can treat sequential writes better than random writes, is
> > > > it worth sorting dirty buffers in block order per file at the start of
> > > > checkpoints?
> >
> > I wrote and tested the attached sorted-writes patch, based on Heikki's
> > ldc-justwrites-1.patch. There was an obvious performance win on OLTP
> > workloads.
> >
> > tests                     | pgbench | DBT-2 response time (avg/90%/max)
> > --------------------------+---------+-----------------------------------
> > LDC only                  | 181 tps | 1.12 / 4.38 / 12.13 s
> > + BM_CHECKPOINT_NEEDED(*) | 187 tps | 0.83 / 2.68 /  9.26 s
> > + Sorted writes           | 224 tps | 0.36 / 0.80 /  8.11 s
> >
> > (*) Don't write buffers that were dirtied after starting the checkpoint.
> >
> > machine : 2GB-ram, SCSI*4 RAID-5
> > pgbench : -s400 -t40000 -c10 (about 5GB of database)
> > DBT-2 : 60WH (about 6GB of database)
>
> I'm very surprised by the BM_CHECKPOINT_NEEDED results. What percentage
> of writes has been saved by doing that? We would expect a small
> percentage of blocks only and so that shouldn't make a significant
> difference. I thought we discussed this before, about a year ago. It
> would be easy to get that wrong and to avoid writing a block that had
> been re-dirtied after the start of checkpoint, but was already dirty
> beforehand. How long was the write phase of the checkpoint, how long
> between checkpoints?
>
> I can see the sorted writes having an effect because the OS may not
> receive blocks within a sufficient time window to fully optimise them.
> That effect would grow with increasing sizes of shared_buffers and
> decrease with size of controller cache. How big was the shared buffers
> setting? What OS scheduler are you using? The effect would be greatest
> when using Deadline.

Linux has some instrumentation that might be useful for this testing:

    echo 1 > /proc/sys/vm/block_dump

This makes the kernel log all physical I/O. Disable syslog writing to
disk before turning it on if you don't want the system to blow up
(otherwise the log writes themselves get logged).

The OS elevator should certainly be working well enough that sorting
the writes beforehand would not yield that much of an improvement.
Perhaps the frequent fsync behavior is having an unintended interaction
with the elevator? ... It might be worthwhile to contact some Linux
kernel developers and see if there is some misunderstanding.
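
For readers following the thread, the idea quoted above (sorting dirty buffers in block order per file before writing them) can be illustrated with a small sketch. This is not the code from the attached patch; `DirtyBuffer`, `sort_for_checkpoint` and `count_sequential_runs` are hypothetical names made up for this example, and the run count is only a rough stand-in for how much sequential I/O the kernel is handed.

```python
import random
from collections import namedtuple

# Hypothetical stand-in for the on-disk location of a dirty buffer.
DirtyBuffer = namedtuple("DirtyBuffer", ["file_id", "block_no"])


def sort_for_checkpoint(buffers):
    """Return the buffers ordered by file, then by block number within a file."""
    return sorted(buffers, key=lambda b: (b.file_id, b.block_no))


def count_sequential_runs(buffers):
    """Count runs in which each block directly follows the previous one
    in the same file. Fewer runs means a more sequential write stream."""
    runs = 0
    prev = None
    for buf in buffers:
        contiguous = (
            prev is not None
            and buf.file_id == prev.file_id
            and buf.block_no == prev.block_no + 1
        )
        if not contiguous:
            runs += 1
        prev = buf
    return runs


def demo(seed=0):
    """Compare run counts for a shuffled and a sorted write order."""
    rng = random.Random(seed)
    # Assumed toy workload: 3 files with 100 dirty blocks each.
    buffers = [DirtyBuffer(f, b) for f in range(3) for b in range(100)]
    rng.shuffle(buffers)
    unsorted_runs = count_sequential_runs(buffers)
    sorted_runs = count_sequential_runs(sort_for_checkpoint(buffers))
    return unsorted_runs, sorted_runs


if __name__ == "__main__":
    before, after = demo()
    print("runs without sorting:", before)
    print("runs with sorting:   ", after)
```

In this toy model the sorted order collapses to one run per file, which is the best case an elevator could hope to reconstruct on its own if it saw all the blocks within its merge window.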
