
Re: Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org, Michael Clemmons <glassresistor(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Subject:	Re: Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)
Date:	2010-01-29 18:56:23
Message-ID:	407d949e1001291056q22915b1cqbce5fbc918a15d69@mail.gmail.com
Lists:	pgsql-hackers pgsql-performance

On Tue, Jan 19, 2010 at 3:25 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> That function *seriously* needs documentation, in particular the fact
> that it's a no-op on machines without the right kernel call. The name
> you've chosen is very bad for those semantics. I'd pick something
> else myself. Maybe "pg_start_data_flush" or something like that?
>

I would like to make one token argument in favour of the name I
picked. If it doesn't convince you, I'll change it, since we can always
revisit the API down the road.

I envision having two function calls, pg_fsync_start() and
pg_fsync_finish(). The latter will wait until the data whose sync was
started by the first call is actually on disk. The fallback if there's no
implementation of this would be for pg_fsync_start() to be a no-op (or
something unreliable like posix_fadvise()) and pg_fsync_finish() to just be
a regular fsync().
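A minimal C sketch of that fallback, assuming no kernel call for starting writeback is available. The function bodies, and the helper flush_files(), are hypothetical illustrations rather than PostgreSQL code; flush_files() shows the intended calling pattern of starting a flush on every file before waiting on any of them.

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/* Hypothetical fallback: no way to start writeback early, so do nothing. */
int
pg_fsync_start(int fd)
{
    (void) fd;
    return 0;
}

/* Fallback finish: a regular fsync(), which blocks until fd's data is durable. */
int
pg_fsync_finish(int fd)
{
    return fsync(fd);
}

/*
 * Hypothetical caller modeled on CREATE DATABASE: start a flush on every
 * copied file first, then wait for each one.  With a real pg_fsync_start()
 * the kernel could write the files back concurrently; with the no-op
 * fallback above this degenerates to one fsync() per file.
 * Returns 0 if every call succeeded, -1 otherwise.
 */
int
flush_files(const int *fds, int nfds)
{
    int result = 0;

    for (int i = 0; i < nfds; i++)
        if (pg_fsync_start(fds[i]) != 0)
            result = -1;
    for (int i = 0; i < nfds; i++)
        if (pg_fsync_finish(fds[i]) != 0)
            result = -1;
    return result;
}
```

The point of splitting the call in two is that the waiting happens only after every file has been handed to the kernel, so a better pg_fsync_start() can be dropped in later without changing callers.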

I think we can accomplish this with sync_file_range(), but I need to
read up a bit more on how it actually works. In this case it doesn't
make a difference, since when we call pg_fsync_finish() it's going to be
for the entire file and nothing else will have been writing to these
files. But for WAL writing and checkpointing it might have very
different performance characteristics.
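A Linux-only sketch of the same two calls built on sync_file_range(2), assuming whole-file flushes as in the CREATE DATABASE case (offset 0 with nbytes 0 means "to the end of the file"). These bodies are hypothetical. Because sync_file_range() deals only with data pages and does not flush file metadata or the drive's write cache, this version still ends with fsync() for durability.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical: queue writeback of fd's dirty pages without waiting for the I/O. */
int
pg_fsync_start(int fd)
{
    return sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE);
}

/* Hypothetical: wait for earlier writeback, write anything still dirty, then fsync(). */
int
pg_fsync_finish(int fd)
{
    if (sync_file_range(fd, 0, 0,
                        SYNC_FILE_RANGE_WAIT_BEFORE |
                        SYNC_FILE_RANGE_WRITE |
                        SYNC_FILE_RANGE_WAIT_AFTER) != 0)
        return -1;

    /*
     * sync_file_range() does not commit metadata or flush the disk cache,
     * so fsync() is still required; most data pages should already be
     * written by this point.
     */
    return fsync(fd);
}
```

A WAL or checkpoint caller would presumably pass narrower offset and length ranges instead of the whole file, which is where the performance characteristics could diverge from plain fsync().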

The big objection to this is that we then don't really have an API for
POSIX_FADV_DONTNEED, which is more about cache policy than about syncing
to disk. For example, a sequential scan might want to indicate that it
isn't planning to reread the buffers it's churning through, but it
doesn't want to force them to be written any sooner than otherwise, and
it is never going to call pg_fsync_finish().
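A separate cache-policy call that is never paired with pg_fsync_finish() might look like the sketch below; pg_drop_cache() is a hypothetical name, and the call is purely advisory. On Linux, POSIX_FADV_DONTNEED may also start writeback of dirty pages in the range, so it is not a pure cache hint either.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>

/*
 * Hypothetical cache hint: the caller will not reread [offset, offset+len)
 * (len == 0 means "to end of file").  Makes no durability promise.
 * Returns 0 on success or an error number; posix_fadvise() does not set errno.
 */
int
pg_drop_cache(int fd, off_t offset, off_t len)
{
#ifdef POSIX_FADV_DONTNEED
    return posix_fadvise(fd, offset, len, POSIX_FADV_DONTNEED);
#else
    (void) fd;
    (void) offset;
    (void) len;
    return 0;   /* hint unavailable on this platform: do nothing */
#endif
}
```

Keeping this separate from the start/finish pair lets a sequential scan express "I'm done with these pages" without taking on any obligation to wait for a sync.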

--
greg

In response to

Re: [HACKERS] Re: Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb) at 2010-01-19 15:25:46 from Tom Lane

Responses

Re: [HACKERS] Re: Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb) at 2010-02-02 17:36:12 from Robert Haas
