Re: [HACKERS] Re: Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Greg Stark <gsstark(at)mit(dot)edu>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org, Michael Clemmons <glassresistor(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Subject:	Re: [HACKERS] Re: Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)
Date:	2010-02-02 17:36:12
Message-ID:	603c8f071002020936k5723e30kd4eac594092aba3b@mail.gmail.com
Lists:	pgsql-hackers pgsql-performance

On Fri, Jan 29, 2010 at 1:56 PM, Greg Stark <gsstark(at)mit(dot)edu> wrote:
> On Tue, Jan 19, 2010 at 3:25 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> That function *seriously* needs documentation, in particular the fact
>> that it's a no-op on machines without the right kernel call. The name
>> you've chosen is very bad for those semantics. I'd pick something
>> else myself. Maybe "pg_start_data_flush" or something like that?
>>
>
> I would like to make one token argument in favour of the name I
> picked. If it doesn't convince I'll change it since we can always
> revisit the API down the road.
>
> I envision having two function calls, pg_fsync_start() and
> pg_fsync_finish(). The latter will wait until the data synced in the
> first call is actually synced. The fall-back if there's no
> implementation of this would be for fsync_start() to be a noop (or
> something unreliable like posix_fadvise) and fsync_finish() to just be
> a regular fsync.
>
> I think we can accomplish this with sync_file_range() but I need to
> read up on how it actually works a bit more. In this case it doesn't
> make a difference since when we call fsync_finish() it's going to be
> for the entire file and nothing else will have been writing to these
> files. But for wal writing and checkpointing it might have very
> different performance characteristics.
>
> The big objection to this is that then we don't really have an api for
> FADV_DONT_NEED which is more about cache policy than about syncing to
> disk. So for example a sequential scan might want to indicate that it
> isn't planning on reading the buffers it's churning through but
> doesn't want to force them to be written sooner than otherwise and is
> never going to call fsync_finish().

I took a look at this patch today and I agree with Tom that
pg_fsync_start() is a very confusing name. I don't know what the
right name is, but this doesn't fsync so I don't think it should have
fsync in the name. Maybe something like pg_advise_abandon() or
pg_abandon_cache(). The current name is really wishful thinking:
you're hoping that it will make the kernel start the fsync, but it
might not. I think pg_start_data_flush() is similarly optimistic.

...Robert
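
[Illustrative note, not part of the original message] The naming dispute
above comes down to the difference between an advisory hint and a call that
actually guarantees durability. A minimal sketch of that split follows,
assuming a POSIX system. The names flush_hint(), flush_finish() and
demo_write_then_sync() are hypothetical and are not taken from the patch
under discussion; posix_fadvise() with POSIX_FADV_DONTNEED stands in for the
"start" half, and on platforms without it the hint compiles to a no-op,
which is the behaviour Tom asked to have documented.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to begin writing back the file's
 * dirty pages. This is advice only. The kernel may ignore it, and on
 * platforms that lack posix_fadvise() it does nothing at all.
 * Returns 0 on success or an error number (posix_fadvise() reports
 * errors through its return value, not through errno).
 */
int
flush_hint(int fd)
{
    assert(fd >= 0);
#if defined(POSIX_FADV_DONTNEED)
    /* offset 0 with length 0 means "from the start to the end of file" */
    return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
#else
    return 0;               /* no-op fallback */
#endif
}

/*
 * Hypothetical helper: the only call here that guarantees the data has
 * reached stable storage. Returns 0 on success, -1 on failure.
 */
int
flush_finish(int fd)
{
    assert(fd >= 0);
    return fsync(fd);
}

/*
 * Hypothetical demo: write a few bytes to a temporary file, issue the
 * hint, then force the data out. Returns 0 on success, -1 on failure.
 */
int
demo_write_then_sync(void)
{
    char    path[] = "/tmp/flush_sketch_XXXXXX";
    int     fd = mkstemp(path);
    int     rc;

    if (fd < 0)
        return -1;
    if (write(fd, "data", 4) != 4)
    {
        close(fd);
        unlink(path);
        return -1;
    }
    (void) flush_hint(fd);  /* best effort; result deliberately ignored */
    rc = flush_finish(fd);  /* durability comes from here, not the hint */
    close(fd);
    unlink(path);
    return rc;
}
```

In this sketch the caller may ignore the result of flush_hint() but must
check flush_finish(), which is the asymmetry that a name containing "fsync"
would tend to hide.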
