From: | Andres Freund <andres(at)2ndquadrant(dot)com> |
---|---|
To: | Jon Nelson <jnelson+pgsql(at)jamponi(dot)net> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: fallocate / posix_fallocate for new WAL file creation (etc...) |
Date: | 2013-05-28 15:21:05 |
Message-ID: | 20130528152105.GB16637@awork2.anarazel.de |
Lists: | pgsql-hackers |
On 2013-05-28 10:12:05 -0500, Jon Nelson wrote:
> On Tue, May 28, 2013 at 9:19 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > On Tue, May 28, 2013 at 10:15 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> >> On 2013-05-28 10:03:58 -0400, Robert Haas wrote:
> >>> On Sat, May 25, 2013 at 2:55 PM, Jon Nelson <jnelson+pgsql(at)jamponi(dot)net> wrote:
> >>> >> The biggest thing missing from this submission is information about what
> >>> >> performance testing you did. Ideally performance patches are submitted with
> >>> >> enough information for a reviewer to duplicate the same test the author did,
> >>> >> as well as hard before/after performance numbers from your test system. It
> >>> >> often turns tricky to duplicate a performance gain, and being able to run
> >>> >> the same test used for initial development eliminates a lot of the problems.
> >>> >
> >>> > This has been a bit of a struggle. While it's true that WAL file
> >>> > creation doesn't happen with great frequency, and while it's also true
> >>> > that - with strace and other tests - it can be proven that
> >>> > fallocate(16MB) is much quicker than writing zeroes by hand,
> >>> > proving that in the larger context of a running install has been
> >>> > challenging.
> >>>
> >>> It's nice to be able to test things in the context of a running
> >>> install, but sometimes a microbenchmark is just as good. I mean, if
> >>> posix_fallocate() is faster, then it's just faster, right?
> >>
> >> Well, it's a bit more complex than that. Fallocate doesn't actually
> >> initialize the disk space in most filesystems, just marks it as
> >> allocated and zeroed which is one of the reasons it can be noticeably
> >> faster. But that can make the runtime overhead of writing to those pages
> >> higher.
> >
> > Maybe it would be good to measure that impact. Something like this:
> >
> > 1. Write 16MB of zeroes to an empty file in the same size chunks we're
> > currently using (8kB?). Time that. Rewrite the file with real data.
> > Time that.
> > 2. posix_fallocate() an empty file out to 16MB. Time that. Rewrite
> > the file with real data. Time that.
> >
> > Personally, I have trouble believing that writing 16MB of zeroes by
> > hand is "better" than telling the OS to do it for us. If that's so,
> > the OS is just stupid, because it ought to be able to optimize the
> > crap out of that compared to anything we can do. Of course, it is
> > more than possible that the OS is in fact stupid. But I'd like to
> > hope not.
>
> I wrote a little C program to do something very similar to that (which
> I'll hope to post later today).
> It opens a new file, fallocates 16MB, calls fdatasync. Then it loops
> 10 times: seek to the start of the file, writes 16MB of ones, calls
> fdatasync.
You need to call fsync(), not fdatasync(), the first time round. fdatasync()
doesn't guarantee metadata is synced.
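
A minimal sketch of the kind of test being discussed, not Jon's actual program (which is not shown in this message). It assumes an 8kB write size, takes "ones" to mean 0xFF bytes, and uses fsync() after the initial allocation as suggested above. The names time_segment and fill_file are hypothetical helpers made up for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for posix_fallocate, fdatasync, clock_gettime */
#endif
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

#define SEGMENT_SIZE (16L * 1024 * 1024)  /* 16MB, as in the thread */
#define CHUNK_SIZE   8192L                /* assumption: the "8kB?" chunk size */

/* Overwrite the first SEGMENT_SIZE bytes of fd with `byte`, one chunk at a time. */
static int fill_file(int fd, unsigned char byte)
{
    static unsigned char buf[CHUNK_SIZE];
    long done = 0;

    memset(buf, byte, sizeof buf);
    if (lseek(fd, 0, SEEK_SET) < 0)
        return -1;
    while (done < SEGMENT_SIZE) {
        long want = SEGMENT_SIZE - done;
        if (want > CHUNK_SIZE)
            want = CHUNK_SIZE;
        ssize_t n = write(fd, buf, (size_t) want);
        if (n <= 0)
            return -1;
        done += n;   /* a short write just means another loop iteration */
    }
    return 0;
}

/*
 * Hypothetical helper: create the file, allocate it either with
 * posix_fallocate() or by writing zeroes, then rewrite it `rewrites` times.
 * Returns seconds elapsed from open to unlink inclusive, or -1.0 on error.
 */
double time_segment(const char *path, int use_fallocate, int rewrites)
{
    struct timespec t0, t1;
    int ok, i;

    assert(SEGMENT_SIZE % CHUNK_SIZE == 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);

    int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0600);
    if (fd < 0)
        return -1.0;

    if (use_fallocate)
        ok = (posix_fallocate(fd, 0, SEGMENT_SIZE) == 0);
    else
        ok = (fill_file(fd, 0x00) == 0);

    /* fsync() rather than fdatasync() here, since the file was just extended */
    ok = ok && (fsync(fd) == 0);

    /* Rewrite phase: the file size no longer changes, so fdatasync() is used. */
    for (i = 0; ok && i < rewrites; i++)
        ok = (fill_file(fd, 0xFF) == 0) && (fdatasync(fd) == 0);

    close(fd);
    unlink(path);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (!ok)
        return -1.0;
    return (double) (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling time_segment(path, 1, 10) and time_segment(path, 0, 10) on the same filesystem gives the two totals to compare. Timing the allocation and rewrite phases separately, as Robert proposed, would show whether any cost is merely shifted from allocation to the first write.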
> Then it closes and removes the file, re-opens it, and this time writes
> out 16MB of zeroes, calls fdatasync, and then does the same loop as
> above. The program times the process from file open to file unlink,
> inclusive.
>
> The results - for me - are pretty consistent: using fallocate is
> 12-13% quicker than writing out zeroes.
Cool!
> I used fdatasync twice to attempt to mimic what the WAL writer does.
Not sure what you mean by that though?
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services