Re: Pre-allocating WAL files

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Andy Fan <zhihuifan1213(at)163(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Maxim Orlov <orlovmg(at)gmail(dot)com>, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Maxim Orlov <m(dot)orlov(at)postgrespro(dot)ru>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Pre-allocating WAL files
Date: 2025-01-22 15:56:33
Message-ID: Z5EVMXSWGN7_ViZ7@nathan
Lists: pgsql-hackers

On Wed, Jan 22, 2025 at 01:14:22AM +0000, Andy Fan wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
>> FWIW, I've seen the fsyncs around recycling being a rather substantial
>> bottleneck. To the point of the main benefit of larger segments being the
>> reduction in number of fsyncs at the end of a checkpoint. I think we should
>> be able to make the fsyncs a lot more efficient by batching them, first rename
>> a bunch of files, then fsync them and the directory. The current pattern
>> basically requires a separate filesystem journal flush for each WAL segment.
>
> For educational purposes, how does one fsync files in a batch? 'man fsync'
> tells me that a caller can only fsync one file at a time.
>
> int fsync(int fd);
>
> The fsync manual does not seem to say that fsync on a directory would
> fsync all the files under that directory.

I think Andres means that we should wait until the end of recycling to
fsync() the directory so that we aren't flushing it for every single
recycled segment. This sort of batching approach could also work well with
pre_sync_fname(), so that by the time we actually call fsync() on the
files, it has very little to do.
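
To illustrate the ordering only, here is a rough standalone sketch of that
pattern using plain POSIX calls. It is not PostgreSQL code and not a proposed
patch: recycle_segments_batched() and its parameters are hypothetical names
made up for this example, and the pre_sync_fname()-style writeback hint is
only noted in a comment rather than implemented. It also assumes that a single
fsync() of the directory is enough to make all of the preceding renames
durable, which depends on the filesystem.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): rename n files inside dir_path,
 * then fsync each renamed file, then fsync the directory exactly once.
 * Returns 0 on success, or -1 on the first failure with errno set.
 */
int
recycle_segments_batched(const char *dir_path,
                         const char *const *old_names,
                         const char *const *new_names,
                         int n)
{
    int dfd;
    int saved_errno;
    int i;

    dfd = open(dir_path, O_RDONLY | O_DIRECTORY);
    if (dfd < 0)
        return -1;

    /* Phase 1: do all of the renames without flushing anything. */
    for (i = 0; i < n; i++)
    {
        if (renameat(dfd, old_names[i], dfd, new_names[i]) != 0)
            goto fail;
    }

    /*
     * A writeback hint in the spirit of pre_sync_fname() could be issued for
     * each file here, so the fsync() calls below find less left to write.
     */

    /* Phase 2: fsync() takes a single fd, so the files are still a loop. */
    for (i = 0; i < n; i++)
    {
        int fd = openat(dfd, new_names[i], O_RDWR);
        int rc;

        if (fd < 0)
            goto fail;
        rc = fsync(fd);
        saved_errno = errno;
        close(fd);
        errno = saved_errno;
        if (rc != 0)
            goto fail;
    }

    /* Phase 3: flush the directory once for the whole batch of renames. */
    if (fsync(dfd) != 0)
        goto fail;

    close(dfd);
    return 0;

fail:
    saved_errno = errno;
    close(dfd);
    errno = saved_errno;
    return -1;
}
```

The per-file fsync() calls do not go away in this sketch. What changes is that
the directory is flushed once per batch instead of once per recycled segment.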

--
nathan
