From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Creation of an empty table is not fsync'd at checkpoint |
Date: | 2022-01-27 20:41:04 |
Message-ID: | CA+hUKGK_k54szkL5ROjH5ubP8bG31SiHHg2jrkF-bS6X4JhesA@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, Jan 28, 2022 at 8:17 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2022-01-28 08:01:01 +1300, Thomas Munro wrote:
> > It might be possible to avoid that on xfs or pretty much any other
> > file system. I wasn't following this closely, but even with ext4's
> > recent fast commit changes, its fsync implementation still
> > deliberately synchronises data for other file descriptors as a side
> > effect as summarised in [1], unlike xfs and other systems.
>
> Not data, just metadata, right? Well, and volatile write caches (by virtue of
> doing an otherwise unnecessary REQ_PREFLUSH). With data=writeback file
> contents for file a are not flushed to disk (rather than from disk write
> cache) when file b is fsynced. Before/After the fast commit feature.
Pass. [/me suppresses the urge to go and test]. I'm just going by
the words in that article and Ts'o's linked email about file entanglement
and the global barrier nature of fsync. I'm not studying the code.
> > So they've caught up with xfs's concurrent writes (and gone further than xfs
> > by doing it also for buffered I/O giving up even page-level atomicity, as
> > discussed in a couple of other threads), but not yet decided to pull the
> > trigger on just-fsync-what-I-asked-for.
>
> I don't think the page level locking and the changes above / fsyncing are
> pretty much independent?
Right, the concurrency thing is completely unrelated and an old change
(maybe so old that I'm actually thinking of ext3?). I was just
commenting on two historical reasons why people used to switch to xfs
(especially other databases using DIO). I see that Noah's results
showed "atomicity" (for some definition) for read/write in 3.10, but
not for pread/pwrite, with the difference gone in 4.9, so I guess
perhaps somewhere between those releases is where our unlocked control
file access became problematic, but perhaps it wasn't fundamental,
just a quirk of the implementation of read/write with implicit
position, and if we'd used pread/pwrite for the control file we'd have
seen it sooner. *shrug*
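
For readers less familiar with the distinction drawn above, here is a minimal sketch of implicit-position versus explicit-offset I/O. It uses Python's `os` wrappers around the POSIX calls rather than PostgreSQL code; the function name `demo_positions` and the temporary file are assumptions made for illustration. It only shows which calls touch the descriptor's shared file position and says nothing about the kernel-level atomicity behaviour discussed in the thread.

```python
import os
import tempfile


def demo_positions():
    """Return (pos_after_pwrite, pos_after_write, data_from_pread, pos_after_pread)."""
    fd, path = tempfile.mkstemp()  # opened read/write, position starts at 0
    try:
        # pwrite takes an explicit offset and leaves the file position untouched.
        os.pwrite(fd, b"control", 0)
        pos_after_pwrite = os.lseek(fd, 0, os.SEEK_CUR)

        # write uses and advances the implicit position kept with the open file.
        os.write(fd, b"XY")
        pos_after_write = os.lseek(fd, 0, os.SEEK_CUR)

        # pread also takes an explicit offset and does not move the position.
        data = os.pread(fd, 7, 0)
        pos_after_pread = os.lseek(fd, 0, os.SEEK_CUR)

        return pos_after_pwrite, pos_after_write, data, pos_after_pread
    finally:
        os.close(fd)
        os.unlink(path)
```

Because `read`/`write` must update that shared position, an implementation may take extra locking around them that `pread`/`pwrite` do not need, which is one plausible reading of the "quirk" mentioned above.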