Re: fsync-pgdata-on-recovery tries to write to more files than previously

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Christoph Berg <myon(at)debian(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: fsync-pgdata-on-recovery tries to write to more files than previously
Date: 2015-05-26 17:07:20
Message-ID: 20150526170720.GB5310@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2015-05-26 10:41:12 -0400, Tom Lane wrote:
> Yeah. Perhaps I missed it, but was the original patch motivated by
> actual failures that had been seen in the field, or was it just a
> hypothetical concern?

I'd mentioned that it might be a good idea to do this while
investingating a bug with unlogged tables that several parties had
reported. That patch has been fixed in a more granular fashion. From
there it took on kind of a life on its own.

It is somewhat interesting that similar code has been used in
pg_upgrade, via initdb -S, for a while now, without, to my knowledge, it
causing reported problem. I think the relevant difference is that that
code doesn't follow symlinks. It's obviously also less exercised and
poeople might just have fixed up permissions when encountering troubles.

Abhijit, do you recall why the code was changed to follow all symlinks
in contrast to explicitly going through the tablespaces as initdb -S
does? I'm pretty sure early versions of the patch pretty much had a
verbatim copy of the initdb logic? That logic is missing pg_xlog btw,
which is bad for pg_upgrade.

I *can* reproduce corrupted clusters without this without trying too
hard. I yesterday wrote a test for the crash testing infrastructure I've
on and off worked on (since 2011. Uhm) and I could reproduce a bunch of
corrupted clusters without this patch. When randomly triggering crash
restarts shortly afterwards follwed by a simulated hw restart (stop
accepting writes via linux device mapper) while concurrent COPYs are
running, I can trigger a bunch of data corruptions.

Since then my computer in berlin has done 440 testruns with the patch,
and 450 without. I've gotten 7 errors without, 0 with. But the
instrumentation for detecting errors is really shitty (pkey lookup for
every expected row) and doesn't yet keep meaningful diagnosistics. So
this isn't particularly bulletproof either way.

I can't tell whether the patch is just masking yet another problem due
to different timings caused by the fsync, or whether it's fixing the
problem that we can forget to sync WAL segments.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2015-05-26 17:13:40 Re: fsync-pgdata-on-recovery tries to write to more files than previously
Previous Message Tom Lane 2015-05-26 16:31:48 Re: fsync-pgdata-on-recovery tries to write to more files than previously