From: | David Steele <david(at)pgmasters(dot)net> |
---|---|
To: | Stephen Frost <sfrost(at)snowman(dot)net>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Emre Hasegeli <emre(at)hasegeli(dot)com>, Sergei Kornilov <sk(at)zsrv(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "berge(at)trivini(dot)no" <berge(at)trivini(dot)no>, Gürkan Gür <ben(at)gurkan(dot)in>, Raimund Schlichtiger <raimund(dot)schlichtiger(at)innogames(dot)com>, Bernhard Schrader <bernhard(dot)schrader(at)innogames(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Vik Fearing <vik(at)2ndquadrant(dot)fr> |
Subject: | Re: Standby trying "restore_command" before local WAL |
Date: | 2018-08-08 14:06:44 |
Message-ID: | 010562f7-b0f4-d41e-2343-e93b72f1f4d6@pgmasters.net |
Lists: | pgsql-hackers |
On 8/7/18 11:42 AM, Stephen Frost wrote:
>
>>> CRC's are per WAL record, and possibly some WAL records might not be ok
>>> to replay, or at least we need to make sure that we replay the right set
>>> of WAL in the right order even when there are partial WAL files being
>>> given to PG (that aren't named that way...). The more I think about
>>> this, I think we really need to avoid partial WAL files entirely- what
>>> are we going to do when we get to the end of one? We'd need to request
>>> the full one from the restore command anyway, so seems like we should
>>> just go ahead and get it from the archive, the question is if there's an
>>> easy/cheap way to detect partial WAL files in pg_wal.
>>
>> As explained above, I don't think this is actually a problem. The checksums
>> do cover the whole file thanks to chaining, and there are ways to detect
>> partial segments. IMHO it's fine if we replay a segment and then find out it
>> was partial and that we need to fetch it from archive anyway and re-apply it
>> - it should not be very common case, except when the user does something
>> silly.
>
> As long as we *do* go off and try to fetch that WAL file and replay it,
> and don't assume that the end of that partial WAL file means the end of
> WAL replay, then I think you may be right and that it'd be fine, but it
> does seem a bit risky to me.
This assumes that the local partial is a subset of the archived full WAL
segment, which should be true in most cases but I don't think we can
discount the possibility that it isn't. Split-brain is certainly a way
to get to differing partials, though in that case things are already
pretty bad.
I've seen some pretty messed up situations and usually it is best to
treat the WAL archive as the ground truth. If the archive_command is
smart enough not to overwrite WAL segments that already exist with
different versions then it should be a reliable record that all servers
can be replayed from (split-brains aside). I think it's best to treat
the local WAL with some suspicion unless it is known to be good, i.e.
just restored from archive.
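
As an illustration of the "don't overwrite a different version" behaviour described above, here is a minimal sketch in Python. It is not taken from any existing archiving tool; the names `archive_wal_segment` and `_same_contents` are hypothetical, and it assumes the archive is a plain directory on a local filesystem that supports hard links.

```python
import os
import shutil
import tempfile


def _same_contents(path_a: str, path_b: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if both files have identical size and bytes."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            chunk_a = a.read(chunk_size)
            chunk_b = b.read(chunk_size)
            if chunk_a != chunk_b:
                return False
            if not chunk_a:  # both files reached EOF together
                return True


def archive_wal_segment(src_path: str, archive_dir: str) -> str:
    """Store a WAL segment in archive_dir without replacing a differing copy.

    Returns "archived" when the segment was newly stored and "duplicate"
    when an identical copy was already present. Raises FileExistsError when
    a segment with the same name but different contents exists.
    """
    dest_path = os.path.join(archive_dir, os.path.basename(src_path))

    if os.path.exists(dest_path):
        if _same_contents(src_path, dest_path):
            return "duplicate"
        raise FileExistsError(
            f"{dest_path} already archived with different contents"
        )

    # Copy to a temporary name first so a crash never leaves a half-written
    # file under the final segment name.
    fd, tmp_path = tempfile.mkstemp(dir=archive_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp, open(src_path, "rb") as src:
            shutil.copyfileobj(src, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.link() refuses to replace an existing name, so a concurrent
        # archiver that got there first makes this raise FileExistsError
        # instead of silently overwriting its copy.
        os.link(tmp_path, dest_path)
    finally:
        os.unlink(tmp_path)
    return "archived"
```

A wrapper script used as archive_command would presumably exit non-zero when FileExistsError is raised, so that the conflicting segment is reported as an archiving failure rather than replacing what is already in the archive.
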
I do agree that most inconsistencies could be detected and an error
thrown, but only if the WAL in the repository is examined, which means
making a round-trip there anyway.
At the very least, it seems that simply enabling "read from pg_wal
first" is not a good idea without making other changes to ensure it is
done correctly.
Regards,
--
-David
david(at)pgmasters(dot)net