From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | Sergei Kornilov <sk(at)zsrv(dot)org> |
Cc: | Michael Paquier <michael(at)paquier(dot)xyz>, Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, exclusion(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #17928: Standby fails to decode WAL on termination of primary |
Date: | 2023-09-04 03:20:31 |
Message-ID: | CA+hUKG+jFN64gKVGcmfNTekqWn3cemRx99-B8DDBaDyzWnfpkw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
Thanks for looking/testing, Sergei. Thanks for the changes, Michael,
these all look good. I've squashed them and added you as co-author.
A couple more small comment/text changes:
1. In the place where we fail to allocate memory for an oversized
record, I copied the comment about treating that as a "bogus data"
condition. I suspect that we will soon be converting that to a FATAL
error[1], and that'll need to be done in both places.
2. In this version of the commit message I said we'd only back-patch
to 15 for now. After sleeping on this for a week, I realised that the
reason I keep vacillating on that point is that I am not sure what
your plan is for the malloc-failure-means-end-of-wal policy ([1],
ancient code from 0ffe11abd3a). If we're going to fix that in master
only but let sleeping dogs lie in the back-branches, then it becomes
less important to go back further than 15 with THIS patch. But if you
want to be able to distinguish garbage from out-of-memory, and thereby
end-of-wal from a FATAL please-insert-more-RAM condition, I think
you'd really need this industrial strength validation in all affected
branches, and I'd have more work to do, right? The weak validation we
are fixing here is the *real* underlying problem going back many
years, right?
I also wondered about strengthening the validation of various things
like redo begin/end LSNs etc in these tests. But we can always
continue to improve all this later...
Here also is a version for 15 (and a CI run[2]), since we tweaked many
of the error messages between 15 and 16.
[1] https://www.postgresql.org/message-id/flat/ZMh/WV%2BCuknqePQQ%40paquier.xyz
[2] https://cirrus-ci.com/task/4533280897236992 (failed on some
unrelated pgbench test, reported in another thread)
Attachment | Content-Type | Size |
---|---|---|
v5-0001-Don-t-trust-unvalidated-xl_tot_len.backpatch15 | application/octet-stream | 24.0 KB |
v5-0001-Don-t-trust-unvalidated-xl_tot_len.patch | text/x-patch | 24.2 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2023-09-04 03:54:28 | Re: BUG #17928: Standby fails to decode WAL on termination of primary |
Previous Message | Richard Guo | 2023-09-04 03:01:41 | Re: BUG #18077: PostgreSQL server subprocess crashed by a SELECT statement with WITH clause |