From: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, 蔡梦娟(玊于) <mengjuan(dot)cmj(at)alibaba-inc(dot)com>, Jakub Wartak <Jakub(dot)Wartak(at)tomtom(dot)com> |
Subject: | Re: prevent immature WAL streaming |
Date: | 2021-08-31 13:56:30 |
Message-ID: | 202108311356.sl33wcpcz5x6@alvherre.pgsql |
Lists: | pgsql-hackers |
On 2021-Aug-30, Andres Freund wrote:
> I'm doubtful that the approach of adding awareness of record boundaries
> is a good path to go down:
Honestly, I do not like it one bit, and if I can avoid relying on record
boundaries while making the whole thing work correctly, I am happy.
Clearly it wasn't a problem for the ancient recovery-only WAL design,
but as soon as we added replication on top, the whole issue of
continuation records became a bug.
I do think that the code should be first correct and second performant,
though.
> - There are very similar issues with promotions of replicas (consider
> what happens if we need to promote with the end of local WAL spanning
> a segment boundary, and what happens to cascading replicas). We have
> some logic to try to deal with that, but it's pretty grotty and I
> think incomplete.
Ouch, I hadn't thought of cascading replicas.
> - It seems to make some future optimizations harder - we should work
> towards replicating data sooner, rather than the opposite. Right now
> that's a major bottleneck around syncrep.
Absolutely.
> I think a better approach might be to handle this on the WAL layout
> level. What if we never overwrite partial records but instead just
> skipped over them during decoding?
Maybe this is a workable approach; let's work it out fully.
Let me see if I understand what you mean:
* We would remove the logic to inhibit archiving and
streaming-replicating the tail end of a split WAL record; that logic
deals with bytes only, so doesn't have to be aware of record boundaries.
* On WAL replay, we ignore records that are split across a segment
boundary and whose checksum does not match.
* On WAL write ... ?
How do we detect after recovery that a record that was being written,
and potentially was sent to the archive, needs to be "skipped"?
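To make the replay-side rule from the list above concrete, here is a toy
sketch in Python. It is not PostgreSQL code and does not reflect the real
WAL format: the Record type, the 16-byte segment size, the use of
zlib.crc32 (PostgreSQL uses CRC-32C) and all function names are
assumptions made up for illustration. It also assumes that a checksum
failure on a record that does not cross a segment boundary still means
"end of WAL", as today. It says nothing about the write side, which is
the open question.

```python
import zlib
from dataclasses import dataclass

# Toy segment size in bytes (assumption; real WAL segments are far larger).
SEGMENT_SIZE = 16


@dataclass
class Record:
    start: int      # byte offset of the record in the toy WAL stream
    payload: bytes  # record body
    crc: int        # checksum stored alongside the record


def make_record(start: int, payload: bytes) -> Record:
    """Build a record whose stored checksum matches its payload."""
    return Record(start, payload, zlib.crc32(payload))


def crosses_segment_boundary(rec: Record, segment_size: int = SEGMENT_SIZE) -> bool:
    """True if the record's first and last bytes fall in different segments."""
    last_byte = rec.start + max(len(rec.payload), 1) - 1
    return rec.start // segment_size != last_byte // segment_size


def replay(records, segment_size: int = SEGMENT_SIZE):
    """Return the payloads that would be applied under the proposed rule."""
    applied = []
    for rec in records:
        if zlib.crc32(rec.payload) == rec.crc:
            applied.append(rec.payload)   # valid record: apply it
        elif crosses_segment_boundary(rec, segment_size):
            continue                      # torn split record: skip over it
        else:
            break                         # assumed: invalid record ends replay
    return applied


if __name__ == "__main__":
    good1 = make_record(0, b"aaaa")                           # bytes 0..3
    torn = Record(12, b"bbbbbbbb", zlib.crc32(b"bbbbbbbb") ^ 1)  # bytes 12..19, bad CRC
    good2 = make_record(20, b"cccc")                          # bytes 20..23
    print(replay([good1, torn, good2]))
```

In this model the torn record spanning the boundary at byte 16 is
skipped and replay continues with the record after it, whereas a bad
checksum inside a single segment still stops replay.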
--
Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/