From: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
---|---|
To: | Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly |
Date: | 2025-04-28 15:17:12 |
Message-ID: | CAPpHfdujSUM6de2ZD6gP_3vjUonHgiqh7StPZ5AQz-=4-qhuzA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Apr 24, 2025 at 5:32 PM Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru> wrote:
> Thank you for the review. I apologize for a late reply. I missed your email.
>
> > 1) As ReplicationSlotsComputeRequiredLSN() is called each time we need
> > to advance the position of WAL needed by replication slots, the usage
> > pattern probably could be changed. Thus, we probably need to call
> > ReplicationSlotsComputeRequiredLSN() somewhere after change of
> > restart_lsn_flushed while restart_lsn is not changed. And probably
> > can skip ReplicationSlotsComputeRequiredLSN() in some cases when only
> > restart_lsn is changed.
>
> Yes, it is a good idea for investigation, thank you! I guess, It may work for
> persistent slots but I'm not sure about other types of slots (ephemeral and
> temporary). I have no clear understanding of consequences at the moment. I
> propose to postpone it for future, because the proposed changes will me more
> invasive.
Yes, that's different for different types of slots. So, removing
ReplicationSlotsComputeRequiredLSN() doesn't look safe. But at least,
we need to analyze if we need to add extra calls.
> > 2) I think it's essential to include into the patch test caches which
> > fail without patch. You could start from integrating [1] test into
> > your patch, and then add more similar tests for different situations.
>
> The problem with TAP tests - it is hard to reproduce without injection points.
> The Tomas Vondra's tests create two new injection points. I have to add more
> injection points for new tests as well. Injection points help to test the code
> but make the code unreadable. I'm not sure, what is the policy of creating
> injection points? Personally, I would not like to add new injection points
> only to check this particular rare case. I'm trying to create a test without
> injection points that should fail occasionally, but I haven't succeeded at
> the moment.
I don't know if there is an explicit policy. I think we just add them
as needed to reproduce important situations in TAP tests. So, feel
free to them as many as you want to reproduce all the problematic
situations. During review we can find if they seem too many, but
don't bother about this at present stage.
> I have a question - is there any interest to backport the solution into
> existing major releases?
As long as this is the bug, it should be backpatched to all supported
affected releases.
> I can prepare a patch where restart_lsn_flushed stored
> outside of ReplicationSlot structure and doesn't affect the existing API.
Yes, please!
------
Regards,
Alexander Korotkov
Supabase
From | Date | Subject | |
---|---|---|---|
Next Message | Masahiko Sawada | 2025-04-28 16:34:05 | Re: Fix premature xmin advancement during fast forward decoding |
Previous Message | Nathan Bossart | 2025-04-28 15:15:05 | Re: optimize file transfer in pg_upgrade |