From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
Cc: | sawada(dot)mshk(at)gmail(dot)com, amit(dot)kapila16(at)gmail(dot)com, michael(at)paquier(dot)xyz, jkatz(at)postgresql(dot)org, jcasanov(at)systemguards(dot)com(dot)ec, pgsql-hackers(at)postgresql(dot)org, john(dot)naylor(at)enterprisedb(dot)com, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Re: START_REPLICATION SLOT causing a crash in an assert build |
Date: | 2022-10-08 02:56:33 |
Message-ID: | 20221008025633.5njv7ufypkggsbuf@awork3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2022-10-07 12:00:56 -0700, Andres Freund wrote:
> On 2022-10-07 15:30:43 +0900, Kyotaro Horiguchi wrote:
> > The key point of this is this:
> >
> > + * XXX: I think there cannot actually be data from an older slot
> > + * here. After a crash we throw away the old stats data and if a slot is
> > + * dropped while shut down, we'll not load the slot data at startup.
> >
> > I think this is true. Assuming that we don't recreate or rename
> > objects that have stats after writing out stats, we won't have stats
> > for a different object with the same name.
>
> Thanks for thinking through this!
> > If we can rely on that fact, the existing check in pgstat_acquire_replslot()
> > becomes useless. Thus we don't need to store object name in stats entry. I
> > agree to that.
>
> I don't agree with this aspect. I think it's better to ensure that the stats
> object exists when acquiring a slot, rather than later, when reporting. It's a
> lot simpler to think through the lock nesting etc that way.
>
> I'd also eventually like to remove the stats that are currently kept
> separately in ReorderBuffer, and that will be easier/cheaper if we can rely on
> the stats objects to have already been created.
Here's a cleaned up version of my earlier prototype.
- I wrapped the access to ReplicationSlotCtl->replication_slots[i].data.name
in a new function bool ReplicationSlotName(index, name). I'm not entirely
happy with that name, as it sounds like a more general accessor than it is -
I toyed with ReplicationSlotNameForIndex(), but that seemed somewhat
unnecessary, I don't forsee a need for another name accessor.
Anyone wants to weigh in?
- Substantial comment and commit message polishing
- I'm planning to drop PgStat_StatReplSlotEntry.slotname and a
PGSTAT_FILE_FORMAT_ID bump in master and to rename slotname to
slotname_unused in 15.
- Self-review found a bug, I copy-pasted create=false in the call to
pgstat_get_entry_ref() in pgstat_acquire_replslot(). This did *NOT* cause
any test failures - clearly our test coverage in this area is woefully
inadequate.
- This patch does not contain a test for the fix. I think this can only be
tested by a tap test starting pg_recvlogical in the background and checking
pg_recvlogical's output. That type of test is notoriously hard to be
reliable, so committing it shortly before the release is wrapped seems like
a bad idea.
I manually verified that:
- a continually active slot reporting stats after pgstat_reset_replslot()
works correctly (this is what crashed before)
- replslot stats reporting works correctly after stats were removed
- upgrading from pre-fix to post-fix preserves stats (when keeping slotname /
not increasing the stats version, of course)
I'm planning to push this either later tonight (if I feel up to it after
cooking dinner) or tomorrow morning PST, due to the release wrap deadline.
Greetings,
Andres Freund
Attachment | Content-Type | Size |
---|---|---|
v4-0001-pgstat-Prevent-stats-reset-from-corrupting-slotna.patch | text/x-diff | 10.0 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2022-10-08 02:58:06 | Re: Difference between HeapTupleData and TupleTableSlot structures |
Previous Message | Ajay P S | 2022-10-08 02:26:21 | Difference between HeapTupleData and TupleTableSlot structures |