Re: BUG #17947: Combination of replslots pgstat issues causes error/assertion failure

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: exclusion(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17947: Combination of replslots pgstat issues causes error/assertion failure
Date: 2023-05-31 23:42:45
Message-ID: 20230601.084245.1524588196504400333.horikyota.ntt@gmail.com
Lists: pgsql-bugs

At Fri, 26 May 2023 11:00:01 +0000, PG Bug reporting form <noreply(at)postgresql(dot)org> wrote in
> The following bug has been logged on the website:
>
> Bug reference: 17947
> Logged by: Alexander Lakhin
> Email address: exclusion(at)gmail(dot)com
> PostgreSQL version: 15.3
> Operating system: Ubuntu 22.04
> Description:
>
> The following script:

I was able to reproduce it here.

It looks like the function pgstat_get_entry_ref_cached returned a
faulty reference, one that points to a shared entry which has already
been reinitialized for another replication slot. In the problem
scenario, the first backend successfully reuses the entry that was
meant to be dropped, which its cached entry still points to, and then
drops it again. When the second backend obtains a cached entry for
another replication slot, the function returns an entry that points to
the same shared entry as the first backend's. Consequently, the two
backends end up sharing the same shared stats entry, but for different
slots.

The attached ad-hoc patch appears to work for this specific scenario.
(It may well contain defects, including possible shared entry leaks.)
We need to find a better approach to prevent the reuse of an
already-reinitialized entry. I believe it can be fixed by adding a
reuse count to both the cached entry and the shared entry, and then
comparing the two counts to verify the cached entry. However, for now
I can't think of a solution that wouldn't require additional struct
members, so I'm not sure how to fix this in older versions without
adding them.
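
To illustrate the reuse-count idea, here is a minimal standalone
sketch. It is not taken from the attached patch or from the pgstat
code: the struct and function names (SharedEntry, CachedRef,
cached_ref_is_valid and so on) are hypothetical stand-ins, and locking
and atomics are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a shared stats entry. */
typedef struct SharedEntry
{
	uint32_t	reuse_count;	/* bumped on every reinitialization */
	bool		dropped;		/* entry has been marked as dropped */
	int			slot_id;		/* object the entry currently describes */
} SharedEntry;

/* Hypothetical stand-in for a backend-local cached reference. */
typedef struct CachedRef
{
	SharedEntry *shared;
	uint32_t	reuse_count;	/* shared->reuse_count when cached */
} CachedRef;

/* Remember which incarnation of the shared entry we are pointing at. */
static inline void
cached_ref_init(CachedRef *ref, SharedEntry *shared)
{
	ref->shared = shared;
	ref->reuse_count = shared->reuse_count;
}

/* Mark the shared entry as dropped; the memory itself stays around. */
static inline void
shared_entry_drop(SharedEntry *entry)
{
	entry->dropped = true;
}

/* Reuse the shared entry for another object, starting a new incarnation. */
static inline void
shared_entry_reinit(SharedEntry *entry, int slot_id)
{
	entry->dropped = false;
	entry->slot_id = slot_id;
	entry->reuse_count++;
}

/*
 * A cached reference is usable only if the shared entry is live and is
 * still the same incarnation that was cached.
 */
static inline bool
cached_ref_is_valid(const CachedRef *ref)
{
	assert(ref != NULL && ref->shared != NULL);
	return !ref->shared->dropped &&
		ref->reuse_count == ref->shared->reuse_count;
}
```

With this, a reference cached before a drop-and-reinit cycle no longer
passes the check even though the shared entry looks live again, so the
caller would have to look the entry up afresh instead of trusting the
cache.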

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
reinit_recheck.txt text/plain 1.4 KB
