Re: Excessive number of replication slots for 12->14 logical replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Hubert Lubaczewski <depesz(at)depesz(dot)com>
Cc: PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Excessive number of replication slots for 12->14 logical replication
Date: 2022-07-18 11:20:43
Message-ID: CAA4eK1+RL43Qty=Rb+RJhw0+0sm-f2T=7ST=9u0R+vmnDDVZaA@mail.gmail.com
Lists: pgsql-bugs

On Mon, Jul 18, 2022 at 3:13 PM hubert depesz lubaczewski
<depesz(at)depesz(dot)com> wrote:
>
> On Mon, Jul 18, 2022 at 09:07:35AM +0530, Amit Kapila wrote:
>
> First error:
> #v+
> 2022-07-18 09:22:07.046 UTC,,,4145917,,62d5263f.3f42fd,2,,2022-07-18 09:22:07 UTC,28/21641,1219146,ERROR,53400,"could not find free replication state slot for replication origin with OID 51",,"Increase max_replication_slots and try again.",,,,,,,"","logical replication worker",,0
> #v-
>
> Nothing else errored out before, no warning, no fatals.
>
> From the first ERROR onward, I was getting them at a rate of 40-70 per minute.
>
> At the same time I was logging data from `select now(), * from pg_replication_slots`, every 2 seconds.
>
...
>
> So, it looks like there are up to 10 focal slots, all active, and then there are sync slots with weirdly high counts for inactive ones.
>
> At most, I had 11 active sync slots.
>
> Looks like some kind of timing issue, which would be in line with what
> Kyotaro Horiguchi wrote initially.
>

I think this is a timing issue similar to the one Horiguchi-San
pointed out, but caused by replication origins. We drop the
replication origin only after the sync worker that used it has
finished. The apply worker does this because we don't allow an
origin to be dropped while the process that owns it is still alive.
I am not sure of the repercussions, but maybe we could allow the
process that owns the origin to drop it.
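
To illustrate the timing issue described above, here is a toy
simulation. It is only a sketch under stated assumptions, not
PostgreSQL's actual code: the function name, the tick-based
scheduling, and every parameter (max_slots, max_sync_workers,
copy_ticks, cleanup_interval) are hypothetical. It compares deferred
cleanup by the apply worker with the idea of letting the owning
process drop its own origin.

```python
from dataclasses import dataclass


@dataclass
class Result:
    completed: int  # table syncs that finished
    errors: int     # launch attempts that found no free origin state slot


def simulate(owner_drops: bool, ticks: int = 50, max_slots: int = 4,
             max_sync_workers: int = 2, copy_ticks: int = 1,
             cleanup_interval: int = 5) -> Result:
    """Toy model (all names and numbers are assumptions, not PostgreSQL internals).

    owner_drops=False: a finished sync worker's origin keeps its state slot
    until the apply worker's next cleanup pass.
    owner_drops=True: the worker releases its origin as soon as it finishes.
    """
    running: list[int] = []  # remaining ticks for each active sync worker
    lingering = 0            # finished workers whose origin is not yet dropped
    completed = errors = 0

    for tick in range(ticks):
        # The apply worker only gets around to dropping origins periodically.
        if tick % cleanup_interval == 0:
            lingering = 0

        # Advance every running sync worker by one tick.
        still_running = []
        for remaining in running:
            remaining -= 1
            if remaining > 0:
                still_running.append(remaining)
            else:
                completed += 1
                if not owner_drops:
                    lingering += 1  # origin stays behind until cleanup
        running = still_running

        # Launch new sync workers while there is concurrency headroom.
        while len(running) < max_sync_workers:
            if len(running) + lingering >= max_slots:
                errors += 1  # stands in for "could not find free ... slot"
                break
            running.append(copy_ticks)

    return Result(completed, errors)


if __name__ == "__main__":
    print("apply worker drops:", simulate(owner_drops=False))
    print("owner drops:       ", simulate(owner_drops=True))
```

In this model, fast table copies combined with infrequent cleanup let
finished-but-undropped origins fill every slot, so new launches fail
even though few workers are actually running. With owner_drops=True
the pool never fills. The real behaviour depends on details this
sketch does not capture.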

I think this will also be addressed once we start reusing
workers/slots/origins to copy multiple tables in the initial sync
phase, as is being discussed in the thread [1].

[1] - https://www.postgresql.org/message-id/CAGPVpCTq%3DrUDd4JUdaRc1XUWf4BrH2gdSNf3rtOMUGj9rPpfzQ%40mail.gmail.com

--
With Regards,
Amit Kapila.
