From: | Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com> |
---|---|
To: | shveta malik <shveta(dot)malik(at)gmail(dot)com> |
Cc: | "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, shiy(dot)fnst(at)fujitsu(dot)com, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication |
Date: | 2023-02-01 12:12:19 |
Message-ID: | CAGPVpCSNExJ3tgK8QgnNUb1QVGvJNprW7LWJ-8fWfGKgtcittw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Shveta,
shveta malik <shveta(dot)malik(at)gmail(dot)com>, 1 Şub 2023 Çar, 15:01 tarihinde şunu
yazdı:
> On Wed, Feb 1, 2023 at 5:05 PM Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com> wrote:
> 2) I found a crash in the previous patch (v9), but have not tested it
> on the latest yet. Crash happens when all the replication slots are
> consumed and we are trying to create new. I tweaked the settings like
> below so that it can be reproduced easily:
> max_sync_workers_per_subscription=3
> max_replication_slots = 2
> and then ran the test case shared by you. I think there is some memory
> corruption happening. (I did test in debug mode, have not tried in
> release mode). I tried to put some traces to identify the root-cause.
> I observed that worker_1 keeps on moving from 1 table to another table
> correctly, but at some point, it gets corrupted i.e. origin-name
> obtained for it is wrong and it tries to advance that and since that
> origin does not exist, it asserts and then something else crashes.
> From log: (new trace lines added by me are prefixed by shveta, also
> tweaked code to have my comment 1 fixed to have clarity on worker-id).
>
> form below traces, it is clear that worker_1 was moving from one
> relation to another, always getting correct origin 'pg_16688_1', but
> at the end it got 'pg_16688_49' which does not exist. Second part of
> trace shows who updated 'pg_16688_49', it was done by worker_49 which
> even did not get chance to create this origin due to max_rep_slot
> reached.
>
Thanks for investigating this error. I think it's the same error as the one
Shi reported earlier. [1]
I couldn't reproduce it yet but will apply your tweaks and try again.
Looking into this.
Thanks,
--
Melih Mutlu
Microsoft
From | Date | Subject | |
---|---|---|---|
Next Message | adherent postgres | 2023-02-01 12:24:11 | About PostgreSQL Core Team |
Previous Message | Melih Mutlu | 2023-02-01 12:07:25 | Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication |