| From: | Kostiantyn Tomakh <tomahkvt(at)gmail(dot)com> | 
|---|---|
| To: | Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com> | 
| Cc: | pgsql-bugs(at)lists(dot)postgresql(dot)org | 
| Subject: | Re: BUG #18433: Logical replication timeout | 
| Date: | 2024-05-10 13:47:36 | 
| Message-ID: | CAJP09w7ShycVDaEuDOP5FFm8k=aJtj+NdjY5Cb5+TgMNNa46kQ@mail.gmail.com | 
| Lists: | pgsql-bugs | 
Hello, Shlok Kyal.
I found a solution for my case: I decided to migrate from PostgreSQL 13 to
PostgreSQL 15.
I used the following setup: source DB on PostgreSQL 13 and destination DB on
PostgreSQL 15.
Fortunately, this problem occurs only when the destination DB is PostgreSQL 13;
it did not occur with PostgreSQL 15 as the destination.
There are two ways to handle this issue:
1) Fix the problem.
2) Inform people that they can run into problems if they use PostgreSQL 13 as
the destination DB for logical replication.
I think the best choice is the first option.
Shlok Kyal, thank you very much for your help. We really appreciate it.
On Fri, 10 May 2024 at 09:05, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com> wrote:
> Hi,
>
> > I was able to reproduce the problem.
> > I did it on a Docker-based platform. I hope you will be able to reproduce
> > this problem too.
>
> Thanks for providing the detailed steps to reproduce the issue. I was
> able to reproduce the issue with the steps you provided.
> I noticed that the increased table size on the subscriber can happen
> in all versions up to and including Postgres 13, and I was able to
> reproduce that. This is a timing issue, which is why you may not be
> seeing it in Postgres 10.
>
> This issue occurs because the tablesync worker exits (due to the UPDATE
> command) and restarts, as seen in the logs:
> 2024-05-01 16:26:15.384 GMT [40] LOG:  logical replication table
> synchronization worker for subscription "db_name_public_subscription",
> table "table" has started
> 2024-05-01 16:26:16.994 GMT [40] ERROR:  logical replication target
> relation "public.table" has neither REPLICA IDENTITY index nor PRIMARY
> KEY and published relation does not have REPLICA IDENTITY FULL
> 2024-05-01 16:26:20.393 GMT [41] LOG:  logical replication table
> synchronization worker for subscription "db_name_public_subscription",
> table "table" has started
>
> The tablesync worker syncs the initial data from the publisher to the
> subscriber using the COPY command. In this case it exits after the copy
> phase has completed and then restarts, so it performs the entire copy
> operation again. That is why we see the increased table size on the
> subscriber.
>
> This issue is not reproducible in Postgres 14 and later versions. It
> was mitigated by commit [1], which introduced a new state,
> 'FINISHEDCOPY'. If the tablesync worker exits after the copy phase has
> completed and then restarts, it does not perform the COPY command again
> and proceeds directly to synchronizing the WAL position between the
> tablesync worker and the apply worker.
>
> code:
> +   else if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
> +   {
> +       /*
> +        * The COPY phase was previously done, but tablesync then crashed
> +        * before it was able to finish normally.
> +        */
> +       StartTransactionCommand();
> +
> +       /*
> +        * The origin tracking name must already exist. It was created
> +        * first time this tablesync was launched.
> +        */
> +       originid = replorigin_by_name(originname, false);
> +       replorigin_session_setup(originid);
> +       replorigin_session_origin = originid;
> +       *origin_startpos = replorigin_session_get_progress(false);
> +
> +       CommitTransactionCommand();
> +
> +       goto copy_table_done;
> +   }
>
> Backpatching commit [1] to Postgres 13 and Postgres 12 will mitigate this
> issue.
> Thoughts?
>
> [1]
> https://github.com/postgres/postgres/commit/ce0fdbfe9722867b7fad4d3ede9b6a6bfc51fb4e
>
> Thanks and Regards,
> Shlok Kyal
>