Re: speed up a logical replica setup

From: vignesh C <vignesh21(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: Euler Taveira <euler(at)eulerto(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Fabrízio de Royes Mello <fabriziomello(at)gmail(dot)com>
Subject: Re: speed up a logical replica setup
Date: 2024-02-20 09:44:14
Message-ID: CALDaNm2ev8_r4N5Sfa2GjzuzZNz17L0nsX_ZqfcarmS5oa5Szw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 19 Feb 2024 at 11:15, Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear hackers,
>
> > Since it may be useful, I will post top-up patch on Monday, if there are no
> > updating.
>
> And here are top-up patches. Feel free to check and include.
>
> v22-0001: Same as v21-0001.
> === rebased patches ===
> v22-0002: Update docs per recent changes. Same as v20-0002.
> v22-0003: Add check versions of the target. Extracted from v20-0003.
> v22-0004: Remove -S option. Mostly same as v20-0009, but commit massage was
> slightly changed.
> === Newbie ===
> V22-0005: Addressed my comments which seems to be trivial[1].
> Comments #1, 3, 4, 8, 10, 14, 17 were addressed here.
> v22-0006: Consider the scenario when commands are failed after the recovery.
> drop_subscription() is removed and some messages are added per [2].
> V22-0007: Revise server_is_in_recovery() per [1]. Comments #5, 6, 7, were addressed here.
> V22-0008: Fix a strange report when physical_primary_slot is null. Per comment #9 [1].
> V22-0009: Prohibit reuse publications when it has already existed. Per comments #11 and 12 [1].
> V22-0010: Avoid to call PQclear()/PQfinish()/pg_free() if the process exits soon. Per comment #15 [1].
> V22-0011: Update testcode. Per comments #17- [1].
>
> I did not handle below points because I have unclear points.
>
> a.
> This patch set cannot detect the disconnection between the target (standby) and
> source (primary) during the catch up. Because the connection status must be gotten
> at the same time (=in the same query) with the recovery status, but now it is now an
> independed function (server_is_in_recovery()).
>
> b.
> This patch set cannot detect the inconsistency reported by Shubham [3]. I could not
> come up with solutions without removing -P...
>

Few comments for v22-0001 patch:
1) The second "if (strcmp(PQgetvalue(res, 0, 1), "t") != 0)"" should
be if (strcmp(PQgetvalue(res, 0, 2), "t") != 0):
+ if (strcmp(PQgetvalue(res, 0, 1), "t") != 0)
+ {
+ pg_log_error("permission denied for database %s",
dbinfo[0].dbname);
+ return false;
+ }
+ if (strcmp(PQgetvalue(res, 0, 1), "t") != 0)
+ {
+ pg_log_error("permission denied for function \"%s\"",
+
"pg_catalog.pg_replication_origin_advance(text, pg_lsn)");
+ return false;
+ }

2) pg_createsubscriber fails if a table is parallely created in the
primary node:
2024-02-20 14:38:49.005 IST [277261] LOG: database system is ready to
accept connections
2024-02-20 14:38:54.346 IST [277270] ERROR: relation "public.tbl5"
does not exist
2024-02-20 14:38:54.346 IST [277270] STATEMENT: CREATE SUBSCRIPTION
pg_createsubscriber_5_277236 CONNECTION ' dbname=postgres' PUBLICATION
pg_createsubscriber_5 WITH (create_slot = false, copy_data = false,
enabled = false)

If we are not planning to fix this, at least it should be documented

3) Error conditions is verbose mode has invalid error message like
"out of memory" messages like in below:
pg_createsubscriber: waiting the postmaster to reach the consistent state
pg_createsubscriber: postmaster reached the consistent state
pg_createsubscriber: dropping publication "pg_createsubscriber_5" on
database "postgres"
pg_createsubscriber: creating subscription
"pg_createsubscriber_5_278343" on database "postgres"
pg_createsubscriber: error: could not create subscription
"pg_createsubscriber_5_278343" on database "postgres": out of memory

4) In error cases we try to drop this publication again resulting in error:
+ /*
+ * Since the publication was created before the
consistent LSN, it is
+ * available on the subscriber when the physical
replica is promoted.
+ * Remove publications from the subscriber because it
has no use.
+ */
+ drop_publication(conn, &dbinfo[i]);

Which throws these errors(because of drop publication multiple times):
pg_createsubscriber: dropping publication "pg_createsubscriber_5" on
database "postgres"
pg_createsubscriber: error: could not drop publication
"pg_createsubscriber_5" on database "postgres": ERROR: publication
"pg_createsubscriber_5" does not exist
pg_createsubscriber: dropping publication "pg_createsubscriber_5" on
database "postgres"
pg_createsubscriber: dropping the replication slot
"pg_createsubscriber_5_278343" on database "postgres"

5) In error cases, wait_for_end_recovery waits even though it has
identified that the replication between primary and standby is
stopped:
+/*
+ * Is recovery still in progress?
+ * If the answer is yes, it returns 1, otherwise, returns 0. If an error occurs
+ * while executing the query, it returns -1.
+ */
+static int
+server_is_in_recovery(PGconn *conn)
+{
+ PGresult *res;
+ int ret;
+
+ res = PQexec(conn, "SELECT pg_catalog.pg_is_in_recovery()");
+
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ PQclear(res);
+ pg_log_error("could not obtain recovery progress");
+ return -1;
+ }
+

You can simulate this by stopping the primary just before
wait_for_end_recovery and you will see these error messages, but
pg_createsubscriber will continue to wait:
pg_createsubscriber: error: could not obtain recovery progress
pg_createsubscriber: error: could not obtain recovery progress
pg_createsubscriber: error: could not obtain recovery progress
pg_createsubscriber: error: could not obtain recovery progress
...

Regards,
Vignesh

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2024-02-20 09:46:08 Re: Add bump memory context type and use it for tuplesorts
Previous Message David Rowley 2024-02-20 09:41:03 Re: Add bump memory context type and use it for tuplesorts