RE: speed up a logical replica setup

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'vignesh C' <vignesh21(at)gmail(dot)com>
Cc: Euler Taveira <euler(at)eulerto(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Fabrízio de Royes Mello <fabriziomello(at)gmail(dot)com>
Subject: RE: speed up a logical replica setup
Date: 2024-02-22 15:46:39
Message-ID: TYCPR01MB12077EFFBA0302913CCE21C96F5562@TYCPR01MB12077.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Vignesh,

> Few comments:
> 1) The below code can lead to assertion failure if the publisher is
> stopped while dropping the replication slot:
> + if (primary_slot_name != NULL)
> + {
> + conn = connect_database(dbinfo[0].pubconninfo);
> + if (conn != NULL)
> + {
> + drop_replication_slot(conn, &dbinfo[0],
> primary_slot_name);
> + }
> + else
> + {
> + pg_log_warning("could not drop replication
> slot \"%s\" on primary",
> + primary_slot_name);
> + pg_log_warning_hint("Drop this replication
> slot soon to avoid retention of WAL files.");
> + }
> + disconnect_database(conn);
> + }
>
> pg_createsubscriber: error: connection to database failed: connection
> to server on socket "/tmp/.s.PGSQL.5432" failed: No such file or
> directory
> Is the server running locally and accepting connections on that socket?
> pg_createsubscriber: warning: could not drop replication slot
> "standby_1" on primary
> pg_createsubscriber: hint: Drop this replication slot soon to avoid
> retention of WAL files.
> pg_createsubscriber: pg_createsubscriber.c:432: disconnect_database:
> Assertion `conn != ((void *)0)' failed.
> Aborted (core dumped)
>
> This is happening because we are calling disconnect_database in case
> of connection failure case too which has the following assert:
> +static void
> +disconnect_database(PGconn *conn)
> +{
> + Assert(conn != NULL);
> +
> + PQfinish(conn);
> +}

Right. disconnect_database() was moved to if (conn != NULL) block.

> 2) There is a CheckDataVersion function which does exactly this, will
> we be able to use this:
> + /* Check standby server version */
> + if ((ver_fd = fopen(versionfile, "r")) == NULL)
> + pg_fatal("could not open file \"%s\" for reading: %m",
> versionfile);
> +
> + /* Version number has to be the first line read */
> + if (!fgets(rawline, sizeof(rawline), ver_fd))
> + {
> + if (!ferror(ver_fd))
> + pg_fatal("unexpected empty file \"%s\"", versionfile);
> + else
> + pg_fatal("could not read file \"%s\": %m", versionfile);
> + }
> +
> + /* Strip trailing newline and carriage return */
> + (void) pg_strip_crlf(rawline);
> +
> + if (strcmp(rawline, PG_MAJORVERSION) != 0)
> + {
> + pg_log_error("standby server is of wrong version");
> + pg_log_error_detail("File \"%s\" contains \"%s\",
> which is not compatible with this program's version \"%s\".",
> + versionfile,
> rawline, PG_MAJORVERSION);
> + exit(1);
> + }
> +
> + fclose(ver_fd);

> 3) Should this be added to typedefs.list:
> +enum WaitPMResult
> +{
> + POSTMASTER_READY,
> + POSTMASTER_STILL_STARTING
> +};

But the comment from Peter E. [1] was opposite. I did not handle this.

> 4) pgCreateSubscriber should be mentioned after pg_controldata to keep
> the ordering consistency:
> diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
> index aa94f6adf6..c5edd244ef 100644
> --- a/doc/src/sgml/reference.sgml
> +++ b/doc/src/sgml/reference.sgml
> @@ -285,6 +285,7 @@
> &pgCtl;
> &pgResetwal;
> &pgRewind;
> + &pgCreateSubscriber;
> &pgtestfsync;

This has been already pointed out by Peter E. I did not handle this.

> 5) Here pg_replication_slots should be pg_catalog.pg_replication_slots:
> + if (primary_slot_name)
> + {
> + appendPQExpBuffer(str,
> + "SELECT 1 FROM
> pg_replication_slots "
> + "WHERE active AND
> slot_name = '%s'",
> + primary_slot_name);

Fixed.

> 6) Here pg_settings should be pg_catalog.pg_settings:
> + * - max_worker_processes >= 1 + number of dbs to be converted
> +
> *------------------------------------------------------------------------
> + */
> + res = PQexec(conn,
> + "SELECT setting FROM pg_settings
> WHERE name IN ("
> + "'max_logical_replication_workers', "
> + "'max_replication_slots', "
> + "'max_worker_processes', "
> + "'primary_slot_name') "
> + "ORDER BY name");

Fixed.

New version can be available in [2]

[1]: https://www.postgresql.org/message-id/3ee79f2c-e8b3-4342-857c-a31b87e1afda%40eisentraut.org
[2]: https://www.postgresql.org/message-id/TYCPR01MB12077CD333376B53F9CAE7AC0F5562%40TYCPR01MB12077.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Hayato Kuroda (Fujitsu) 2024-02-22 15:47:01 RE: speed up a logical replica setup
Previous Message Hayato Kuroda (Fujitsu) 2024-02-22 15:46:06 RE: speed up a logical replica setup