From: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
---|---|
To: | Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: improve pg_receivewal code |
Date: | 2021-09-03 03:53:02 |
Message-ID: | CALj2ACWw5qz0R2RxLUXbv1jyVyqghdqpJdxzKOEftAsRFE1Q7g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Sep 2, 2021 at 9:05 PM Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io> wrote:
> > 1) Fetch the server system identifier in the first RunIdentifySystem
> > call and use it to identify(via pg_receivewal's ReceiveXlogStream) any
> > unexpected changes that may happen in the server while pg_receivewal
> > is connected to it. This can be helpful in scenarios when
> > pg_receivewal tries to reconnect to the server (see the code around
> > pg_usleep with RECONNECT_SLEEP_TIME) but something unexpected has
> > happnend in the server that changed the its system identifier. Once
> > the pg_receivewal establishes the conenction to server again, then the
> > ReceiveXlogStream has a code chunk to compare the system identifier
> > that we received in the initial connection.
>
> I'm not sure what kind of problem could be happening here: if you were
> somewhat routed to another server ? Or if we "switched" the cluster listening
> on that port ?
Yeah. Also, pg_control file on the server can get corrupted for
whatever reasons it may be. This sys identifier check is useful in
case if the pg_receivewal connects to the server again and again.
These are things that sound over cautious to me, however it's nothing
wrong to use what ReceiveXlogStream provides. pg_basebackup does make
use of this already.
> > 2) Move the RunIdentifySystem to identify timeline id and start LSN
> > from the server only if the pg_receivewal failed to get them from
> > FindStreamingStart. This way, an extra IDENTIFY_SYSTEM command is
> > avoided.
>
> That makes sense, even if we add another IDENTIFY_SYSTEM to check against the
> one set in the first place.
>
> > 3) Place the "replication connetion shouldn't have any database name
> > associated" error check right after RunIdentifySystem so that we can
> > avoid fetching wal segment size with RetrieveWalSegSize if at all we
> > were to fail with that error. This change is similar to what
> > pg_recvlogical.c does.
>
> Makes sense.
>
> > 4) Move the RetrieveWalSegSize to just before pg_receivewal.c enters
> > main loop to get the wal from the server. This avoids an unnecessary
> > query for pg_receivewal with "--create-slot" or "--drop-slot".
> > 5) Have an assertion after the pg_receivewal done a good amount of
> > work to find start timeline and LSN might be helpful:
> > Assert(stream.timeline != 0 && stream.startpos != InvalidXLogRecPtr);
> >
> > Attaching a patch that does take care of above improvements. Thoughts?
>
> Overall I think it is good.
Thanks for your review.
> However, you have some formatting issues, here it mixes tabs and spaces:
>
> + /*
> + * No valid data can be found in the existing output
> directory.
> + * Get start LSN position and current timeline ID from
> the server.
> + */
My bad. I forgot to run "git diff --check" on the v1 patch.
> And here it is not formatted properly:
>
> +static char *server_sysid = NULL;
Done.
Here's the v2 with above modifications.
Regards,
Bharath Rupireddy.
Attachment | Content-Type | Size |
---|---|---|
v2-0001-improve-pg_receivewal-code.patch | application/octet-stream | 5.2 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Justin Pryzby | 2021-09-03 03:56:01 | Re: PoC/WIP: Extended statistics on expressions |
Previous Message | Amit Langote | 2021-09-03 03:23:05 | Re: a misbehavior of partition row movement (?) |