From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Peter Smith <smithpb2250(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
Subject: | Re: Perform streaming logical transactions by background workers and parallel apply |
Date: | 2022-12-06 03:51:16 |
Message-ID: | CAA4eK1+S5CX=OzD01bgoEuxQqF45zyYV-dJGh3K2f4j6DyDiOA@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Dec 6, 2022 at 5:27 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Here are my review comments for patch v55-0002
>
...
>
> 3. pa_spooled_messages
>
> Previously I suggested this function name should be changed but that
> was rejected (see [1] #6a)
>
> > 6a.
> > IMO a better name for this function would be
> > pa_apply_spooled_messages();
> Not sure about this.
>
> ~
>
> FYI the reason for the previous suggestion is because there is no verb
> in the current function name, so the reader is left thinking
> pa_spooled_messages "what"?
>
> It means the caller has to have extra comments like:
> /* Check if changes have been serialized to a file. */
> pa_spooled_messages();
>
> OTOH, if the function was called something better -- e.g.
> pa_check_for_spooled_messages() or similar -- then it would be
> self-explanatory.
>
I think pa_check_for_spooled_messages() could be misleading because we
also apply the changes in that function, so a comment like the one you
suggested is probably the better option.
> ~
>
> 4.
>
> /*
> + * Replay the spooled messages in the parallel apply worker if the leader apply
> + * worker has finished serializing changes to the file.
> + */
> +static void
> +pa_spooled_messages(void)
>
> I'm not 100% sure of the logic, so IMO maybe the comment should say a
> bit more about how this works:
>
> Specifically, let's say there was some timeout and the LA needed to
> write the spool file, then let's say the PA timed out and found itself
> inside this function. Now, let's say the LA is still busy writing the
> file -- so what happens next?
>
> Does this function simply return, then the main PA loop waits again,
> then times out again, then PA finds itself back inside this
> function again... and that keeps happening over and over until
> eventually the spool file is found FS_READY? Some explanatory comments
> might help.
>
No, PA will simply wait for LA to finish. See the code that handles the
FS_BUSY state. We might want to slightly improve part of the current
comment to: "If the leader apply worker is busy serializing the
partial changes, then acquire the stream lock now and wait for the
leader worker to finish serializing the changes".
>
> 16. apply_spooled_messages
>
> + stream_fd = BufFileOpenFileSet(stream_fileset, path, O_RDONLY, false);
>
> Something still seems a bit odd about this to me (previously also
> mentioned in review [1] #29) but I cannot quite put my finger on it...
>
> AFAIK the 'stream_fd' is the global the LA is using to remember the
> single stream spool file; it corresponds to the LogicalRepWorker's
> 'stream_fileset'. So using that same global on the PA side somehow
> seemed strange to me. The fileset at PA comes from a different place
> (MyParallelShared->fileset).
>
I think 'stream_fd' is specific to the apply module, which can be used by
the apply, tablesync, or parallel worker. Unfortunately, the code in
worker.c is currently a mix of worker and apply module code. At some
point, we should move the apply logic into a separate file.
--
With Regards,
Amit Kapila.