From: | Ajin Cherian <itsajin(at)gmail(dot)com> |
---|---|
To: | Peter Smith <smithpb2250(at)gmail(dot)com> |
Cc: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Proposal: Filter irrelevant change before reassemble transactions during logical decoding |
Date: | 2025-04-17 04:00:05 |
Message-ID: | CAFPTHDaSn_-LuaE6v6s9+9HuvqpsJLbg-jz9FokZWekXydU39A@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Apr 3, 2025 at 11:01 PM Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com> wrote:
>
> Hi Ajin,
>
> I have reviewed patch v16-0001, here are my comments:
>
> 1. There are some places where comments are more than 80 columns
> window. I think it should be <=80 as per [1].
Fixed.
>
> 2. I think, 'rb->unfiltered_changes_count' should be initialised in
> the function 'ReorderBufferAllocate'. While debugging I found that the
> initial value of rb->unfiltered_changes_count is 127. I think it
> should be set to '0' inside 'ReorderBufferAllocate'. Am I missing
> something here?
> I have also added the same comment in point 1. in [2].
>
Fixed.
On Sat, Apr 5, 2025 at 12:34 AM Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com> wrote:
>
> On Thu, 20 Mar 2025 at 18:09, Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
> >
> > On Thu, Mar 13, 2025 at 5:49 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
> > >
> > > Moving this patch to the next CF as this patch needs more design level
> > > inputs which may not be feasible in this CF but do continue to review
> > > the patch.
> > >
> > > regards,
> > > Ajin Cherian
> > > Fujitsu Australia
> >
> > Rebased the patch as it no longer applied.
> >
>
> Hi Ajin,
>
> I reviewed v16-0002 patch, here are my comments:
>
> 1. Guc CHANGES_THRESHOLD_FOR_FILTER was introduced in 0001 patch,
> should this change be in 0001 patch?
>
Fixed.
> - * temporarily suspended. Filtering resumes after processing every 100 changes.
> + * temporarily suspended. Filtering resumes after processing
> CHANGES_THRESHOLD_FOR_FILTER
> + * changes.
>
> 2. Following comment is repeated inside DecodeInsert, DecodeUpdate,
> DecodeDelete, DecodeMultiInsert, DecodeSpecConfirm
> + /*
> + * When filtering changes, determine if the relation associated with the change
> + * can be skipped. This could be because the relation is unlogged or because
> + * the plugin has opted to exclude this relation from decoding.
> + */
> + if (SkipThisChange(ctx, buf->origptr, XLogRecGetXid(r), &target_locator,
>
> Since this comment is the same, should we remove it from the above
> functions and add this where function 'SkipThisChange' is defined?
>
reworded this.
On Thu, Apr 10, 2025 at 2:26 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Hi Ajin.
>
> Some review comments for patch v16-0001.
>
> ======
> Commit message
>
> 1.
> A hash cache of relation file locators is implemented to cache whether a
> relation is filterable or not. This ensures that we only need to retrieve
>
> ~
>
> /hash cache/hash table/
>
> (AFAICT you called this a hash table elsewhere in all code comments)
Fixed.
>
> ~~~
>
> 2.
> Additionally, filtering is paused for transactions containing WAL records
> (INTERNAL_SNAPSHOT, COMMAND_ID, or INVALIDATION) that modify the historical
> snapshot constructed during logical decoding. This pause is necessary because
> constructing a correct historical snapshot for searching publication
> information requires processing these WAL records.
>
> ~
>
> IIUC, the "pause" here is really referring to the 100 changes
> following an unfilterable txn. So, I don't think what you are
> describing here is a "pause" -- it is just another reason for the tx
> to be marked unfilterable, and the pause logic will take care of
> itself. So, maybe it should be worded more like
>
> SUGGESTION
> Additionally, transactions containing WAL records (INTERNAL_SNAPSHOT,
> COMMAND_ID, or INVALIDATION) that modify the historical
> snapshot constructed during logical decoding are deemed unfilterable.
> This is necessary because constructing a correct historical snapshot
> for searching publication information requires processing these WAL
> records.
>
Fixed.
> ======
> src/backend/replication/logical/reorderbuffer.c
>
> 3. Header comment.
>
> If you change the commit message based on previous suggestions, then
> change the comment here also to match it.
>
> ~~~
>
> 4.
> +/*
> + * After encountering a change that cannot be filtered out, filtering is
> + * temporarily suspended. Filtering resumes after processing every 100 changes.
> + * This strategy helps to minimize the overhead of performing a hash table
> + * search for each record, especially when most changes are not filterable.
> + */
> +#define CHANGES_THRESHOLD_FOR_FILTER 100
>
> /every 100/the next 100/
>
> ~~~
>
> +ReorderBufferFilterByRelFileLocator:
>
> 5.
> + * if we are decoding a transaction without these records. See comment on top
> + * of GetDecodableRelation() to see list of relations that are not
> + * decoded by the reorderbuffer.
>
> /to see list of relations/to see the kinds of relation/
>
Fixed.
On Fri, Apr 11, 2025 at 1:15 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Hi Ajin,
>
> Here is another general review comment for patch 0001.
>
> ~~~
>
> I keep getting confused by the distinction between the 2 member fields
> that are often found working hand-in-hand:
> entry->filterable
> rb->can_filter_change
>
> Unfortunately, because the names are very similar I keep blurring the
> meanings, and then nothing makes sense.
>
> IIUC, the meanings are actually like:
>
> entry->filterable. This means filtering is *possible* for this kind of
> relation; it doesn't mean it will happen though.
>
> rb->can_filter_change. This means the plugin will *try* to filter the
> change; it might do nothing if entry->filterable is false;
> can_filter_change bool is used for the 100 change "temporary
> suspension" logic (e.g. so it if is false we won't even try to filter
> despite entry->filterable is true).
>
> If those meanings are accurate I think some better member names might be:
> entry->filterable
> rb->try_to_filter_change
>
> Also these explanations/distinctions need to be made more clearly in
> the commit message and/of file head comments, as well as where those
> members are defined.
Fixed as recommended.
On Fri, Apr 11, 2025 at 2:21 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Hi Ajin,
>
> Some review comments for patch v16-0002.
>
> ======
> doc/src/sgml/logicaldecoding.sgml
>
> 1.
> + To indicate that decoding can be skipped for the given change
> + <parameter>change_type</parameter>, return <literal>true</literal>;
> + <literal>false</literal> otherwise.
>
> /change <parameter>change_type</parameter>/<parameter>change_type</parameter>/
>
> (don't need to say "change" twice)
Fixed.
>
> ======
> src/backend/replication/logical/decode.c
>
> SkipThisChange:
>
> 2.
> +/* Function to determine whether to skip the change */
> +static bool SkipThisChange(LogicalDecodingContext *ctx, XLogRecPtr
> origptr, TransactionId xid,
> + RelFileLocator *target_locator, ReorderBufferChangeType change_type)
> +{
> + return (FilterChangeIsEnabled(ctx) &&
> + ReorderBufferFilterByRelFileLocator(ctx->reorder, xid, origptr,
> target_locator,
> + change_type));
> +}
>
> 2a.
> By convention the function return should be on its own line.
>
> ~
>
> 2b.
> Probably this should be declared *inline* like other functions similar to this?
>
Fixed.
> ~
>
> 2c.
> Consider renaming this function to FilterChange(...). I think that
> might align better with the other functions like
> FilterChangeIsEnabled, FilterByOrigin etc which all refer to
> "filtering" instead of "skipping".
>
Fixed.
> ~~~
>
> 3.
> + /*
> + * When filtering changes, determine if the relation associated with the change
> + * can be skipped. This could be because the relation is unlogged or because
> + * the plugin has opted to exclude this relation from decoding.
> + */
> + if (SkipThisChange(ctx, buf->origptr, XLogRecGetXid(r), &target_locator,
> + REORDER_BUFFER_CHANGE_INSERT))
>
> All these block comments prior to the calls to SkipThisChange seem
> slightly overkill (I think Shlok also reports this). IMO this comment
> can be much simpler:
> e.g.
> Can the relation associated with this change be skipped?
>
> This is repeated in multiple places: DecodeInsert, DecodeUpdate,
> DecodeDelete, DecodeMultiInsert, DecodeSpecConfirm
Fixed.
>
> ======
> src/backend/replication/logical/logical.c
>
> filter_change_cb_wrapper:
>
> 4.
> + /* check if the filter change callback is supported */
> + if (ctx->callbacks.filter_change_cb == NULL)
> + return false;
> +
>
> Other optional callbacks are more terse and just say like below, with
> no comment.
> if (!ctx->callbacks.truncate_cb)
> return;
> SUGGESTION (do the same here)
> if (!ctx->callbacks.filter_change_cb)
> return false;
>
> ======
> src/backend/replication/logical/reorderbuffer.c
>
> 5.
> /*
> * After encountering a change that cannot be filtered out, filtering is
> - * temporarily suspended. Filtering resumes after processing every 100 changes.
> + * temporarily suspended. Filtering resumes after processing
> CHANGES_THRESHOLD_FOR_FILTER
> + * changes.
> * This strategy helps to minimize the overhead of performing a hash table
> * search for each record, especially when most changes are not filterable.
> */
>
> I agree with Shlok. This change seems to belong in patch 0001.
>
Fixed.
> ~~~
>
> ReorderBufferFilterByRelFileLocator
>
> 6.
> ReorderBufferFilterByRelFileLocator(ReorderBuffer *rb, TransactionId xid,
> XLogRecPtr lsn, RelFileLocator *rlocator,
> ReorderBufferChangeType change_type)
> ~
>
> The function comment for this says "Returns true if the relation can
> be filtered; otherwise, false.". I'm not sure that it is strictly
> correct. IMO the comment should explain more about how the return
> boolean depends also on the *change_type*. e.g. IIUC you could be able
> to filter INSERTS but not filter DELETES even for the same relation.
>
Reworded.
On Fri, Apr 11, 2025 at 4:06 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Hi Ajin,
>
> Some review comments for patch v16-0003.
>
> ======
> Patch
>
> 1.
> mode change 100644 => 100755 src/test/subscription/t/001_rep_changes.pl
>
> What's this mode change for?
>
Fixed.
> ======
> Commit message
>
> 2. missing commit message
>
> ======
> src/backend/replication/logical/reorderbuffer.c
Fixed.
>
> 3.
> -#define CHANGES_THRESHOLD_FOR_FILTER 100
> +#define CHANGES_THRESHOLD_FOR_FILTER 0
>
> Hmm. You cannot leave a thing like this in the patch because it seems
> like just a temporary hack and means the patch cannot be committed in
> this form. It needs some kind of more permanent testing solution,
> allowing the tests of this patch can executed efficiently, and the
> patch pushed, all without impacting the end-user. Maybe consider if
> it's possible to have some injection point so the text can "inject" a
> zero here (??).
>
Yes, this is currently just a hack before we decide whether throttling
buffer value is correct and whether it should be user configurable.
> ======
> src/test/subscription/t/001_rep_changes.pl
>
> 4.
> # Create new tables on publisher and subscriber
> $node_publisher->safe_psql('postgres', "CREATE TABLE pub_table (id int
> primary key, data text)");
> $node_publisher->safe_psql('postgres', "CREATE TABLE unpub_table (id
> int primary key, data text)");
> $node_publisher->safe_psql('postgres', "CREATE TABLE insert_only_table
> (id int primary key, data text)");
> $node_publisher->safe_psql('postgres', "CREATE TABLE delete_only_table
> (id int primary key, data text)");
>
> You could combine all those DDLs.
>
Fixed all these.
Attached patch v17
regards,
Ajin Cherian
Fujitsu Australia
Attachment | Content-Type | Size |
---|---|---|
v17-0002-Introduce-an-output-plugin-callback-to-filter-ch.patch | application/octet-stream | 21.2 KB |
v17-0003-Tests-for-filtering-unpublished-changes.patch | application/octet-stream | 5.0 KB |
v17-0001-Filter-transactions-that-need-not-be-published.patch | application/octet-stream | 26.0 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Fujii Masao | 2025-04-17 04:12:00 | Re: Make prep_status() message translatable |
Previous Message | Amit Kapila | 2025-04-17 03:42:22 | Re: Skipping schema changes in publication |