Re: Proposal: Filter irrelevant change before reassemble transactions during logical decoding

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Ajin Cherian <itsajin(at)gmail(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Proposal: Filter irrelevant change before reassemble transactions during logical decoding
Date: 2025-02-14 05:54:59
Message-ID: CAHut+PtrLu=Yrxo_YQ-LC+LSOEUYmuFo2brjCQ18JM9-Vi2DwQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Ajin,

Some review comments for the patch v13-0002.

======
src/backend/replication/logical/reorderbuffer.c

1. GENERAL

I felt that a summary/overview of how all this filter/cache logic
works should be given in the large file header comment at the top of
this file. There may be some overlap with comments that are already in
the "RelFileLocator filtering" section.

~~~

ReorderBufferRelFileLocatorEnt:

2.
+/* entry for hash table we use to track if the relation can be filtered */
+typedef struct ReorderBufferRelFileLocatorEnt

/* Hash table entry used to determine if the relation can be filtered. */

~~~

ReorderBufferQueueChange:

3.
+ /*
+ * If we're not filtering and we've crossed the change threshold,
+ * attempt to filter again
+ */

SUGGESTION
If filtering was suspended, and we've crossed the change threshold
then reenable filtering.

~~~

ReorderBufferGetRelation:

4.
+static Relation
+ReorderBufferGetRelation(ReorderBuffer *rb, RelFileLocator *rlocator,
+ bool has_tuple)

Would a better name be ReorderBufferGetRelationForDecoding(). Yeah,
it's a bit long but perhaps it explains the context/purpose better.

~~~

5.
+ if (filterable)
+ {
+ RelationClose(relation);
+ return NULL;
+ }

I wonder if some small descriptive comment would be helpful here just
to say we are returning NULL to indicate that this relation is not
needed and yada yada...

~~~

RelFileLocatorCacheInvalidateCallback:

6.
+ /* slightly inefficient algorithm but not performance critical path */
+ while ((entry = (ReorderBufferRelFileLocatorEnt *)
hash_seq_search(&status)) != NULL)
+ {
+ /*
+ * If relid is InvalidOid, signaling a complete reset, we must remove
+ * all entries, otherwise just remove the specific relation's entry.
+ * Always remove negative cache entries.
+ */
+ if (relid == InvalidOid || /* complete reset */
+ entry->relid == InvalidOid || /* negative cache entry */
+ entry->relid == relid) /* individual flushed relation */

6a.
Maybe uppercase that 1st comment.

~

6b.
It seems a bit unusual to be referring to "negative cache entries". I
thought it should be described in terms of InvalidOid since that is
what it is using in the condition.

~

6c.
If the relid parameter can take special values like "If relid is
InvalidOid, signaling a complete reset" that sounds like the kind of
thing that should be documented in the function comment.

~~~

ReorderBufferFilterByRelFileLocator

7.
Despite the extra indenting required, I wondered if the logic may be
easier to read (e.g. it shows the association of the
rb->can_filter_change and entry->filterable more clearly) if this is
refactored slightly by sharing a single common return like below:

BEFORE
...
+ key.reltablespace = rlocator->spcOid;
+ key.relfilenumber = rlocator->relNumber;
+ entry = hash_search(RelFileLocatorFilterCache, &key, HASH_ENTER, &found);
+
+ if (found)
+ {
+ rb->can_filter_change = entry->filterable;
+ return entry->filterable;
+ }
...
+ rb->can_filter_change = entry->filterable;
...
+ return entry->filterable;
+}

AFTER
...
+ key.reltablespace = rlocator->spcOid;
+ key.relfilenumber = rlocator->relNumber;
+ entry = hash_search(RelFileLocatorFilterCache, &key, HASH_ENTER, &found);
+
+ if (!found)
+ {
...
+ }
+
+ rb->can_filter_change = entry->filterable;
+ return entry->filterable;
+}

======
src/include/replication/reorderbuffer.h

8.
+ /* should we try to filter the change? */
+ bool can_filter_change;
+

I think most of my difficulty reading this patch was due to this field
name 'can_filter_change'.

'can_filter_change' somehow sounded to me like it is past tense. e.g.
like as if we already found some change and we yes, we CAN filter it.

But AFAICT the real meaning is simply that (when the flag is true) we
are ALLOWED to check to see if there is anything filterable. In fact,
the change may or may not be filterable.

Can this be renamed to something more "correct"? e.g.
- 'allow_change_filtering'
- 'enable_change_filtering'
- etc.

~~

9.
+ /* number of changes after a failed attempt at filtering */
+ int8 processed_changes;

Maybe 'unfiltered_changes_count' is a better name for this field?

~~~

10.
+extern bool ReorderBufferCanFilterChanges(ReorderBuffer *rb);

Should match the 'can_filter_change' field name, so if you change that
(see comment #8), then change this too.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2025-02-14 06:02:51 Re: Move wal_buffers_full to WalUsage (and report it in pgss/explain)
Previous Message Pavel Stehule 2025-02-14 05:53:01 Re: Is pgAdmin the only front-end to PostgreSQL debugger ? And is "a working pl/pgsql debugger" something core should care to maintain ?