Re: row filtering for logical replication

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Euler Taveira <euler(at)timbira(dot)com(dot)br>
Cc: movead li <movead(dot)li(at)highgo(dot)ca>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2019-11-25 02:38:51
Message-ID: CA+HiwqG3Jz-cRS=4gqXmZDjDAi==19GvrFCCqAawwHcOCEn4fQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Euler,

Thanks for working on this. I have reviewed the patches, as I too am
working on a patch related to logical replication [1].

On Thu, Sep 26, 2019 at 8:20 AM Euler Taveira <euler(at)timbira(dot)com(dot)br> wrote:
>
> Em qua, 25 de set de 2019 às 08:08, Euler Taveira
> <euler(at)timbira(dot)com(dot)br> escreveu:
> >
> > I'll send a patchset later today.
> >
> ... and it is attached.

Needed to be rebased, which I did, to be able to test them; patches attached.

Some comments:

* 0001: seems a no-brainer

* 0002: seems, um, unnecessary? The only place ntuples will be used is here:

@@ -702,9 +702,8 @@ fetch_remote_table_info(char *nspname, char *relname,
(errmsg("could not fetch table info for table \"%s.%s\": %s",
nspname, relname, res->err)));

- /* We don't know the number of rows coming, so allocate enough space. */
- lrel->attnames = palloc0(MaxTupleAttributeNumber * sizeof(char *));
- lrel->atttyps = palloc0(MaxTupleAttributeNumber * sizeof(Oid));
+ lrel->attnames = palloc0(res->ntuples * sizeof(char *));
+ lrel->atttyps = palloc0(res->ntuples * sizeof(Oid));

but you might as well use tuplestore_tuple_count(res->tuplestore). My
point is that if ntuples that this patch is adding was widely useful
(as would be shown by the number of places that could be refactored to
use it), it would have been worthwhile to add it.

* 0003: seems fine to me.

* 0004: seems fine too, although maybe preproc.y should be updated too?

* 0005: naturally many comments here :)

+ <entry>Expression tree (in the form of a
+ <function>nodeToString()</function> representation) for the relation's

Minor nitpicking: "in the form of a" seems unnecessary. Other places
that mention nodeToString() just say "in
<function>nodeToString()</function> representation"

+ Columns used in the <literal>WHERE</literal> clause must be part of the
+ primary key or be covered by <literal>REPLICA IDENTITY</literal> otherwise
+ <command>UPDATE</command> and <command>DELETE</command> operations will not
+ be replicated.
+ </para>

Can you please explain the reasoning behind this restriction. Sorry
if this is already covered in the up-thread discussion.

/*
+ * Gets list of PublicationRelationQuals for a publication.
+ */
+List *
+GetPublicationRelationQuals(Oid pubid)
+{
...
+ relqual->relation = table_open(pubrel->prrelid,
ShareUpdateExclusiveLock);

I think it's a bad idea to open the table in one file and rely on
something else in the other file closing it. I know you're having it
to do it because you're using PublicationRelationQual to return
individual tables, but why not just store the table's OID in it and
only open and close the relation where it's needed. Keeping the
opening and closing of relation close to each other is better as long
as it doesn't need to be done many times over in many different
functions. In this case, pg_publication.c: publication_add_relation()
is the only place that needs to look at the open relation, so opening
and closing should both be done there. Nothing else needs to look at
the open relation.

Actually, OpenTableList() should also not open the relation. Then we
don't need CloseTableList(). I think it would be better to refactor
things around this and include the patch in this series.

+ /* Find all publications associated with the relation. */
+ pubrelsrel = table_open(PublicationRelRelationId, AccessShareLock);

I guess you meant:

/* Get all relations associated with this publication. */

+ relqual->whereClause = copyObject(qual_expr);

Is copying really necessary?

+ /*
+ * ALTER PUBLICATION ... DROP TABLE cannot contain a WHERE clause. Use
+ * publication_table_list node (that accepts a WHERE clause) but forbid
+ * the WHERE clause in it. The use of relation_expr_list node just for
+ * the DROP TABLE part does not worth the trouble.
+ */

This comment is not very helpful, as it's not clear what the various
names are referring to. I'd just just write:

/*
* Although ALTER PUBLICATION's grammar allows WHERE clause to be
* specified for DROP TABLE action, it doesn't makes sense to allow it.
* We implement that rule here, instead of complicating grammar to enforce
* it.
*/

+ errmsg("cannot use a WHERE clause for
removing table from publication \"%s\"",

I think: s/for/when/g

+ /*
+ * Remove publication / relation mapping iif (i) table is not
+ * found in the new list or (ii) table is found in the new list,
+ * however, its qual does not match the old one (in this case, a
+ * simple tuple update is not enough because of the dependencies).
+ */

Aside from the typo on the 1st line (iif), I suggest writing this as:

/*-----------
* Remove the publication-table mapping if:
*
* 1) Table is not found the new list of tables
*
* 2) Table is being re-added with a different qual expression
*
* For (2), simply updating the existing tuple is not enough,
* because of the qual expression's dependencies.
*/

+ errmsg("functions are not allowed in WHERE"),

Maybe:

functions are now allowed in publication WHERE expressions

+ err = _("cannot use subquery in publication WHERE expression");

s/expression/expressions/g

+ case EXPR_KIND_PUBLICATION_WHERE:
+ return "publication expression";

Maybe:

publication WHERE expression
or
publication qual

- int natt;
+ int n;

Are this and other related changes really needed?

+ appendStringInfoString(&cmd, "COPY (SELECT ");
+ /* list of attribute names */
+ first = true;
+ foreach(lc, attnamelist)
+ {
+ char *col = strVal(lfirst(lc));
+
+ if (first)
+ first = false;
+ else
+ appendStringInfoString(&cmd, ", ");
+ appendStringInfo(&cmd, "%s", quote_identifier(col));
+ }

Hmm, why wouldn't SELECT * suffice?

+ estate = create_estate_for_relation(relation);
+
+ /* prepare context per tuple */
+ ecxt = GetPerTupleExprContext(estate);
+ oldcxt = MemoryContextSwitchTo(estate->es_query_cxt);
+ ecxt->ecxt_scantuple = ExecInitExtraTupleSlot(estate,
tupdesc, &TTSOpsHeapTuple);
...
+ ExecDropSingleTupleTableSlot(ecxt->ecxt_scantuple);
+ FreeExecutorState(estate);

Creating and destroying the EState (that too with the ResultRelInfo
that is never used) for every tuple seems wasteful. You could store
the standalone ExprContext in RelationSyncEntry and use it for every
tuple.

+ /* evaluates row filter */
+ expr_type = exprType(qual);
+ expr = (Expr *) coerce_to_target_type(NULL, qual,
expr_type, BOOLOID, -1, COERCION_ASSIGNMENT, COERCE_IMPLICIT_CAST,
-1);
+ expr = expression_planner(expr);
+ expr_state = ExecInitExpr(expr, NULL);

Also, there appears to be no need to repeat this for every tuple? I
think this should be done only once, that is, RelationSyncEntry.qual
should cache ExprState nodes, not bare Expr nodes.

Given the above comments, the following seems unnecessary:

+extern EState *create_estate_for_relation(Relation rel);

By the way, make check doesn't pass. I see the following failure:

- "public.testpub_rf_tbl3" WHERE ((e > 300) AND (e < 500))
+ "public.testpub_rf_tbl3"

but I guess applying subsequent patches takes care of that.

* 0006 and 0007: small enough that I think it might be better to merge
them into 0005.

* 0008: no comments as it's not intended to be committed. :)

Thanks,
Amit

[1] https://commitfest.postgresql.org/25/2301/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2019-11-25 02:48:29 Re: row filtering for logical replication
Previous Message Michael Paquier 2019-11-25 02:31:58 Re: Remove configure --disable-float4-byval and --disable-float8-byval