Re: Wrong result when enable_partitionwise_join is on if collation of PartitionKey and Column is different.

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Junwang Zhao <zhjwpku(at)gmail(dot)com>
Cc: jian he <jian(dot)universality(at)gmail(dot)com>, Tender Wang <tndrwang(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Wrong result when enable_partitionwise_join is on if collation of PartitionKey and Column is different.
Date: 2024-10-31 13:08:58
Message-ID: CA+HiwqEHx++qTjCGmWyYZDiPR=_fLGPjR7RcFEVnSNZVc-Qj2g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 30, 2024 at 9:36 PM Junwang Zhao <zhjwpku(at)gmail(dot)com> wrote:
> On Wed, Oct 30, 2024 at 11:58 AM jian he <jian(dot)universality(at)gmail(dot)com> wrote:
> >
> > I missed a case when column collation and partition key collation are
> > the same and indeterministic.
> > that should be fine for partition-wise join.
> > so v2 attached.
> >
> > have_partkey_equi_join, match_expr_to_partition_keys didn't do any
> > collation related check.
> > propose v2 change disallow partitionwise join for case when
> > column collation is indeterministic *and* is differ from partition
> > key's collation.
> >
> > the attached partition_wise_join_collation.sql is the test script.
> > you may use it to compare with the master behavior.
>
> What if the partkey collation and column collation are both deterministic,
> but with different sort order?
>
> I'm not familiar with this part of the code base, but it seems to me the
> partition wise join should use partkey collation instead of column collation,
> because it's the partkey collation that decides which partition a row to
> be dispatched.
>
> What Jian proposed is also reasonable but seems another aspect of $subject?

I think we should insist that the join key collation and the partition
collation are exactly the same and refuse to match them if they are
not.

+ {
+ Oid colloid = exprCollation((Node *) expr);
+
+ if ((partcoll != colloid) &&
+ OidIsValid(colloid) &&
+ !get_collation_isdeterministic(colloid))
+ *coll_incompatiable = true;

I am not quite sure what is the point of checking whether or not the
expression collation is deterministic after confirming that it's not
the same as partcoll.

Attached 0002 is what I came up with. One thing that's different from
what Jian proposed is that match_expr_to_partition_keys() returns -1
(expr not matched to any key) when the collation is also not matched
instead of using a separate output parameter for that.

0001 is the patch for the partitionwise grouping issue (bug #18568).
I concluded that even partial aggregate should be disallowed when the
grouping collation doesn't match partitioning collation. The result
with partial partitionwise grouping is not the same as when
enable_partitionwise_aggregate is off.

--
Thanks, Amit Langote

Attachment Content-Type Size
v3-0001-Disallow-partitionwise-grouping-when-collation-do.patch application/octet-stream 11.7 KB
v3-0002-Disallow-partitionwise-join-when-collation-doesn-.patch application/octet-stream 8.4 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Karthik Ramanathan 2024-10-31 13:20:54 "command cannot affect row a second time" in INSERT ... ON CONFLICT
Previous Message Jesper Pedersen 2024-10-31 12:59:01 Re: protocol-level wait-for-LSN