UNION versus collations

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: UNION versus collations
Date: 2024-11-18 22:56:19
Message-ID: 3605568.1731970579@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

prepunion.c's plan_union_children(), which merges
identically-propertied UNION operations into one, has this comment:

* NOTE: currently, we ignore collations while determining if a child has
* the same properties. This is semantically sound only so long as all
* collations have the same notion of equality. It is valid from an
* implementation standpoint because we don't care about the ordering of
* a UNION child's result: UNION ALL results are always unordered, and
* generate_union_paths will force a fresh sort if the top level is a UNION.

This argument seems well past its sell-by date. In the first place,
now that we have nondeterministic collations we can't assume that
"all collations have the same notion of equality". In the second
place, since commit 66c0185a3 it's completely untrue that "we
don't care about the ordering of a UNION child's result", and
also untrue that "generate_union_paths will force a fresh sort".

As far as I can tell from some desultory testing, the implementation
issues don't lead to any observable bugs. Even though the code may
try to produce wrongly-ordered paths for the sub-SELECTs, the pathkey
logic will recognize that it's not really the same ordering, leading
to injection of per-child Sorts. Nonetheless, we're wasting cycles
producing useless ordered paths, and perhaps we might pick a less than
optimal plan in some cases (not entirely sure about that).

The point about nondeterministic collations is undeniable though.
I've not bothered to build a test case, but surely one can be
made wherein a sub-UNION acts differently than expected.

So I think we ought to apply the attached as far back as we have
nondeterministic collations.

regards, tom lane

Attachment Content-Type Size
check-collations-when-merging-UNIONs.patch text/x-diff 1.9 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2024-11-18 23:20:52 Fix an error while building test_radixtree.c with TEST_SHARED_RT
Previous Message Tom Lane 2024-11-18 22:14:53 Re: pg_dump --no-comments confusion