Re: Nondeterministic collations and the value returned by GROUP BY x

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jim Finnerty <jfinnert(at)amazon(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Nondeterministic collations and the value returned by GROUP BY x
Date: 2021-03-16 14:14:30
Message-ID: 3176005.1615904070@sss.pgh.pa.us
Lists: pgsql-hackers

Jim Finnerty <jfinnert(at)amazon(dot)com> writes:
> Right. It doesn't matter which of the values is returned; however, a
> plausible-sounding implementation would case-fold the value, like GROUP BY
> LOWER(x), but the case-folded value isn't necessarily one of the original
> values and so that could be subtly wrong in the case-insensitive case, and
> could in principle be completely wrong in the most general nondeterministic
> collation case where the case-folded value isn't even equal to the other
> members of the set.

> Does the implementation in PG12 ensure that some member of the set of equal
> values is chosen as the representative value?

Without having actually looked, I'm pretty certain it does.
Considerations of data type independence would seem to rule out a hack
like applying case folding. There might be case folding happening
internally to comparison functions, like citext_cmp, but that wouldn't
affect the grouping logic, which sets aside one member of the
group of peer values as its representative.
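
To illustrate the distinction, here is a minimal Python sketch. It is not
PostgreSQL's code; the function names and the "keep the first-seen member"
rule are assumptions made for the example. It contrasts grouping that sets
aside an original value for each group with a LOWER(x)-style approach that
manufactures a new one.

```python
def group_representatives(values, equal):
    """Return one original member for each group of values that `equal` treats as peers."""
    reps = []
    for v in values:
        # Only the comparator decides group membership; the value itself is never altered.
        if not any(equal(v, r) for r in reps):
            reps.append(v)  # assumption: the first-seen member represents its group
    return reps


def ci_equal(a, b):
    # Hypothetical stand-in for the equality test of a case-insensitive collation.
    # Any case folding happens only inside the comparison, as with citext_cmp.
    return a.casefold() == b.casefold()


rows = ["Foo", "FOO", "Bar", "BAR"]

# Grouping via the comparator: every representative is one of the input values.
reps = group_representatives(rows, ci_equal)

# Folding the values themselves, in the style of GROUP BY LOWER(x): the results
# need not appear anywhere in the input.
folded = sorted({v.casefold() for v in rows})
```

Here every element of reps comes straight from rows, while none of the
values in folded does. That is the difference between folding inside the
comparison function and folding the grouped value itself.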

regards, tom lane
