Quick Links

Re: Why do indexes and sorts use the database collation?

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Why do indexes and sorts use the database collation?
Date:	2023-11-13 18:02:47
Message-ID:	20231113180247.vud4c5uy77ojf6sx@awork3.anarazel.de
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi,

On 2023-11-11 23:19:55 -0800, Jeff Davis wrote:
> On Fri, 2023-11-10 at 17:19 -0800, Andres Freund wrote:
> > I guess you are arguing that the user didn't intend to create an
> > index here?
>
> No, obviously the user should expect an index when a primary key is
> created. But that doesn't mean that it necessarily needs to be ordered
> according to the database collation.
>
> Unfortunately, right now the planner doesn't understand that an index
> in the "C" locale can satisfy equality searches and constraint
> enforcement for "en_US" (or any other deterministic collation). That's
> probably the first thing to fix.
>
> Inequalities and ORDER BYs can't benefit from an index with a different
> collation, but lots of indexes don't need that.

But we don't know whether the index is used for that. If we just change the
behaviour, there will be lots of pain around upgrades, because queries will
continue to work but be dog slow.

> > Also, wouldn't the intent to use a different collation for the column
> > be
> > expressed by changing the column's collation?
>
> The column collation expresses the semantics of that column. If the
> user has a database collation of "en_US", they should expect ORDER BY
> on that column to be according to that locale unless otherwise
> specified.

That makes no sense to me. Either the user cares about ordering, in which case
the index needs to be in that ordering for efficient ORDER BY, or they don't,
in which neither index nor column needs a non-C collation. You partially
premised your argument on the content of primary keys typically making non-C
collations undesirable!

> > OTOH, if we are choosing a groupagg, we might be able to implement
> > that using
> > an index, which is more likey to exist in the databases collation.
> > Looks like
> > we even just look for indexes that are in the database collation.
> >
> > Might be worth teaching the planner additional smarts here.
>
> Yeah, we don't need to force anything, we could just create a few paths
> with appropriate path key information and cost them.

I'm not sure it's quite that easy. One issue is obviously that this could lead
to a huge increase in paths we need to keep around due to differing path
keys. We might need to be a bit more aggressive about pruning such paths than
I think we would be today.

> > - Teach the planner to use cheaper collations when ordering for
> > reasons other
> > than the user's direct request (e.g. DISTINCT/GROUP BY, merge
> > joins).
>
> +1. Where "cheaper" comes from is an interesting question -- is it a
> property of the provider or the specific collation? Or do we just call
> "C" special?

I'd think the specific collation. Even if we initially perhaps just get the
default cost from the provider such, it structurally seems the sanest place to
locate the cost.

Greetings,

Andres Freund

In response to

Re: Why do indexes and sorts use the database collation? at 2023-11-12 07:19:55 from Jeff Davis

Responses

Re: Why do indexes and sorts use the database collation? at 2023-11-13 21:36:24 from Tomas Vondra
Re: Why do indexes and sorts use the database collation? at 2023-11-13 23:55:59 from Jeff Davis

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tom Lane	2023-11-13 18:17:32	Re: COPY TO (FREEZE)?
Previous Message	Bruce Momjian	2023-11-13 18:01:32	Re: Question about non-blocking mode in libpq