From: | Jeff Davis <pgsql(at)j-davis(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Why do indexes and sorts use the database collation? |
Date: | 2023-11-17 19:39:40 |
Message-ID: | 13fb2032cbfda29bb4c6c55ccc41a3d01a8de717.camel@j-davis.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, 2023-11-13 at 14:12 -0800, Andres Freund wrote:
> Why on earth are we solving this by having multiple pg_collation
> entries for
> exactly the same collation, instead of normalizing the collation-name
> during
> lookup by adding the relevant encoding name if not explicitly
> specified? It
> makes a lot of sense to not force the user to specify the encoding
> when it
> can't differ.
I'm not aware of it being a common practical problem, so perhaps lack
of motivation. But you're right that it doesn't look very efficient.
We can even go deeper into ICU if we wanted to: lots of locales are
actually aliases to a much smaller number of actual collators. And a
lot are just aliases to the root locale. It's not trivial to reliably
tell if two collators are identical, but in principle it should be
possible: each collation is just a set of tailorings on top of the root
locale, so I suppose if those are equal it's the same collator, right?
> It's imo similarly absurd that an index with "default" collation
> cannot be
> used when specifying the equivalent collation explicitly in the query
> and vice
> versa.
The catalog representation is not ideal to treat the database collation
consistently with other collations. It would be nice to fix that.
> > > >
> Jeff was saying that textual primary keys typically don't need
> sorting and
> because of that we could default to "C", for performance. Part of my
> response
> was that I think the user's intent could be expressed by specifying
> the column
> collation as "C" - to which Jeff replied that that would change the
> semantics. Which, to me, seems to completely run counter to his
> argument that
> we could just use "C" for such indexes.
I am saying we shouldn't prematurely optimize for the case of ORDER BY
on a text PK case by making a an index with a non-"C" collation, given
the costs and risks of non-"C" indexes. Particularly because, even if
there is an ORDER BY, there are several common reasons such an index
would not help anyway.
> > > >
Regards,
Jeff Davis
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2023-11-17 19:42:00 | Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500 |
Previous Message | Dmitry Dolgov | 2023-11-17 19:17:32 | Re: Schema variables - new implementation for Postgres 15 |