Final result (display) collation?

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Final result (display) collation?
Date: 2024-12-17 23:56:09
Message-ID: 0fcc2555a7e9c16858c5bc72d9587a2f4b8fbcfa.camel@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Say an application developer wants the results in their application to
appear in an order that matches the application user's locale. They
don't want the results themselves to change; they just want the order
to change for display purposes to the end user. Assume the developer
already has ORDER BY clauses in the relevant queries, but no COLLATE
clauses.

That leaves three options:

1. Sort in the application.

2. Change the database default collation.

3. Add COLLATE clauses to the ORDER BY.

#1 is a fine solution in many cases, but not all: if it were universal,
we wouldn't even need top-level ORDER BY. #2 isn't suitable unless you
can settle on a single locale and make it permanent for the entire
database. So I'd like to see if we can improve #3 for the cases where
where #1 and #2 aren't suitable.

Trying to add COLLATE clauses today creates either boilerplate in the
application queries, or creates challenges with query generation. It
can also cause problems porting the application. On top of that, we
don't guarantee that particular locales will work: my machine has "it-
IT-x-icu" but not "it_IT" -- so the application developer needs to be
careful.

The SQL standard specifies "SET COLLATION" to change the default
collation just for the session, which sounds interesting. But it
appears to affect the query semantics, not just the result order, and
that might cause all kinds of problems. For instance, if we allow it to
affect the results of plpgsql functions, then it's another source of
problems similar to search_path. I assume that's why I haven't been
able to find proposals for SET COLLATION?

Crazy idea: what if we treated the top-level ORDER BY as special? That
is, we create a new node ResultOrderBy and make it visible in EXPLAIN.
Then we can have a GUC to control the default collation only for that
node.

* It would only change the order of the results sent to the client,
not the results themselves. (I realize I'm twisting the definition of
"results" a bit here...)

* It wouldn't have surprising downstream consequences for collation
determination because returning results is the last step.

* Setting the result_order_collation to 'default' would be the same
as the current behavior.

* You could control whether pushdowns of this node are enabled (to be
more like adding COLLATE clauses) or disabled (to be more like sorting
in the application).

* You could have separate knobs to control whether it applies to
cursors, prepared statements, etc., which could avoid some kinds of
problems.

Thoughts?

Regards,
Jeff Davis

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Pang 2024-12-18 00:50:16 Back-patch of: avoid multiple hard links to same WAL file after a crash
Previous Message Michail Nikolaev 2024-12-17 23:29:13 Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements