Re: general purpose array_sort

From: Junwang Zhao <zhjwpku(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: jian he <jian(dot)universality(at)gmail(dot)com>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, "andreas(at)proxel(dot)se" <andreas(at)proxel(dot)se>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: general purpose array_sort
Date: 2024-11-07 14:06:04
Message-ID: CAEG8a3+QA+QVnhnATPiSH=QBui+Y5Z-j0NmecSL6yrN8E1dXzA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Nov 5, 2024 at 9:13 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Mon, Nov 04, 2024 at 03:16:35PM +0800, jian he wrote:
> > drop table if exists t;
> > CREATE TABLE t (a int[]);
> > insert into t values ('{1,3}'),('{1,2,3}'),('{11}');
> > insert into t values ('{{1,12}}'), ('{{4,3}}');
> > SELECT array_sort(a) from t;
> >
> > In the above case,
> > tuplesort_begin_datum needs the int type information and int[] type information.
> > otherwise the cached TypeCacheEntry is being used to sort mult-dimension array,
> > which will make the result false.
>
> All these behaviors need more extensive testing.
>
> This brings me an extra question around the caching. Would the
> sorting be able to behave correctly when feeding to a single
> array_sort() context array values that have multiple COLLATE clauses?
> Or merge_collation_state() would be smart enough to make sure that
> collation conflicts never happen to begin with? I am wondering if we
> should worry about multiple VALUES, CTEs, or PL functions where
> array_sort() could be fed into its cache values that lead to
> unpredictible results for some values. This stuff should perhaps have
> more testing around such behaviors, stressing what kind of
> interactions we have between the sorting of multiple values and the
> caching, in the context of a single array_sort() call.

I'm afraid this can not be achieved in my current implementation, a simple
case is:

SELECT array_sort('{foo,bar,null,CCC,Abc,bbc}'::text[]);
{Abc,bar,bbc,CCC,foo,NULL}
SELECT array_sort('{foo,bar,null,CCC,Abc,bbc}'::text[] COLLATE "C");
{Abc,CCC,bar,bbc,foo,NULL}

SELECT array_sort(a COLLATE "C") FROM (VALUES
('{foo,bar,null,CCC,Abc,bbc}'::text[] COLLATE "C"),
('{foo,bar,null,CCC,Abc,bbc}'::text[])) v(a);
{Abc,CCC,bar,bbc,foo,NULL}
{Abc,CCC,bar,bbc,foo,NULL}

Maybe add some documents to specify this?

> --
> Michael

--
Regards
Junwang Zhao

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2024-11-07 14:21:09 Re: Deleting older versions in unique indexes to avoid page splits
Previous Message Junwang Zhao 2024-11-07 13:56:39 Re: general purpose array_sort