string_agg distinct order by

From: "Markhof, Ingolf" <ingolf(dot)markhof(at)de(dot)verizon(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: string_agg distinct order by
Date: 2021-08-19 16:09:26
Message-ID: CALZg0g5XLwoqFXsiC05SBiaCZhcB1QXDairtz+Hexf-NSCXPTw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I am looking for something like

string_agg(distinct col_x order by col_y)

Unfortunately, you can either have the distinct, but then the order by
needs to be identical to what's aggregated, or you can have the order be
determined by another column. Not both...

Here is the playground

Given:

create table sites (
state text,
city text,
col_a text,
col_b text
);

insert into sites values ('Texas','Dallas','green','green');
insert into sites values ('Texas','Houston','green','green');
insert into sites values ('Texas','Austin','yellow','green');
insert into sites values ('Texas','Waco','yellow','yellow');
insert into sites values ('Texas','Midland','red','red');
insert into sites values ('Texas','Amarillo','red','yellow');

For each city, there is a status denoted by colour combination, e.g. 'green
/ green'. This is stored in two different columns in the table.

There is an order in the colours:

create table colours (
colour text,
value integer
);

insert into colours values ('red', 1);
insert into colours values ('yellow', 2);
insert into colours values ('green', 3);

So, red first, yellow second, green last.

I want an aggregated view showing for each state the list of existing
status combinations, such as:

with site_status as (
select
state,
city,
col_a || '/' || col_b as status,
ca.value as val_a,
cb.value as val_b
from
sites a
join colours ca on ca.colour=a.col_a
join colours cb on cb.colour=a.col_b
)
select
state,
string_agg(distinct status,',') as list
from
site_status
group by
state
;

This results in:

Texas green/green,red/red,red/yellow,yellow/green,yellow/yellow

By using distinct in the string_agg, I avoid double entries. Fine. But now,
I want the data ordered. E.g. in the order of the first colour. I SQL, this
could read...

string_agg(distinct status,',' order by val_a) as list

but this doesn't work. I get:

SQL Error [42P10]: ERROR: in an aggregate with DISTINCT, ORDER BY
expressions must appear in argument list

So, I could say:

string_agg(distinct status,',' order by status) as list

but this is not what I want: 'green' would get first, red second, yellow
last...

I could also drop the distinct and say:

string_agg(status,',' order by val_a) as list

This would return the list in correct order, but with double values
('green/green') showing up twice.

I tried to delete the double entries via regexp_replace, but found this
doesn't work for rather long strings (where a single regexp_replace can run
many minutes!)

Any pointers?

Thank you very much for any idea.

======================================================================

Verizon Deutschland GmbH - Sebrathweg 20, 44149 Dortmund, Germany - Amtsgericht Dortmund, HRB 14952 - Geschäftsführer: Detlef Eppig - Vorsitzender des Aufsichtsrats: Francesco de Maio

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Laurenz Albe 2021-08-19 16:20:00 Re: PostgreSQL Automatic Failover Windows Server
Previous Message Rich Shepard 2021-08-19 16:06:17 Re: Selecting table row with latest date