DISTINCT *and* ORDER BY in aggregate functions on expressions(!)

From: mirabilos <tg(at)evolvis(dot)org>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: DISTINCT *and* ORDER BY in aggregate functions on expressions(!)
Date: 2023-02-27 23:11:49
Message-ID: 23d08470-aaa8-afca-dfd7-37261487a1f8@evolvis.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi,

I’ve got a… rather large query (see below), in which I join a complex
data structure (whose exact contents do not, at this point, matter)
together to get some auxiliary data to expose as JSON field.

In this query I can use, for example…

jsonb_build_object('user_permissions',
jsonb_agg(DISTINCT ap.name ORDER BY ap.name))

… to get a distinct, sorted, list of “user permissions” from a table
ap which I joined to the user table which is the main subject of the
query. (For some reason I need to add DISTINCT because else duplica‐
tes are shown.)

Wrapping this as…

jsonb_build_object('user_permissions',
COALESCE(
jsonb_agg(DISTINCT ap.name ORDER BY ap.name)
FILTER (WHERE ap.id IS NOT NULL)))

… gets me the JSON object’s value for the user_permissions set to
null if there’s nothing in the m:n intermediate table for the user
in question.

This works well. However, what I seem to be not allowed to do is
(without the extra COALESCE, to simplify):

jsonb_build_object('opening_times',
jsonb_agg(DISTINCT jsonb_build_object(
'weekday', cot.weekday,
'from_hour', cot.from_hour,
'to_hour', cot.to_hour)
ORDER BY cot.weekday, cot.from_hour, cot.to_hour)
)

This is because, when I use DISTINCT (but only then‽), the ORDER BY
arguments must be… arguments to the function, or something. In the
above case, the sort key is ap.name which is also the argument to
the jsonb_agg function, so no problem there, but here, the jsonb_agg
argument is the return value of a function so… it has no name.

What I’m looking for is something like…

jsonb_build_object('opening_times',
jsonb_agg(DISTINCT jsonb_build_object(
'weekday', cot.weekday,
'from_hour', cot.from_hour,
'to_hour', cot.to_hour) AS jbo
ORDER BY jbo->>'weekday', jbo->>'from_hour', jbo->>'to_hour')
)

… except I cannot define aliases in that place. Any other syntax
would also work.

The suggested solution for this is apparently to do…

CREATE OR REPLACE FUNCTION core_openingtime_jsonb_sort(JSONB)
RETURNS JSONB AS $$
SELECT jsonb_agg(e ORDER BY e->>'weekday', e->>'from_hour', e->>'to_hour')
FROM jsonb_array_elements($1) AS e
$$ LANGUAGE SQL IMMUTABLE;

… and then query as…

jsonb_build_object('opening_times',
core_openingtime_jsonb_sort(jsonb_agg(DISTINCT jsonb_build_object(
'weekday', cot.weekday,
'from_hour', cot.from_hour,
'to_hour', cot.to_hour)))
)

… which involves internally subquerying for each output row (i.e.
row of the user table) times amount of sub-JSONArrays that need
to be sorted like this, which is currently 3.

All other solutions I can find involve subqueries in the first
place; I am somewhat proud I even managed to write this with
JOINs and without any subqueries in the first place so I’m
hesitant to go that route.

Any advice here?

And, somewhat related: when outputting JSONB, the order of
JSONObject elements is indeterminate, which is… okay, but
forcing sorted (ASCIIbetically, i.e. by codepoint order)
keys would be very very welcome, for reproducibility of
the output. (I’m sure the reproducible-builds project would
also love if this could be changed, or at least added, in a
way it can be enabled for queries, as session parameter
perhaps?)

The query in its current incarnanation is as follows:

-- -----BEGIN SQL-----
CREATE OR REPLACE FUNCTION core_openingtime_jsonb_sort(JSONB)
RETURNS JSONB AS $$
SELECT jsonb_agg(e ORDER BY e->>'weekday', e->>'from_hour', e->>'to_hour')
FROM jsonb_array_elements($1) AS e
$$ LANGUAGE SQL IMMUTABLE;

CREATE OR REPLACE FUNCTION core_generalworkavailability_jsonb_sort(JSONB)
RETURNS JSONB AS $$
SELECT jsonb_agg(e ORDER BY e->>'weekday', e->>'forenoon', e->>'afternoon', e->>'evening')
FROM jsonb_array_elements($1) AS e
$$ LANGUAGE SQL IMMUTABLE;

COPY (SELECT cp.email, cp.first_name, cp.last_name, cp.street, cp.number,
jsonb_build_object('groups', COALESCE(jsonb_agg(DISTINCT ag.name ORDER BY ag.name)
FILTER (WHERE ag.id IS NOT NULL))) ||
jsonb_build_object('help_operations', COALESCE(jsonb_agg(DISTINCT cho.name ORDER BY cho.name)
FILTER (WHERE cho.id IS NOT NULL))) ||
jsonb_build_object('emergency_opening_times', COALESCE(
core_openingtime_jsonb_sort(jsonb_agg(DISTINCT jsonb_build_object(
'weekday', ceot.weekday,
'from_hour', ceot.from_hour,
'to_hour', ceot.to_hour))
FILTER (WHERE ceot.id IS NOT NULL)))) ||
jsonb_build_object('opening_times', COALESCE(
core_openingtime_jsonb_sort(jsonb_agg(DISTINCT jsonb_build_object(
'weekday', cot.weekday,
'from_hour', cot.from_hour,
'to_hour', cot.to_hour))
FILTER (WHERE cot.id IS NOT NULL)))) ||
jsonb_build_object('possible_work_times', COALESCE(
core_generalworkavailability_jsonb_sort(jsonb_agg(DISTINCT jsonb_build_object(
'weekday', cgwa.weekday,
'forenoon', cgwa.forenoon,
'afternoon', cgwa.afternoon,
'evening', cgwa.evening))
FILTER (WHERE cgwa.id IS NOT NULL)))) ||
jsonb_build_object('qualifications',
jsonb_build_object('administrative', COALESCE(jsonb_agg(DISTINCT cqa.name ORDER BY cqa.name)
FILTER (WHERE cqa.id IS NOT NULL))) ||
jsonb_build_object('health', COALESCE(jsonb_agg(DISTINCT cqh.name ORDER BY cqh.name)
FILTER (WHERE cqh.id IS NOT NULL))) ||
jsonb_build_object('language', COALESCE(jsonb_agg(DISTINCT cqlang.name ORDER BY cqlang.name)
FILTER (WHERE cqlang.id IS NOT NULL))) ||
jsonb_build_object('license', COALESCE(jsonb_agg(DISTINCT cqlic.name ORDER BY cqlic.name)
FILTER (WHERE cqlic.id IS NOT NULL))) ||
jsonb_build_object('technical', COALESCE(jsonb_agg(DISTINCT cqt.name ORDER BY cqt.name)
FILTER (WHERE cqt.id IS NOT NULL)))) ||
jsonb_build_object('restrictions', COALESCE(jsonb_agg(DISTINCT cr.name ORDER BY cr.name)
FILTER (WHERE cr.id IS NOT NULL))) ||
jsonb_build_object('user_permissions', COALESCE(jsonb_agg(DISTINCT ap.name ORDER BY ap.name)
FILTER (WHERE ap.id IS NOT NULL))) AS "other_data"
FROM core_person cp
LEFT JOIN core_person_emergency_opening_times cpeot ON cpeot.person_id=cp.id
LEFT JOIN core_openingtime ceot ON ceot.id=cpeot.openingtime_id
LEFT JOIN core_person_groups cpg ON cpg.person_id=cp.id
LEFT JOIN auth_group ag ON ag.id=cpg.group_id
LEFT JOIN core_person_help_operations cpho ON cpho.person_id=cp.id
LEFT JOIN core_helpoperation cho ON cho.id=cpho.helpoperation_id
LEFT JOIN core_person_opening_times cpot ON cpot.person_id=cp.id
LEFT JOIN core_openingtime cot ON cot.id=cpot.openingtime_id
LEFT JOIN core_person_possible_work_times cppwt ON cppwt.person_id=cp.id
LEFT JOIN core_generalworkavailability cgwa ON cgwa.id=cppwt.generalworkavailability_id
LEFT JOIN core_person_qualifications_administrative cpqa ON cpqa.person_id=cp.id
LEFT JOIN core_qualificationadministrative cqa ON cqa.id=cpqa.qualificationadministrative_id
LEFT JOIN core_person_qualifications_health cpqh ON cpqh.person_id=cp.id
LEFT JOIN core_qualificationhealth cqh ON cqh.id=cpqh.qualificationhealth_id
LEFT JOIN core_person_qualifications_language cpqlang ON cpqlang.person_id=cp.id
LEFT JOIN core_qualificationlanguage cqlang ON cqlang.id=cpqlang.qualificationlanguage_id
LEFT JOIN core_person_qualifications_license cpqlic ON cpqlic.person_id=cp.id
LEFT JOIN core_qualificationlicense cqlic ON cqlic.id=cpqlic.qualificationlicense_id
LEFT JOIN core_person_qualifications_technical cpqt ON cpqt.person_id=cp.id
LEFT JOIN core_qualificationtechnical cqt ON cqt.id=cpqt.qualificationtechnical_id
LEFT JOIN core_person_restrictions cpr ON cpr.person_id=cp.id
LEFT JOIN core_restriction cr ON cr.id=cpr.restriction_id
LEFT JOIN core_person_user_permissions cpup ON cpup.person_id=cp.id
LEFT JOIN auth_permission ap ON ap.id=cpup.permission_id
GROUP BY cp.email, cp.first_name, cp.last_name, cp.street, cp.number
ORDER BY cp.email
) TO STDOUT WITH (FORMAT csv, HEADER, FORCE_QUOTE *, ENCODING 'UTF-8');

DROP FUNCTION core_openingtime_jsonb_sort(JSONB);
DROP FUNCTION core_generalworkavailability_jsonb_sort(JSONB);
-- -----END SQL-----

Thanks in advance,
//mirabilos
--
„Cool, /usr/share/doc/mksh/examples/uhr.gz ist ja ein Grund,
mksh auf jedem System zu installieren.“
-- XTaran auf der OpenRheinRuhr, ganz begeistert
(EN: “[…]uhr.gz is a reason to install mksh on every system.”)

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2023-02-28 00:11:05 Re: DISTINCT *and* ORDER BY in aggregate functions on expressions(!)
Previous Message Kirk Wolak 2023-02-27 22:40:34 Re: pg_get_functiondef(), trailing spaces and + sign