Re: ANY_VALUE aggregate

From: Isaac Morland <isaac(dot)morland(at)gmail(dot)com>
To: Vik Fearing <vik(at)postgresfriends(dot)org>
Cc: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: ANY_VALUE aggregate
Date: 2022-12-06 04:06:46
Message-ID: CAMsGm5fb8nRjTSJfAQ4LAyxDyY9NLj2=TnzQC2V9jO8zf710Eg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 5 Dec 2022 at 22:52, Vik Fearing <vik(at)postgresfriends(dot)org> wrote:

> On 12/5/22 20:31, Corey Huinker wrote:
> >
> > Adding to the pile of wanted aggregates: in the past I've lobbied for
> > only_value() which is like first_value() but it raises an error on
> > encountering a second value.
>
> I have had use for this in the past, but I can't remember why. What is
> your use case for it? I will happily write a patch for it, and also
> submit it to the SQL Committee for inclusion in the standard. I need to
> justify why it's a good idea, though, and we would need to consider what
> to do with nulls now that there is <unique null treatment>.
>

I have this in my local library of "stuff that I really wish came with
Postgres", although I call it same_agg and it just goes to NULL if there
are more than one distinct value.

I sometimes use it when normalizing non-normalized data, but more commonly
I use it when the query planner isn't capable of figuring out that a column
I want to use in the output depends only on the grouping columns. For
example, something like:

SELECT group_id, group_name, count(*) from group_group as gg natural join
group_member as gm group by group_id

I think that exact example actually does or is supposed to work now, since
it realizes that I'm grouping on the primary key of group_group so the
group_name field in the same table can't differ between rows of a group,
but most of the time when I expect that feature to allow me to use a field
it actually doesn't.

I have a vague notion that part of the issue may be the distinction between
gg.group_id, gm.group_id, and group_id; maybe the above doesn't work but it
does work if I group by gg.group_id instead of by group_id. But obviously
there should be no difference because in this query those 3 values cannot
differ (outer joins are another story).

For reference, here is my definition:

CREATE OR REPLACE FUNCTION same_sfunc (
a anyelement,
b anyelement
) RETURNS anyelement
LANGUAGE SQL IMMUTABLE STRICT
SET search_path FROM CURRENT
AS $$
SELECT CASE WHEN $1 = $2 THEN $1 ELSE NULL END
$$;
COMMENT ON FUNCTION same_sfunc (anyelement, anyelement) IS 'SFUNC for
same_agg aggregate; returns common value of parameters, or NULL if they
differ';

DROP AGGREGATE IF EXISTS same_agg (anyelement);
CREATE AGGREGATE same_agg (anyelement) (
SFUNC = same_sfunc,
STYPE = anyelement
);
COMMENT ON AGGREGATE same_agg (anyelement) IS 'Return the common non-NULL
value of all non-NULL aggregated values, or NULL if some values differ';

You can tell I've had this for a while - there are several newer Postgres
features that could be used to clean this up noticeably.

I also have a repeat_agg which returns the last value (not so interesting)
but which is sometimes useful as a window function (more interesting:
replace NULLs with the previous non-NULL value in the column).

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message John Naylor 2022-12-06 04:17:04 Re: move some bitmapset.c macros to bitmapset.h
Previous Message Nathan Bossart 2022-12-06 03:54:45 Re: Generate pg_stat_get_* functions with Macros