Re: Idea: PostgreSQL equivalent to Oracle's KEEP clause

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ben Clements <benhasgonewalking(at)gmail(dot)com>, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Idea: PostgreSQL equivalent to Oracle's KEEP clause
Date: 2023-03-07 03:11:42
Message-ID: CAKFQuwZQ3MLbE+FP6mJb-YfH86hGzTj=dF_F_z_8RmjTwp6wBQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Mon, Mar 6, 2023 at 7:51 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:

> On Tue, 7 Mar 2023 at 12:40, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >
> > Ben Clements <benhasgonewalking(at)gmail(dot)com> writes:
> > > As shown above, the following calculated column can bring in the city
> name,
> > > even though the city name isn't in the GROUP BY:
> > > max(city) keep (dense_rank first order by population desc)
> >
> > You haven't really explained what this does, let alone why it can't
> > be implemented with existing features such as FILTER and ORDER BY.
>
> (It wasn't clear to me until I watched the youtube video.)

> Likely KEEP is more flexible than just the given example but I think
> that something similar to the example given could be done by inventing
> a TOP() and BOTTOM() aggregate. Then you could write something like:
>
> select
> country,
> count(*),
> max(population),
> bottom(city, population)
> from
> cities
> group by
> country
> having
> count(*) > 1
>
> the transfn for bottom() would need to remember the city and the
> population for the highest yet seen value of the 2nd arg.

BOTTOM() remembers the highest value?

> Where this wouldn't work would be if multiple columns were
> required to tiebreak the sort.
>

TOP(city, ROW(population, land_area)) ?

I'd assume since the whole thing can be done with
> a subquery that the entire point of having special syntax for this
> would be because we don't want to pay the price of looking at the
> table twice, i.e. performance must matter, so the ability to have
> parallel aggregates here seems good.
>

SELECT country, city,
rank() over (partition by country order by population desc),
count() OVER (partition by country)
FROM cities
WINDOW_HAVING count > 0 AND rank = 1;

That would be, IMO, the idiomatic query form to perform ranking - not
abusing GROUP BY. To add this encourages abusing GROUP BY.

Though I suppose if there is a sufficient performance gain to be had under
GROUP BY the effort might make sense if further improvements to window
function processing cannot be found.

David J.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message David Rowley 2023-03-07 03:46:22 Re: Idea: PostgreSQL equivalent to Oracle's KEEP clause
Previous Message Siddharth Jain 2023-03-07 02:53:16 Re: How does Postgres store a B-Tree on disk while using the OS file system?