Quick Links

Re: Idea: PostgreSQL equivalent to Oracle's KEEP clause

From:	"David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To:	David Rowley <dgrowleyml(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ben Clements <benhasgonewalking(at)gmail(dot)com>, pgsql-general(at)lists(dot)postgresql(dot)org
Subject:	Re: Idea: PostgreSQL equivalent to Oracle's KEEP clause
Date:	2023-03-07 03:11:42
Message-ID:	CAKFQuwZQ3MLbE+FP6mJb-YfH86hGzTj=dF_F_z_8RmjTwp6wBQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

On Mon, Mar 6, 2023 at 7:51 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:

> On Tue, 7 Mar 2023 at 12:40, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >
> > Ben Clements <benhasgonewalking(at)gmail(dot)com> writes:
> > > As shown above, the following calculated column can bring in the city
> name,
> > > even though the city name isn't in the GROUP BY:
> > > max(city) keep (dense_rank first order by population desc)
> >
> > You haven't really explained what this does, let alone why it can't
> > be implemented with existing features such as FILTER and ORDER BY.
>
> (It wasn't clear to me until I watched the youtube video.)

> Likely KEEP is more flexible than just the given example but I think
> that something similar to the example given could be done by inventing
> a TOP() and BOTTOM() aggregate. Then you could write something like:
>
> select
> country,
> count(*),
> max(population),
> bottom(city, population)
> from
> cities
> group by
> country
> having
> count(*) > 1
>
> the transfn for bottom() would need to remember the city and the
> population for the highest yet seen value of the 2nd arg.

BOTTOM() remembers the highest value?

> Where this wouldn't work would be if multiple columns were
> required to tiebreak the sort.
>

TOP(city, ROW(population, land_area)) ?

I'd assume since the whole thing can be done with
> a subquery that the entire point of having special syntax for this
> would be because we don't want to pay the price of looking at the
> table twice, i.e. performance must matter, so the ability to have
> parallel aggregates here seems good.
>

SELECT country, city,
rank() over (partition by country order by population desc),
count() OVER (partition by country)
FROM cities
WINDOW_HAVING count > 0 AND rank = 1;

That would be, IMO, the idiomatic query form to perform ranking - not
abusing GROUP BY. To add this encourages abusing GROUP BY.

Though I suppose if there is a sufficient performance gain to be had under
GROUP BY the effort might make sense if further improvements to window
function processing cannot be found.

David J.

In response to

Re: Idea: PostgreSQL equivalent to Oracle's KEEP clause at 2023-03-07 02:51:14 from David Rowley

Responses

Re: Idea: PostgreSQL equivalent to Oracle's KEEP clause at 2023-03-07 03:46:22 from David Rowley
Re: Idea: PostgreSQL equivalent to Oracle's KEEP clause at 2023-03-14 08:01:28 from Alban Hertroys

Browse pgsql-general by date

	From	Date	Subject
Next Message	David Rowley	2023-03-07 03:46:22	Re: Idea: PostgreSQL equivalent to Oracle's KEEP clause
Previous Message	Siddharth Jain	2023-03-07 02:53:16	Re: How does Postgres store a B-Tree on disk while using the OS file system?