Re: Allow WindowFuncs prosupport function to use more optimal WindowClause options

From:	David Rowley <dgrowleyml(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Erwin Brandstetter <brsaweda(at)gmail(dot)com>, Vik Fearing <vik(at)postgresfriends(dot)org>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Allow WindowFuncs prosupport function to use more optimal WindowClause options
Date:	2022-10-17 23:58:47
Message-ID:	CAApHDvq_-XLUke9A_u16ttuEUFHJp6cZP8YZsSZpmOJ=aMC4uA@mail.gmail.com
Lists:	pgsql-hackers

On Tue, 18 Oct 2022 at 12:18, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Erwin Brandstetter <brsaweda(at)gmail(dot)com> writes:
> > I am thinking of building a test case to run
> > - all existing window functions
> > - with all basic variants of frame definitions
> > - once with ROWS, once with RANGE
> > - on basic table that has duplicate and NULL values in partition and
> > ordering columns
> > - in all supported major versions
>
> > To verify for which of our window functions ROWS vs. RANGE never makes a
> > difference.
> > That should be obvious in most cases, just to be sure.
>
> > Do you think this would be helpful?
>
> Doubt it. Per the old saying "testing can prove the presence of bugs,
> but not their absence", this could prove that some functions *do*
> respond to these options, but it cannot prove that a function
> *doesn't*. Maybe you just didn't try the right test case.

I suppose this is kind of like fuzz testing. Going by "git log
--grep=sqlsmith", fuzzing certainly has found bugs for us in the past.
I personally wouldn't discourage Erwin from doing this.

For me, my first port of call will be to study the code of each window
function to see if the frame options can affect the result. I *do*
still need to spend more time on this. It would be good to back up my
reading of the code with the results of some more exhaustive testing.
If Erwin were to find result variations that I missed, then we might
avoid writing some new bugs.
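To make the ROWS vs RANGE question concrete, here is a minimal Python model, not PostgreSQL's executor code, of the kind of differential check Erwin proposes. It evaluates a running count(*) and rank() over sorted keys that contain duplicates, under both frame modes. It assumes the default frame start (UNBOUNDED PRECEDING .. CURRENT ROW) and ignores NULLs; the helper names (frame_end, running_count, WINDOW_FUNCS) are hypothetical.

```python
from typing import Callable, Dict, List

def frame_end(keys: List[int], i: int, mode: str) -> int:
    """Exclusive end index of the frame UNBOUNDED PRECEDING .. CURRENT ROW.

    keys must already be sorted by the window's ORDER BY key.
    """
    if mode == "ROWS":
        return i + 1  # ROWS: the frame stops at the current physical row
    # RANGE: the frame extends through the current row's last peer
    j = i + 1
    while j < len(keys) and keys[j] == keys[i]:
        j += 1
    return j

def running_count(keys: List[int], mode: str) -> List[int]:
    """count(*) as a window aggregate; every frame starts at row 0,
    so the count equals the frame's exclusive end index."""
    return [frame_end(keys, i, mode) for i in range(len(keys))]

def rank(keys: List[int], mode: str) -> List[int]:
    """rank(): depends only on peer groups; `mode` is accepted but never used."""
    ranks: List[int] = []
    for i, k in enumerate(keys):
        if i > 0 and keys[i - 1] == k:
            ranks.append(ranks[-1])  # peer of the previous row keeps its rank
        else:
            ranks.append(i + 1)      # first row of a new peer group
    return ranks

WINDOW_FUNCS: Dict[str, Callable[[List[int], str], List[int]]] = {
    "count(*)": running_count,
    "rank()": rank,
}

if __name__ == "__main__":
    keys = [1, 2, 2, 3, 3, 3]  # sorted, with duplicate keys forming peer groups
    for name, fn in WINDOW_FUNCS.items():
        rows, rng = fn(keys, "ROWS"), fn(keys, "RANGE")
        print(f"{name}: differs between ROWS and RANGE: {rows != rng}")
    # count(*): differs between ROWS and RANGE: True
    # rank(): differs between ROWS and RANGE: False
```

This also shows the asymmetry Tom points out: a True proves count(*) is frame-sensitive, but a False for rank() on one input proves nothing by itself. The assurance for rank() comes from its definition never consulting the frame.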

I also spent a little more time reading through a few window
functions, and I see percent_rank() is another candidate for this
optimisation. I've never needed to use that function before, but from
the following experiment, it seems to just be (rank() over (order by
...) - 1) / (count(*) over () - 1). Since rank() is already on the
list, and count(*) over () always counts every row in the partition
regardless of the frame options, it seems percent_rank() can join the
club too.

create table t0 as select x*random() as a from generate_series(1,1000000) x;
select * from (
    select a,
           percent_rank() over (order by a) pr,
           (rank() over (order by a) - 1) / (count(*) over () - 1)::float8 pr2
    from t0
) c
where pr <> pr2;
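The same comparison can be modelled outside the database. Below is a minimal Python sketch, under the assumption that the input is one partition already sorted by its ORDER BY key. It computes percent_rank as (rank - 1) / (partition rows - 1) and checks it against a direct count of rows that sort strictly before each row. Neither version looks at a frame, which is the property the optimisation relies on. The function names are hypothetical. The single-row special case follows PostgreSQL's percent_rank(), which returns 0 there, whereas the rewritten SQL expression would divide by zero.

```python
import random
from typing import List

def rank(keys: List[int]) -> List[int]:
    """rank() over (order by key) for a sorted partition: peers share a rank."""
    ranks: List[int] = []
    for i, k in enumerate(keys):
        if i > 0 and keys[i - 1] == k:
            ranks.append(ranks[-1])  # peer of the previous row: same rank
        else:
            ranks.append(i + 1)      # new peer group starts at position i + 1
    return ranks

def percent_rank_via_rank(keys: List[int]) -> List[float]:
    """(rank - 1) / (partition rows - 1); uses the partition size, never a frame."""
    n = len(keys)
    if n <= 1:
        return [0.0] * n  # single-row partition: return 0 instead of dividing by zero
    return [(r - 1) / (n - 1) for r in rank(keys)]

def percent_rank_direct(keys: List[int]) -> List[float]:
    """Fraction of the other rows whose key sorts strictly before this row's key."""
    n = len(keys)
    if n <= 1:
        return [0.0] * n
    return [sum(1 for other in keys if other < k) / (n - 1) for k in keys]

if __name__ == "__main__":
    random.seed(0)
    # Small integer range so duplicate keys (peer groups) are common
    keys = sorted(random.randint(0, 20) for _ in range(500))
    mismatches = [
        (k, a, b)
        for k, a, b in zip(keys, percent_rank_via_rank(keys), percent_rank_direct(keys))
        if a != b
    ]
    print(len(mismatches))  # → 0
```

Unlike x*random() over a large range, the small integer range here produces many ties, so peer handling in rank() is actually exercised.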

David

In response to

Re: Allow WindowFuncs prosupport function to use more optimal WindowClause options at 2022-10-17 23:18:14 from Tom Lane
