From: | Ankit Kumar Pandey <itsankitkp(at)gmail(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Cc: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Subject: | Re: [PoC] Implementation of distinct in Window Aggregates |
Date: | 2023-03-12 07:25:48 |
Message-ID: | 770824eb-a01e-354c-2434-7fed1dd832a4@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 04/01/23 18:10, Ankit Kumar Pandey wrote:
> On 29/12/22 20:58, Ankit Kumar Pandey wrote:
> >
> > On 24/12/22 18:22, Ankit Pandey wrote:
> >> Hi,
> >>
> >> This is a PoC patch which implements distinct operation in window
> >> aggregates (without order by and for single column aggregation, final
> >> version may vary wrt these limitations). Purpose of this PoC is to
> >> get feedback on the approach used and corresponding implementation,
> >> any nitpicking as deemed reasonable.
> >>
> >> Distinct operation is mirrored from implementation in nodeAgg.
> >> Existing partitioning logic determines if row is in partition and
> >> when distinct is required, all tuples for the aggregate column are
> >> stored in tuplesort. When finalize_windowaggregate gets called,
> >> tuples are sorted and duplicates are removed, followed by calling the
> >> transition function on each tuple.
> >> When distinct is not required, the above process is skipped and the
> >> transition function gets called directly and nothing gets inserted
> >> into tuplesort.
> >> Note: For each partition, in tuplesort_begin and tuplesort_end is
> >> involved to rinse tuplesort, so at any time, max tuples in tuplesort
> >> is equal to tuples in a particular partition.
> >>
> >> I have verified it for interger and interval column aggregates (to
> >> rule out obvious issues related to data types).
> >>
> >> Sample cases:
> >>
> >> create table mytable(id int, name text);
> >> insert into mytable values(1, 'A');
> >> insert into mytable values(1, 'A');
> >> insert into mytable values(5, 'B');
> >> insert into mytable values(3, 'A');
> >> insert into mytable values(1, 'A');
> >>
> >> select avg(distinct id) over (partition by name) from mytable;
> >> avg
> >> --------------------
> >> 2.0000000000000000
> >> 2.0000000000000000
> >> 2.0000000000000000
> >> 2.0000000000000000
> >> 5.0000000000000000
> >>
> >> select avg(id) over (partition by name) from mytable;
> >> avg
> >> --------------------
> >> 1.5000000000000000
> >> 1.5000000000000000
> >> 1.5000000000000000
> >> 1.5000000000000000
> >> 5.0000000000000000
> >>
> >> select avg(distinct id) over () from mytable;
> >> avg
> >> --------------------
> >> 3.0000000000000000
> >> 3.0000000000000000
> >> 3.0000000000000000
> >> 3.0000000000000000
> >> 3.0000000000000000
> >>
> >> select avg(distinct id) from mytable;
> >> avg
> >> --------------------
> >> 3.0000000000000000
> >>
> >> This is my first-time contribution. Please let me know if anything
> >> can be
> >> improved as I`m eager to learn.
> >>
> >> Regards,
> >> Ankit Kumar Pandey
> >
> > Hi all,
> >
> > I know everyone is busy with holidays (well, Happy Holidays!) but I
> > will be glad if someone can take a quick look at this PoC and share
> > thoughts.
> >
> > This is my first time contribution so I am pretty sure there will be
> > some very obvious feedbacks (which will help me to move forward with
> > this change).
> >
> >
> Updated patch with latest master. Last patch was an year old.
>
Attaching patch with rebase from latest HEAD
Thanks,
Ankit
Attachment | Content-Type | Size |
---|---|---|
distinct_windows.patch | text/x-patch | 15.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Ankit Kumar Pandey | 2023-03-12 08:17:50 | Re: [PoC] Implementation of distinct in Window Aggregates |
Previous Message | Thomas Munro | 2023-03-12 07:18:30 | Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED |