| From: | Ankit Kumar Pandey <itsankitkp(at)gmail(dot)com> |
|---|---|
| To: | pgsql-hackers(at)postgresql(dot)org |
| Cc: | David Rowley <dgrowleyml(at)gmail(dot)com> |
| Subject: | Re: [PoC] Implementation of distinct in Window Aggregates |
| Date: | 2023-03-12 07:25:48 |
| Message-ID: | 770824eb-a01e-354c-2434-7fed1dd832a4@gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On 04/01/23 18:10, Ankit Kumar Pandey wrote:
> On 29/12/22 20:58, Ankit Kumar Pandey wrote:
> >
> > On 24/12/22 18:22, Ankit Pandey wrote:
> >> Hi,
> >>
> >> This is a PoC patch which implements distinct operation in window
> >> aggregates (without order by and for single column aggregation, final
> >> version may vary wrt these limitations). Purpose of this PoC is to
> >> get feedback on the approach used and corresponding implementation,
> >> any nitpicking as deemed reasonable.
> >>
> >> Distinct operation is mirrored from implementation in nodeAgg.
> >> Existing partitioning logic determines if row is in partition and
> >> when distinct is required, all tuples for the aggregate column are
> >> stored in tuplesort. When finalize_windowaggregate gets called,
> >> tuples are sorted and duplicates are removed, followed by calling the
> >> transition function on each tuple.
> >> When distinct is not required, the above process is skipped and the
> >> transition function gets called directly and nothing gets inserted
> >> into tuplesort.
> >> Note: For each partition, in tuplesort_begin and tuplesort_end is
> >> involved to rinse tuplesort, so at any time, max tuples in tuplesort
> >> is equal to tuples in a particular partition.
> >>
> >> I have verified it for interger and interval column aggregates (to
> >> rule out obvious issues related to data types).
> >>
> >> Sample cases:
> >>
> >> create table mytable(id int, name text);
> >> insert into mytable values(1, 'A');
> >> insert into mytable values(1, 'A');
> >> insert into mytable values(5, 'B');
> >> insert into mytable values(3, 'A');
> >> insert into mytable values(1, 'A');
> >>
> >> select avg(distinct id) over (partition by name) from mytable;
> >> avg
> >> --------------------
> >> 2.0000000000000000
> >> 2.0000000000000000
> >> 2.0000000000000000
> >> 2.0000000000000000
> >> 5.0000000000000000
> >>
> >> select avg(id) over (partition by name) from mytable;
> >> avg
> >> --------------------
> >> 1.5000000000000000
> >> 1.5000000000000000
> >> 1.5000000000000000
> >> 1.5000000000000000
> >> 5.0000000000000000
> >>
> >> select avg(distinct id) over () from mytable;
> >> avg
> >> --------------------
> >> 3.0000000000000000
> >> 3.0000000000000000
> >> 3.0000000000000000
> >> 3.0000000000000000
> >> 3.0000000000000000
> >>
> >> select avg(distinct id) from mytable;
> >> avg
> >> --------------------
> >> 3.0000000000000000
> >>
> >> This is my first-time contribution. Please let me know if anything
> >> can be
> >> improved as I`m eager to learn.
> >>
> >> Regards,
> >> Ankit Kumar Pandey
> >
> > Hi all,
> >
> > I know everyone is busy with holidays (well, Happy Holidays!) but I
> > will be glad if someone can take a quick look at this PoC and share
> > thoughts.
> >
> > This is my first time contribution so I am pretty sure there will be
> > some very obvious feedbacks (which will help me to move forward with
> > this change).
> >
> >
> Updated patch with latest master. Last patch was an year old.
>
Attaching patch with rebase from latest HEAD
Thanks,
Ankit
| Attachment | Content-Type | Size |
|---|---|---|
| distinct_windows.patch | text/x-patch | 15.4 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Ankit Kumar Pandey | 2023-03-12 08:17:50 | Re: [PoC] Implementation of distinct in Window Aggregates |
| Previous Message | Thomas Munro | 2023-03-12 07:18:30 | Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED |