Quick Links

Re: [PoC] Implementation of distinct in Window Aggregates

From:	Ankit Kumar Pandey <itsankitkp(at)gmail(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Cc:	David Rowley <dgrowleyml(at)gmail(dot)com>
Subject:	Re: [PoC] Implementation of distinct in Window Aggregates
Date:	2023-01-04 12:40:32
Message-ID:	706f4939-b905-caf6-d6e1-49065e0f5c6a@gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 29/12/22 20:58, Ankit Kumar Pandey wrote:
>
> On 24/12/22 18:22, Ankit Pandey wrote:
>> Hi,
>>
>> This is a PoC patch which implements distinct operation in window
>> aggregates (without order by and for single column aggregation, final
>> version may vary wrt these limitations). Purpose of this PoC is to
>> get feedback on the approach used and corresponding implementation,
>> any nitpicking as deemed reasonable.
>>
>> Distinct operation is mirrored from implementation in nodeAgg.
>> Existing partitioning logic determines if row is in partition and
>> when distinct is required, all tuples for the aggregate column are
>> stored in tuplesort. When finalize_windowaggregate gets called,
>> tuples are sorted and duplicates are removed, followed by calling the
>> transition function on each tuple.
>> When distinct is not required, the above process is skipped and the
>> transition function gets called directly and nothing gets inserted
>> into tuplesort.
>> Note: For each partition, in tuplesort_begin and tuplesort_end is
>> involved to rinse tuplesort, so at any time, max tuples in tuplesort
>> is equal to tuples in a particular partition.
>>
>> I have verified it for interger and interval column aggregates (to
>> rule out obvious issues related to data types).
>>
>> Sample cases:
>>
>> create table mytable(id int, name text);
>> insert into mytable values(1, 'A');
>> insert into mytable values(1, 'A');
>> insert into mytable values(5, 'B');
>> insert into mytable values(3, 'A');
>> insert into mytable values(1, 'A');
>>
>> select avg(distinct id) over (partition by name) from mytable;
>> avg
>> --------------------
>> 2.0000000000000000
>> 2.0000000000000000
>> 2.0000000000000000
>> 2.0000000000000000
>> 5.0000000000000000
>>
>> select avg(id) over (partition by name) from mytable;
>> avg
>> --------------------
>> 1.5000000000000000
>> 1.5000000000000000
>> 1.5000000000000000
>> 1.5000000000000000
>> 5.0000000000000000
>>
>> select avg(distinct id) over () from mytable;
>> avg
>> --------------------
>> 3.0000000000000000
>> 3.0000000000000000
>> 3.0000000000000000
>> 3.0000000000000000
>> 3.0000000000000000
>>
>> select avg(distinct id) from mytable;
>> avg
>> --------------------
>> 3.0000000000000000
>>
>> This is my first-time contribution. Please let me know if anything
>> can be
>> improved as I`m eager to learn.
>>
>> Regards,
>> Ankit Kumar Pandey
>
> Hi all,
>
> I know everyone is busy with holidays (well, Happy Holidays!) but I
> will be glad if someone can take a quick look at this PoC and share
> thoughts.
>
> This is my first time contribution so I am pretty sure there will be
> some very obvious feedbacks (which will help me to move forward with
> this change).
>
>
Updated patch with latest master. Last patch was an year old.

--
Regards,
Ankit Kumar Pandey

Attachment	Content-Type	Size
v1-0002-Implement-distinct-in-Window-Aggregates.patch	text/x-patch	15.5 KB

In response to

Re: [PoC] Implementation of distinct in Window Aggregates at 2022-12-29 15:28:58 from Ankit Kumar Pandey

Responses

Re: [PoC] Implementation of distinct in Window Aggregates at 2023-03-12 07:25:48 from Ankit Kumar Pandey

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Pavel Borisov	2023-01-04 12:42:34	Re: POC: Lock updated tuples in tuple_update() and tuple_delete()
Previous Message	vignesh C	2023-01-04 12:19:41	Re: postgres_fdw: commit remote (sub)transactions in parallel during pre-commit