Re: Redundant Result node

From: Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
To: Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>
Cc: Richard Guo <guofenglinux(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Redundant Result node
Date: 2024-08-23 12:29:03
Message-ID: CAEudQAp5Dwhpuj2278KfQdK6GnrZo5xg9h7X39ZSUOG9LuHsDg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Rafia.

Em qui., 22 de ago. de 2024 às 17:17, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>
escreveu:

>
>
> On Thu, 22 Aug 2024 at 15:02, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> wrote:
>
>> Hi.
>>
>> Em qui., 22 de ago. de 2024 às 04:34, Richard Guo <guofenglinux(at)gmail(dot)com>
>> escreveu:
>>
>>> I ran into a query plan where the Result node seems redundant to me:
>>>
>>> create table t (a int, b int, c int);
>>> insert into t select i%10, i%10, i%10 from generate_series(1,100)i;
>>> create index on t (a, b);
>>> analyze t;
>>>
>>> set enable_hashagg to off;
>>> set enable_seqscan to off;
>>>
>>> explain (verbose, costs off)
>>> select distinct b, a from t order by a, b;
>>> QUERY PLAN
>>> ---------------------------------------------------------
>>> Result
>>> Output: b, a
>>> -> Unique
>>> Output: a, b
>>> -> Index Only Scan using t_a_b_idx on public.t
>>> Output: a, b
>>> (6 rows)
>>>
>>> What I expect is that both the Scan node and the Unique node output
>>> 'b, a', and we do not need an additional projection step, something
>>> like:
>>>
>>> explain (verbose, costs off)
>>> select distinct b, a from t order by a, b;
>>> QUERY PLAN
>>> ---------------------------------------------------
>>> Unique
>>> Output: b, a
>>> -> Index Only Scan using t_a_b_idx on public.t
>>> Output: b, a
>>> (4 rows)
>>>
>>> I looked into this a little bit and found that in function
>>> create_ordered_paths, we decide whether a projection step is needed
>>> based on a simple pointer comparison between sorted_path->pathtarget
>>> and final_target.
>>>
>>> /* Add projection step if needed */
>>> if (sorted_path->pathtarget != target)
>>> sorted_path = apply_projection_to_path(root, ordered_rel,
>>> sorted_path, target);
>>>
>>> This does not seem right to me, as PathTargets are not canonical, so
>>> we cannot guarantee that two identical PathTargets will have the same
>>> pointer. Actually, for the query above, the two PathTargets are
>>> identical but have different pointers.
>>>
>> Could memcmp solve this?
>>
>> With patch attached, using memcmp to compare the pointers.
>>
>> select distinct b, a from t order by a, b;
>> QUERY PLAN
>> ----------------------------------
>> Sort
>> Output: b, a
>> Sort Key: t.a, t.b
>> -> HashAggregate
>> Output: b, a
>> Group Key: t.a, t.b
>> -> Seq Scan on public.t
>> Output: a, b, c
>> (8 rows)
>>
>> attached patch for consideration.
>>
>> best regards,
>> Ranier Vilela
>>
>
> +1 for the idea of removing this redundant node.
> I had a look in this patch, and I was wondering if we still need
> sorted_path->pathtarget != target in the condition.
>
Although the test is unnecessary, it is cheap and avoids a possible call to
memcmp.

Thanks.

best regards,
Ranier Vilela

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Kirill Reshke 2024-08-23 12:36:48 Avoid logging enormously large messages
Previous Message Peter Eisentraut 2024-08-23 12:28:10 Re: macOS prefetching support