Re: Redundant Result node

From: Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>
To: Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
Cc: Richard Guo <guofenglinux(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Redundant Result node
Date: 2024-08-22 20:17:13
Message-ID: CA+FpmFdu-ocfiyO4u81k86YHG-Jf-GtC2DToGrcnYUg2EcQ8Kw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 22 Aug 2024 at 15:02, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> wrote:

> Hi.
>
> Em qui., 22 de ago. de 2024 às 04:34, Richard Guo <guofenglinux(at)gmail(dot)com>
> escreveu:
>
>> I ran into a query plan where the Result node seems redundant to me:
>>
>> create table t (a int, b int, c int);
>> insert into t select i%10, i%10, i%10 from generate_series(1,100)i;
>> create index on t (a, b);
>> analyze t;
>>
>> set enable_hashagg to off;
>> set enable_seqscan to off;
>>
>> explain (verbose, costs off)
>> select distinct b, a from t order by a, b;
>> QUERY PLAN
>> ---------------------------------------------------------
>> Result
>> Output: b, a
>> -> Unique
>> Output: a, b
>> -> Index Only Scan using t_a_b_idx on public.t
>> Output: a, b
>> (6 rows)
>>
>> What I expect is that both the Scan node and the Unique node output
>> 'b, a', and we do not need an additional projection step, something
>> like:
>>
>> explain (verbose, costs off)
>> select distinct b, a from t order by a, b;
>> QUERY PLAN
>> ---------------------------------------------------
>> Unique
>> Output: b, a
>> -> Index Only Scan using t_a_b_idx on public.t
>> Output: b, a
>> (4 rows)
>>
>> I looked into this a little bit and found that in function
>> create_ordered_paths, we decide whether a projection step is needed
>> based on a simple pointer comparison between sorted_path->pathtarget
>> and final_target.
>>
>> /* Add projection step if needed */
>> if (sorted_path->pathtarget != target)
>> sorted_path = apply_projection_to_path(root, ordered_rel,
>> sorted_path, target);
>>
>> This does not seem right to me, as PathTargets are not canonical, so
>> we cannot guarantee that two identical PathTargets will have the same
>> pointer. Actually, for the query above, the two PathTargets are
>> identical but have different pointers.
>>
> Could memcmp solve this?
>
> With patch attached, using memcmp to compare the pointers.
>
> select distinct b, a from t order by a, b;
> QUERY PLAN
> ----------------------------------
> Sort
> Output: b, a
> Sort Key: t.a, t.b
> -> HashAggregate
> Output: b, a
> Group Key: t.a, t.b
> -> Seq Scan on public.t
> Output: a, b, c
> (8 rows)
>
> attached patch for consideration.
>
> best regards,
> Ranier Vilela
>

+1 for the idea of removing this redundant node.
I had a look in this patch, and I was wondering if we still need
sorted_path->pathtarget != target in the condition.

Apart from that,
- if (sorted_path->pathtarget != target)
+ if (sorted_path->pathtarget != target &&
+ memcmp(sorted_path->pathtarget, target,
sizeof(PathTarget)) != 0)
An extra space is there, please fix it.

Some regression tests should be added for this.

--
Regards,
Rafia Sabih

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2024-08-22 21:13:15 Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
Previous Message Bruce Momjian 2024-08-22 20:07:12 Re: Partial aggregates pushdown