Re: can't get rid of unnesesary SORT step in explain plan for hash join

From: Alexey Nalbat <alexey(at)price(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: can't get rid of unnesesary SORT step in explain plan for hash join
Date: 2001-05-14 07:36:17
Message-ID: 01051413453300.13379@workshop.price.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

On Sun, 13 May 2001, Tom Lane wrote:
> Alexey Nalbat <alexey(at)price(dot)ru> writes:
> > So, my question is: how can I get rid of this unnesesary "Sort" step
> > in the execution plan for hash join?
>
> You can't, because it's not unnecessary. Hash join doesn't promise
> to produce its outputs in any particular order. But the Unique
> filter needs to see its inputs in order by the fields being made
> unique.

Hello.

Tom, thanks for your answer. But I am not agree with it.

Here is the query of my interest:
+ select distinct c_id, m_id from products, ( select r_id from resellers where r_name like 'a%' ) as temp where m_id = 123 and products.r_id = temp.r_id order by m_id, c_id;
+
+ Unique (cost=16.83..16.83 rows=1 width=12)
+ -> Sort (cost=16.83..16.83 rows=1 width=12)
+ -> Hash Join (cost=8.16..16.82 rows=1 width=12)
+ -> Index Scan using products_mcr on products (cost=0.00..8.14 rows=10 width=8)
+ -> Hash (cost=8.14..8.14 rows=10 width=4)
+ -> Index Scan using resellers_n on resellers (cost=0.00..8.14 rows=10 width=4)

While executing this query postgres at first creates hash on table "resellers",
then get from index "products_mcr" for rows with "m_id=123" already ordered (!!!)
pairs "c_id,r_id", for each that pair it checks join condition using hash. If postgers
behaves like this then the result of this hash join is already sorted, and extra
"Sort" is not needed.

In fact query without "distinct" and "order by" clauses returns ordered values
of "c_id". I mean this query:
+ select c_id from products, ( select r_id from resellers where r_name like 'a%' ) as temp where m_id = 123 and products.r_id = temp.r_id" returns ordered values of "c_id;
+
+ Hash Join (cost=8.16..16.82 rows=1 width=12)
+ -> Index Scan using products_mcr on products (cost=0.00..8.14 rows=10 width=8)
+ -> Hash (cost=8.14..8.14 rows=10 width=4)
+ -> Index Scan using resellers_n on resellers (cost=0.00..8.14 rows=10 width=4)

And one more argument. We are now transferring our DB from Oracle to Postgres,
and in oracle this query does not have "Sort" in explain_plan, it has only "Unique":
+ select distinct c_id from products where m_id = 123 and r_id in ( select r_id from resellers where r_name like 'a%' ) order by c_id
+
+ SORT, UNIQUE
+ HASH JOIN,
+ INDEX, RANGE SCAN PRODUCTS_MCR
+ TABLE ACCESS, FULL RESELLERS

So I suppose, that this "Sort" step can be removed from the execution plan. What
do you think about this?

Thanks a lot.

--

WBR, Alexey Nalbat

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message John Scott 2001-05-14 08:16:25 unwritable /tmp causes initdb to crash
Previous Message Tom Lane 2001-05-14 00:52:39 Re: DISTINCT ON () with UNION