From: | Jay Levitt <jay(dot)levitt(at)gmail(dot)com> |
---|---|
To: | Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov> |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Subquery in a JOIN not getting restricted? |
Date: | 2011-11-09 20:39:14 |
Message-ID: | 4EBAE4F2.1090403@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Kevin Grittner wrote:
> Jay Levitt<jay(dot)levitt(at)gmail(dot)com> wrote:
>
>> I don't get why the GROUP BY in this subquery forces it to scan
>> the entire users table (seq scan here, index scan on a larger
>> table) when there's only one row in users that can match:
> Are you sure there's a plan significantly faster than 1.3 ms?
Yep! Watch this:
drop schema if exists jaytest cascade;
create schema jaytest;
set search_path to jaytest;
create table questions (
id int not null primary key,
user_id int not null
);
insert into questions
select generate_series(1,1100), (random()*2000000)::int;
create table users (
id int not null primary key
);
insert into users select generate_series(1, 2000000);
vacuum freeze analyze;
explain analyze
select questions.id
from questions
join (
select u.id
from users as u
group by u.id
) as s
on s.id = questions.user_id
where questions.id = 1;
-----------------------
Merge Join (cost=8.28..90833.02 rows=1818 width=4) (actual
time=888.787..888.790 rows=1 loops=1)
Merge Cond: (u.id = questions.user_id)
-> Group (cost=0.00..65797.47 rows=2000000 width=4) (actual
time=0.017..735.509 rows=1747305 loops=1)
-> Index Scan using users_pkey on users u (cost=0.00..60797.47
rows=2000000 width=4) (actual time=0.015..331.990 rows=1747305 loops=1)
-> Materialize (cost=8.28..8.29 rows=1 width=8) (actual
time=0.013..0.015 rows=1 loops=1)
-> Sort (cost=8.28..8.28 rows=1 width=8) (actual
time=0.012..0.013 rows=1 loops=1)
Sort Key: questions.user_id
Sort Method: quicksort Memory: 25kB
-> Index Scan using questions_pkey on questions
(cost=0.00..8.27 rows=1 width=8) (actual time=0.006..0.006 rows=1 loops=1)
Index Cond: (id = 1)
Total runtime: 888.832 ms
(11 rows)
explain analyze
select questions.id
from questions
join (
select u.id
from users as u
) as s
on s.id = questions.user_id
where questions.id = 1;
-----------------------
Nested Loop (cost=0.00..16.77 rows=1 width=4) (actual time=0.019..0.021
rows=1 loops=1)
-> Index Scan using questions_pkey on questions (cost=0.00..8.27
rows=1 width=8) (actual time=0.008..0.009 rows=1 loops=1)
Index Cond: (id = 1)
-> Index Scan using users_pkey on users u (cost=0.00..8.49 rows=1
width=4) (actual time=0.007..0.007 rows=1 loops=1)
Index Cond: (u.id = questions.user_id)
Total runtime: 0.045 ms
(6 rows)
> That said, there might be some room for an optimization which pushes
> that test into the query with the "group by" clause. I don't know
> if there's a problem with that which I'm missing, the construct was
> judged to be too rare to be worth the cost of testing for it, or
> it's just that nobody has yet gotten to it.
Anyone have more insights on whether this is hard to optimize or simply
not-yet-optimized? And if the latter, where might I start looking? (Not
that you -really- want me to submit a patch; my C has regressed to the "try
an ampersand. OK, try an asterisk." level...)
Jay
From | Date | Subject | |
---|---|---|---|
Next Message | Sorin Dudui | 2011-11-10 13:05:56 | IMMUTABLE STABLE functions, daily updates |
Previous Message | Kevin Grittner | 2011-11-09 20:20:00 | Re: Subquery in a JOIN not getting restricted? |