From: Rikard Pavelic <rikard(at)ngs(dot)hr>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: limit is sometimes not pushed in view with order
Date: 2013-04-14 10:34:14
Message-ID: 20130414123414.00000718@unknown
Lists: pgsql-performance
On Sat, 13 Apr 2013 20:08:16 +0200
Rikard Pavelic <rikard(at)ngs(dot)hr> wrote:
> While one could argue that optimizer doesn't know to optimize left
> join with group by its primary key, you can replace that join with
> some other joins (ie left join to another table pk) and the same
> behavior will be displayed (joining all tables and applying limit at
> the end).
> That's why I asked if fence for pushing limit is a known behavior.
While I can work around that problem by pushing the left join into a
select subquery, this doesn't solve all problems, since the limit is
still not pushed down in nontrivial queries.
This is probably the best example:
create table big_table(i serial primary key, delay int);

create function some_calculation(i int) returns int as $$
begin
    perform pg_sleep(i);
    return i*i;
end $$ language plpgsql stable cost 100000;

create view big_view as
select t.i, some_calculation(t.delay) as calc, s.delay as d2
from big_table t
left join big_table s on t.i = s.i + s.i
order by t.i asc;

insert into big_table select i, i%5 from generate_series(1, 100000) i;
analyze big_table;
explain analyze select * from big_view v where i >= 100 and i <= 105 limit 1;
Limit  (cost=3201.63..3201.64 rows=1 width=12) (actual time=10017.471..10017.471 rows=1 loops=1)
  ->  Sort  (cost=3201.63..3201.64 rows=5 width=12) (actual time=10017.469..10017.469 rows=1 loops=1)
        Sort Key: t.i
        Sort Method: top-N heapsort  Memory: 25kB
        ->  Hash Right Join  (cost=8.52..3201.57 rows=5 width=12) (actual time=0.078..10017.436 rows=6 loops=1)
              Hash Cond: ((s.i + s.i) = t.i)
              ->  Seq Scan on big_table s  (cost=0.00..1443.00 rows=100000 width=8) (actual time=0.005..6.294 rows=100000 loops=1)
              ->  Hash  (cost=8.46..8.46 rows=5 width=8) (actual time=0.012..0.012 rows=6 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 1kB
                    ->  Index Scan using big_table_pkey on big_table t  (cost=0.00..8.46 rows=5 width=8) (actual time=0.007..0.008 rows=6 loops=1)
                          Index Cond: ((i >= 100) AND (i <= 105))
Total runtime: 10017.514 ms
explain analyze select * from big_view v where i >= 100 and i <= 10005 limit 1;
Limit  (cost=0.00..2391.22 rows=1 width=12) (actual time=0.088..0.088 rows=1 loops=1)
  ->  Nested Loop Left Join  (cost=0.00..23780547.26 rows=9945 width=12) (actual time=0.087..0.087 rows=1 loops=1)
        Join Filter: (t.i = (s.i + s.i))
        Rows Removed by Join Filter: 49
        ->  Index Scan using big_table_pkey on big_table t  (cost=0.00..359.26 rows=9945 width=8) (actual time=0.014..0.014 rows=1 loops=1)
              Index Cond: ((i >= 100) AND (i <= 10005))
        ->  Materialize  (cost=0.00..2334.00 rows=100000 width=8) (actual time=0.009..0.020 rows=50 loops=1)
              ->  Seq Scan on big_table s  (cost=0.00..1443.00 rows=100000 width=8) (actual time=0.005..0.010 rows=50 loops=1)
Total runtime: 0.122 ms
explain analyze select * from big_view v where i >= 100 and i <= 10005 limit 10;
takes too long...
To me this looks like it should be fixable if the limit were applied
before all targets are evaluated. (The 10 seconds in the first plan are
just the sum of the pg_sleep(delay) calls for the six joined rows,
since some_calculation runs for every row before the limit kicks in.)
If I remove the left join from the view, Postgres works as expected, so
I guess it already knows how to apply the limit before evaluating the
select list, but this optimization is probably missing for subqueries
whose targets are pulled up into the outer query.
Maybe this is hard because you would probably need closures to pull off
a general solution, but there are plenty of use cases without group by
that would benefit from applying the limit before evaluating targets
that are used only in the topmost result.
So, I was wondering if this is a known problem and is there any
interest in tackling it?
Regards,
Rikard