From: | Volker Grabsch <vog(at)notjusthosting(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Missing optimization when filters are applied after window functions |
Date: | 2012-05-16 07:50:17 |
Message-ID: | 20120516075017.GA31790@vlap |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Dear PostgreSQL hackers,
[ please CC to me as I'm not subscribed to the list ]
I think there is a missing optimization when a filter is
applied after a window function, where the filtered field
is also used for partitioning.
Here a simplified example: Suppose we have a table
that stores 100.000 ids, and 10 random values per id:
create table test (
id integer,
value double precision,
primary key (id, value)
);
insert into test (id, value)
select
generate_series(1, 100000) id,
random()
from
generate_series(1, 10);
We are now asking for the rank of each value per id
(it doesn't matter if it's rank() or any other window
function):
select
id,
value,
rank() over (partition by id order by value)
from
test;
Now suppose we're interested only in the result for
id 23000. If the query above was defined in a view,
this would lead to the following query:
select
*
from
(
select
id,
value,
rank() over (partition by id order by value)
from
test
) subquery
where
id = 23000;
This is very slow! The "explain" command shows the reason:
Subquery Scan on subquery (cost=0.00..86556.80 rows=5000 width=20)
Filter: (subquery.id = 23000)
-> WindowAgg (cost=0.00..74056.80 rows=1000000 width=12)
-> Index Scan using test_pkey on test (cost=0.00..56556.80 rows=1000000 width=12)
The Filter (id = 23000) is applied after WindowAgg,
although it could certainly be applied before!
Of course we could rewrite the above query like this:
select
id,
value,
rank() over (partition by id order by value)
from
test
where
id = 23000;
This would lead to a more sane plan where the filter is
applied before WindowAgg, making use of the primary key:
WindowAgg (cost=43.53..43.70 rows=10 width=12)
-> Sort (cost=43.53..43.55 rows=10 width=12)
Sort Key: value
-> Bitmap Heap Scan on test (cost=4.53..43.36 rows=10 width=12)
Recheck Cond: (id = 23000)
-> Bitmap Index Scan on test_pkey (cost=0.00..4.53 rows=10 width=0)
Index Cond: (id = 23000)
However, the second query cannot be cleanly split into
a generic view and a specialized part anymore.
In other words, it is currently impossible to use Window
functions in a View without risking serious, unexpected
performance issues, even in simple cases!
I propose the following general optimization: If all window
functions are partitioned by the same first field (here: id),
then any filter on that field should be executed before
WindowAgg. So a query like this:
select
...
from
(select
... over (partition by FIELD, other2, ...)
... over (partition by FIELD, other3, ...)
... over (partition by FIELD, ...)
...
)
where
SOME_FILTER(FIELD)
Shoule be transformed into that:
select
...
from
(select
... over (partition by FIELD, other2, ...)
... over (partition by FIELD, other3, ...)
... over (partition by FIELD, ...)
...
where
SOME_FILTER(FIELD)
)
I verfied that this issue exists with PostgreSQL 9.2 beta 1
as well as the PostgreSQL 9.1 release.
Is there any work on this issue? Or am I wrong and this
optimization should not be implemented for some reason?
Regards,
Volker
--
Volker Grabsch
---<<(())>>---
From | Date | Subject | |
---|---|---|---|
Next Message | Brar Piening | 2012-05-16 07:52:21 | 9.2 Beta: intersection of daterange |
Previous Message | Peter Geoghegan | 2012-05-16 04:41:54 | Re: Draft release notes complete |