From: | Michal Szymanski <mich20061(at)gmail(dot)com> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | Performance with sorting and LIMIT on partitioned table |
Date: | 2009-10-12 14:14:37 |
Message-ID: | 0e30ebbd-82ff-4322-ba28-9707cd4dafa6@k33g2000yqa.googlegroups.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
We have performance problem with query on partitioned table when query
use order by and we want to use first/last rows from result set.
More detail description:
We have big table where each row is one telephone call (CDR).
Definitnion of this table look like this:
CREATE TABLE accounting.cdr_full_partitioned (it is parrent table)
(
cdr_id bigint NOT NULL,
id_crx_group_from bigint, -- identifier of user
start_time_invite timestamp with time zone, -- start call time
call_status VARCHAR -- FINF-call finished, FINC-call
unfinished
..some extra data..
)
We creating 12 partitions using 'start_time_invite' column, simply we
create one partition for each month. We create costraints like this:
ALTER TABLE accounting.cdr_y2009_m09
ADD CONSTRAINT y2009m09 CHECK (start_time_invite >= '2009-09-01
00:00:00+02'::timestamp with time zone AND start_time_invite <
'2009-10-01 00:00:00+02'::timestamp with time zone);
and we define necessery indexes of course
CREATE INDEX cdr_full_partitioned_y2009_m09_id_crx_group_to_key1
ON accounting.cdr_full_partitioned_y2009_m09
USING btree
(id_crx_group_from, start_time_invite, call_status);
The problem appears when we want to select calls for specified user
with specified call_Status e.g:
SELECT * FROM accounting.cdr_full_partitioned
WHERE
id_crx_group_from='522921' AND
call_status='FINS' AND
start_time_invite>='2009-09-28 00:00:00+02' AND
start_time_invite<'2009-10-12 23:59:59+02' AND
ORDER BY start_time_invite LIMIT '100' OFFSET 0
you can see execution plan http://szymanskich.net/pub/postgres/full.jpg
as you see 20000 rows were selected and after were sorted what take
very long about 30-40s and after sorting it limit
result to 100 rows.
Using table without partition
SELECT * FROM accounting.cdr_full WHERE
(id_crx_group_from='522921') AND (
call_status='FINS' ) AND (start_time_invite>='2009-01-28
00:00:00+02')
AND (start_time_invite<'2009-10-12 23:59:59+02') ORDER BY
start_time_invite LIMIT '100' OFFSET 0
execution plan is very simple
"Limit (cost=0.00..406.40 rows=100 width=456)"
" -> Index Scan using
cdr_full_crx_group_from_start_time_invite_status_ind on cdr_full
(cost=0.00..18275.76 rows=4497 width=456)"
" Index Cond: ((id_crx_group_from = 522921::bigint) AND
(start_time_invite >= '2009-01-27 23:00:00+01'::timestamp with time
zone) AND (start_time_invite < '2009-10-12 23:59:59+02'::timestamp
with time zone) AND ((call_status)::text = 'FINS'::text))"
it use index to fetch first 100 rows and it is super fast and take
less than 0.5s. There is no rows sorting!
I've tried to execute the same query on one partition:
SELECT * FROM accounting.cdr_full_partitioned_y2009_m09
WHERE (id_crx_group_from='509498') AND (
call_status='FINS' ) AND (start_time_invite>='2009-09-01
00:00:00+02')
AND (start_time_invite<'2009-10-12 23:59:59+02')
You can see execution plan http://szymanskich.net/pub/postgres/ononeprtition.jpg
and query is superfast because there is no sorting. The question is
how to speed up query when we use partitioning? So far I have not
found solution. I'm wonder how do you solve problems
when result from partition must be sorted and after we want to display
only first/last 100 rows?
We can use own partitioning mechanism and partitioning data using
id_crx_group_from and create dynamic query (depending on
id_crx_group_from we can execute query on one partition) but it is not
most beautiful solution.
Michal Szymanski
http://blog.szymanskich.net
http://techblog.freeconet.pl
From | Date | Subject | |
---|---|---|---|
Next Message | Cédric Villemain | 2009-10-12 14:18:18 | Re: Best suiting OS |
Previous Message | Grzegorz Jaśkiewicz | 2009-10-12 13:40:14 | Re: Query performance |