From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Michal Szymanski <mich20061(at)gmail(dot)com> |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Performance with sorting and LIMIT on partitioned table |
Date: | 2009-10-19 00:52:34 |
Message-ID: | 603c8f070910181752x76ee07d2q5bbffa28c39914e3@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
On Mon, Oct 12, 2009 at 10:14 AM, Michal Szymanski <mich20061(at)gmail(dot)com> wrote:
> We have performance problem with query on partitioned table when query
> use order by and we want to use first/last rows from result set.
> More detail description:
> We have big table where each row is one telephone call (CDR).
> Definitnion of this table look like this:
> CREATE TABLE accounting.cdr_full_partitioned (it is parrent table)
> (
> cdr_id bigint NOT NULL,
> id_crx_group_from bigint, -- identifier of user
> start_time_invite timestamp with time zone, -- start call time
> call_status VARCHAR -- FINF-call finished, FINC-call
> unfinished
> ..some extra data..
> )
>
> We creating 12 partitions using 'start_time_invite' column, simply we
> create one partition for each month. We create costraints like this:
> ALTER TABLE accounting.cdr_y2009_m09
> ADD CONSTRAINT y2009m09 CHECK (start_time_invite >= '2009-09-01
> 00:00:00+02'::timestamp with time zone AND start_time_invite <
> '2009-10-01 00:00:00+02'::timestamp with time zone);
>
> and we define necessery indexes of course
>
> CREATE INDEX cdr_full_partitioned_y2009_m09_id_crx_group_to_key1
> ON accounting.cdr_full_partitioned_y2009_m09
> USING btree
> (id_crx_group_from, start_time_invite, call_status);
>
>
> The problem appears when we want to select calls for specified user
> with specified call_Status e.g:
> SELECT * FROM accounting.cdr_full_partitioned
> WHERE
> id_crx_group_from='522921' AND
> call_status='FINS' AND
> start_time_invite>='2009-09-28 00:00:00+02' AND
> start_time_invite<'2009-10-12 23:59:59+02' AND
> ORDER BY start_time_invite LIMIT '100' OFFSET 0
>
> you can see execution plan http://szymanskich.net/pub/postgres/full.jpg
> as you see 20000 rows were selected and after were sorted what take
> very long about 30-40s and after sorting it limit
> result to 100 rows.
>
> Using table without partition
>
> SELECT * FROM accounting.cdr_full WHERE
> (id_crx_group_from='522921') AND (
> call_status='FINS' ) AND (start_time_invite>='2009-01-28
> 00:00:00+02')
> AND (start_time_invite<'2009-10-12 23:59:59+02') ORDER BY
> start_time_invite LIMIT '100' OFFSET 0
>
> execution plan is very simple
> "Limit (cost=0.00..406.40 rows=100 width=456)"
> " -> Index Scan using
> cdr_full_crx_group_from_start_time_invite_status_ind on cdr_full
> (cost=0.00..18275.76 rows=4497 width=456)"
> " Index Cond: ((id_crx_group_from = 522921::bigint) AND
> (start_time_invite >= '2009-01-27 23:00:00+01'::timestamp with time
> zone) AND (start_time_invite < '2009-10-12 23:59:59+02'::timestamp
> with time zone) AND ((call_status)::text = 'FINS'::text))"
>
> it use index to fetch first 100 rows and it is super fast and take
> less than 0.5s. There is no rows sorting!
> I've tried to execute the same query on one partition:
> SELECT * FROM accounting.cdr_full_partitioned_y2009_m09
> WHERE (id_crx_group_from='509498') AND (
> call_status='FINS' ) AND (start_time_invite>='2009-09-01
> 00:00:00+02')
> AND (start_time_invite<'2009-10-12 23:59:59+02')
>
> You can see execution plan http://szymanskich.net/pub/postgres/ononeprtition.jpg
> and query is superfast because there is no sorting. The question is
> how to speed up query when we use partitioning? So far I have not
> found solution. I'm wonder how do you solve problems
> when result from partition must be sorted and after we want to display
> only first/last 100 rows?
> We can use own partitioning mechanism and partitioning data using
> id_crx_group_from and create dynamic query (depending on
> id_crx_group_from we can execute query on one partition) but it is not
> most beautiful solution.
Yeah - unfortunately the query planner is not real smart about
partitioned tables yet. I can't make anything of the JPG link you
posted. Can you post the EXPLAIN ANALYZE output for the case that is
slow? What PG version is this?
...Robert
From | Date | Subject | |
---|---|---|---|
Next Message | Grzegorz Jaśkiewicz | 2009-10-19 08:24:45 | Re: Partitioned Tables and ORDER BY |
Previous Message | Robert Haas | 2009-10-19 00:49:42 | Re: Improving join performance over multiple moderately wide tables |