Re: Performance with sorting and LIMIT on partitioned table

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Michal Szymanski <mich20061(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance with sorting and LIMIT on partitioned table
Date: 2009-10-19 00:52:34
Message-ID: 603c8f070910181752x76ee07d2q5bbffa28c39914e3@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Mon, Oct 12, 2009 at 10:14 AM, Michal Szymanski <mich20061(at)gmail(dot)com> wrote:
> We have performance problem with query on partitioned table when query
> use order by and we want to use first/last rows from result set.
> More detail description:
> We have big table where each row is one telephone call (CDR).
> Definitnion of this table look like this:
> CREATE TABLE accounting.cdr_full_partitioned  (it is parrent table)
> (
>  cdr_id bigint NOT NULL,
>  id_crx_group_from bigint,                                             -- identifier of user
>  start_time_invite timestamp with time zone,   -- start call time
>  call_status VARCHAR                                                   -- FINF-call finished, FINC-call
> unfinished
>  ..some extra data..
> )
>
> We creating 12 partitions using 'start_time_invite' column, simply we
> create one partition for each month. We create costraints like this:
> ALTER TABLE accounting.cdr_y2009_m09
>  ADD CONSTRAINT y2009m09 CHECK (start_time_invite >= '2009-09-01
> 00:00:00+02'::timestamp with time zone AND start_time_invite <
> '2009-10-01 00:00:00+02'::timestamp with time zone);
>
> and we define necessery indexes of course
>
> CREATE INDEX cdr_full_partitioned_y2009_m09_id_crx_group_to_key1
>  ON accounting.cdr_full_partitioned_y2009_m09
>  USING btree
>  (id_crx_group_from, start_time_invite, call_status);
>
>
> The problem appears when we want to select calls for specified user
> with specified call_Status e.g:
>  SELECT * FROM accounting.cdr_full_partitioned
>   WHERE
>   id_crx_group_from='522921' AND
>   call_status='FINS' AND
>   start_time_invite>='2009-09-28 00:00:00+02' AND
>   start_time_invite<'2009-10-12 23:59:59+02'   AND
>  ORDER BY start_time_invite  LIMIT '100' OFFSET 0
>
> you can see execution plan  http://szymanskich.net/pub/postgres/full.jpg
>  as you see 20000 rows were selected and after were sorted what take
> very long about 30-40s and after sorting it limit
> result to 100 rows.
>
> Using table without partition
>
>  SELECT * FROM accounting.cdr_full    WHERE
> (id_crx_group_from='522921') AND (
>   call_status='FINS' ) AND (start_time_invite>='2009-01-28
> 00:00:00+02')
>   AND (start_time_invite<'2009-10-12 23:59:59+02') ORDER BY
> start_time_invite  LIMIT '100' OFFSET 0
>
> execution plan is very simple
> "Limit  (cost=0.00..406.40 rows=100 width=456)"
> "  ->  Index Scan using
> cdr_full_crx_group_from_start_time_invite_status_ind on cdr_full
> (cost=0.00..18275.76 rows=4497 width=456)"
> "        Index Cond: ((id_crx_group_from = 522921::bigint) AND
> (start_time_invite >= '2009-01-27 23:00:00+01'::timestamp with time
> zone) AND (start_time_invite < '2009-10-12 23:59:59+02'::timestamp
> with time zone) AND ((call_status)::text = 'FINS'::text))"
>
> it use index to fetch first 100 rows and it is super fast and take
> less than 0.5s. There is no rows sorting!
> I've tried to execute the same query on one partition:
>  SELECT * FROM accounting.cdr_full_partitioned_y2009_m09
>  WHERE (id_crx_group_from='509498') AND (
>   call_status='FINS' ) AND (start_time_invite>='2009-09-01
> 00:00:00+02')
>   AND (start_time_invite<'2009-10-12 23:59:59+02')
>
> You can see execution plan http://szymanskich.net/pub/postgres/ononeprtition.jpg
> and query is superfast because there is no sorting. The question is
> how to speed up query when we use partitioning? So far I have not
> found solution. I'm wonder how do you solve problems
> when result from partition must be sorted and after we want to display
> only first/last 100 rows?
> We can use own partitioning mechanism and partitioning data using
> id_crx_group_from and create dynamic query (depending on
> id_crx_group_from we can execute query on one partition) but it is not
> most beautiful solution.

Yeah - unfortunately the query planner is not real smart about
partitioned tables yet. I can't make anything of the JPG link you
posted. Can you post the EXPLAIN ANALYZE output for the case that is
slow? What PG version is this?

...Robert

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Grzegorz Jaśkiewicz 2009-10-19 08:24:45 Re: Partitioned Tables and ORDER BY
Previous Message Robert Haas 2009-10-19 00:49:42 Re: Improving join performance over multiple moderately wide tables