Quick Links

Partitioned Tables and ORDER BY

From:	Joe Uhl <joeuhl(at)gmail(dot)com>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	Partitioned Tables and ORDER BY
Date:	2009-10-08 15:33:00
Message-ID:	4ACE062C.4010209@gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

We have been using partitioning for some time with great success. Up
until now our usage has not included ordering and now that we are trying
to use an ORDER BY against an indexed column a rather significant
shortcoming seems to be kicking in.

Parent table (have cut all but 4 columns to make it easier to post about)
CREATE TABLE people
(
person_id character varying(36) NOT NULL,
list_id character varying(36) NOT NULL,
first_name character varying(255),
last_name character varying(255),
CONSTRAINT people_pkey (person_id, list_id)
);

A partition looks like this:
CREATE TABLE people_list1
(
-- inherited columns omitted
CONSTRAINT people_list1_list_id_check CHECK (list_id::text =
'the_unique_list_id'::text)
)
INHERITS (people);

Both the parent and the children have indexes on all 4 columns mentioned
above. The parent table is completely empty.

If I run this query, directly against the partition, performance is
excellent:
select * from people_list1 order by first_name asc limit 50;

The explain analyze output:
Limit (cost=0.00..4.97 rows=50 width=34315) (actual
time=49.616..522.464 rows=50 loops=1)
-> Index Scan using idx_people_first_name_list1 on people_list1
(cost=0.00..849746.98 rows=8544854 width=34315) (actual
time=49.614..522.424 rows=50 loops=1)
Total runtime: 522.773 ms

If I run this query, against the parent, performance is terrible:
select * from people where list_id = 'the_unique_list_id' order by
first_name asc limit 50;

The explain analyze output:
Limit (cost=726844.88..726845.01 rows=50 width=37739) (actual
time=149864.869..149864.884 rows=50 loops=1)
-> Sort (cost=726844.88..748207.02 rows=8544855 width=37739)
(actual time=149864.868..149864.876 rows=50 loops=1)
Sort Key: public.people.first_name
Sort Method: top-N heapsort Memory: 50kB
-> Result (cost=0.00..442990.94 rows=8544855 width=37739)
(actual time=0.081..125837.332 rows=8545138 loops=1)
-> Append (cost=0.00..442990.94 rows=8544855
width=37739) (actual time=0.079..111103.743 rows=8545138 loops=1)
-> Index Scan using people_pkey on people
(cost=0.00..4.27 rows=1 width=37739) (actual time=0.008..0.008 rows=0
loops=1)
Index Cond: ((list_id)::text =
'the_unique_list_id'::text)
-> Seq Scan on people_list1 people
(cost=0.00..442986.67 rows=8544854 width=34315) (actual
time=0.068..109781.308 rows=8545138 loops=1)
Filter: ((list_id)::text =
'the_unique_list_id'::text)
Total runtime: 149865.411 ms

Just to show that partitions are setup correctly, this query also has
excellent performance:
select * from people where list_id = 'the_unique_list_id' and first_name
= 'JOE';

Here is the explain analyze for that:
Result (cost=0.00..963.76 rows=482 width=37739) (actual
time=6.031..25.394 rows=2319 loops=1)
-> Append (cost=0.00..963.76 rows=482 width=37739) (actual
time=6.029..21.340 rows=2319 loops=1)
-> Index Scan using idx_people_first_name on people
(cost=0.00..4.27 rows=1 width=37739) (actual time=0.010..0.010 rows=0
loops=1)
Index Cond: ((first_name)::text = 'JOE'::text)
Filter: ((list_id)::text = 'the_unique_list_id'::text)
-> Bitmap Heap Scan on people_list1 people
(cost=8.47..959.49 rows=481 width=34315) (actual time=6.018..20.968
rows=2319 loops=1)
Recheck Cond: ((first_name)::text = 'JOE'::text)
Filter: ((list_id)::text = 'the_unique_list_id'::text)
-> Bitmap Index Scan on idx_people_first_name_list1
(cost=0.00..8.35 rows=481 width=0) (actual time=5.566..5.566 rows=2319
loops=1)
Index Cond: ((first_name)::text = 'JOE'::text)
Total runtime: 25.991 ms

This is Postgres 8.3.7 on the 2.6.28 kernel with constraint_exclusion
on. Our partitions are in the 8 - 15 million row range.

I realize one option is to hit the partition directly instead of hitting
the parent table with the check constraint in the WHERE clause, but up
until now we have been able to avoid needing partition-awareness in our
code. Perhaps we have hit upon something that will require breaking
that cleanliness but wanted to see if there were any workarounds.

Responses

Re: Partitioned Tables and ORDER BY at 2009-10-11 14:30:33 from Michal Szymanski

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Tory M Blue	2009-10-08 17:27:30	concurrent reindex issues
Previous Message	Tom Lane	2009-10-08 15:08:54	Re: position in DDL of columns used in indexes