Re: FW: Constraint exclusion in partitions

From: Daniel Begin <jfd553(at)hotmail(dot)com>
To: <pgsql-general(at)postgresql(dot)org>
Subject: Re: FW: Constraint exclusion in partitions
Date: 2015-05-23 18:37:25
Message-ID: COL129-DS8E893F1B0C162429C355B94CF0@phx.gbl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Following Francisco suggestion, I was able to do some tests earlier this morning when the partitioning process completed and all the resulting tables analyzed.

Here is what I got on both the original table and its partitioned counterpart while running the same queries. I tested them only for a couple of values but in summary...

Using a constant id:
All the queries I tried took longer on the partitioned table! I got similar results for multiple records using IN (id value1, id value2 ...)

Using a range of ids:
Surprisingly again, all the queries I tried took longer on the partitioned table!

Using a list of ids from a select clause:
More surprisingly, the queries I tried took less time on the partitioned table at least when using the primary key. Using an indexed field took so long compared to the old table that I cancelled the execution for the new one!

Guess what, I will get back to my old fat table unless someone tells me I missed something obvious!
Daniel

Note: Tables/indexes description, queries and execution plans are below.

Tables/indexes description ----------------------------------------------------------------------------------
The original table has 3870130000 records. Primary key/index on each partition queries are
ALTER TABLE oldtable ADD CONSTRAINT oldtable_idversion_pk PRIMARY KEY (id, version);
CREATE INDEX oldtable_groupid_idx ON oldtable USING btree (group_id);

The partitioned table has 3870130000 records distributed over 87 partitions. Primary key/index on each partition queries are
ALTER TABLE newtable_xx ADD CONSTRAINT newtablexx_idversion_pk PRIMARY KEY (id, version);
CREATE INDEX newtablexx_groupid_idx ON newtable_xx USING btree (group_id);
Where xx is the partition's number suffix

constant id -------------------------------------------------------------------------------------------------------
select * from oldtable where id=123456789;
"Index Scan using oldtable_idversion_pk on oldtable (cost=0.70..4437.15 rows=1682 width=66)"
" Index Cond: (id = 123456789::bigint)"
--Total query runtime: 62 ms. 1 rows retrieved

select * from newtable where id=123456789;
"Append (cost=0.00..20.19 rows=5 width=66)"
" -> Seq Scan on newtable (cost=0.00..0.00 rows=1 width=66)"
" Filter: (id = 123456789::bigint)"
" -> Index Scan using newtable72_idversion_pk on newtable_72 (cost=0.56..20.19 rows=4 width=66)"
" Index Cond: (id = 123456789::bigint)"
--Total query runtime: 156 ms. 1 rows retrieved

I got similar results for multiple records...
select * from oldtable where id IN(10000000,1000000000,2000000000,3000000000);
"Index Scan using oldtable_idversion_pk on oldtable (cost=0.71..17739.18 rows=6726 width=66)"
" Index Cond: (id = ANY ('{10000000,1000000000,2000000000,3000000000}'::bigint[]))"
--Total query runtime: 187 ms. 4 rows retrieved

select * from newtable where id IN(10000000,1000000000,2000000000,3000000000);
"Append (cost=0.00..933.40 rows=223 width=66)"
" -> Seq Scan on newtable (cost=0.00..0.00 rows=1 width=66)"
" Filter: (id = ANY ('{10000000,1000000000,2000000000,3000000000}'::bigint[]))"
" -> Index Scan using newtable01_idversion_pk on newtable_01 (cost=0.57..622.78 rows=156 width=66)"
" Index Cond: (id = ANY ('{10000000,1000000000,2000000000,3000000000}'::bigint[]))"
...
" -> Index Scan using newtable85_idversion_pk on newtable_85 (cost=0.57..53.37 rows=9 width=66)"
" Index Cond: (id = ANY ('{10000000,1000000000,2000000000,3000000000}'::bigint[]))"
--Total query runtime: 421 ms. 4 rows retrieved

range of ids -------------------------------------------------------------------------------------------------------
select * from oldtable where id between 1522999949 and 1523000049;
"Index Scan using oldtable_idversion_pk on oldtable (cost=0.70..383.51 rows=144 width=66)"
" Index Cond: ((id >= 1522999949) AND (id <= 1523000049))"
Total query runtime: 47 ms. 53 rows retrieved.

select * from newtable where id between 1522999949 and 1523000049;
"Append (cost=0.00..408.16 rows=104 width=66)"
" -> Seq Scan on newtable (cost=0.00..0.00 rows=1 width=66)"
" Filter: ((id >= 1522999949) AND (id <= 1523000049))"
" -> Index Scan using newtable51_idversion_pk on newtable_51 (cost=0.56..183.52 rows=46 width=66)"
" Index Cond: ((id >= 1522999949) AND (id <= 1523000049))"
" -> Index Scan using newtable52_idversion_pk on newtable_52 (cost=0.56..224.64 rows=57 width=66)"
" Index Cond: ((id >= 1522999949) AND (id <= 1523000049))"
Total query runtime: 78 ms. 53 rows retrieved.

list of ids from a select clause -------------------------------------------------------------------------------------------------------
--Subset provides 4 ids similar but not identical to the previous query
select * from oldtable where id IN (select * from subset);
"Nested Loop (cost=37.45..886298.00 rows=2028512050 width=66)"
" -> HashAggregate (cost=36.75..38.75 rows=200 width=8)"
" -> Seq Scan on subset (cost=0.00..31.40 rows=2140 width=8)"
" -> Index Scan using oldtable_idversion_pk on oldtable (cost=0.70..4414.37 rows=1693 width=66)"
" Index Cond: (id = subset.id)"
Total query runtime: 171 ms. 4 rows retrieved.

select * from newtable where id IN (select * from subset)
"Nested Loop (cost=36.75..1407672.76 rows=1935067087 width=66)"
" -> HashAggregate (cost=36.75..38.75 rows=200 width=8)"
" -> Seq Scan on subset (cost=0.00..31.40 rows=2140 width=8)"
" -> Append (cost=0.00..7020.68 rows=1749 width=66)"
" -> Seq Scan on newtable (cost=0.00..0.00 rows=1 width=66)"
" Filter: (subset.id = id)"
" -> Index Scan using newtable01_idversion_pk on newtable_01 (cost=0.56..151.97 rows=39 width=66)"
" Index Cond: (id = subset.id)"
...
" -> Index Scan using newtable86_idversion_pk on newtable_86 (cost=0.56..12.42 rows=2 width=66)"
" Index Cond: (id = subset.id)"
Total query runtime: 140 ms. 4 rows retrieved.

Using an index, not the primary key ------------------------------------------------------------------------------
--Subset provides 58 group_id pointing to 5978 records in the concerned tables
select * from oldtable where group_id IN (select * from subset)
"Nested Loop (cost=37.33..21575715.89 rows=2028512050 width=66)"
" -> HashAggregate (cost=36.75..38.75 rows=200 width=8)"
" -> Seq Scan on subset (cost=0.00..31.40 rows=2140 width=8)"
" -> Index Scan using oldtable_groupid_idx on oldtable (cost=0.58..107364.99 rows=51340 width=66)"
" Index Cond: (group_id = subset.id)"
Total query runtime: 3986 ms. 5978 rows retrieved.

select * from newtable where group_id IN (select * from subset)
"Hash Join (cost=41.25..138092255.85 rows=1935067087 width=66)"
" Hash Cond: (newtable.group_id = subset.id)"
" -> Append (cost=0.00..84877869.72 rows=3870134173 width=66)"
" -> Seq Scan on newtable (cost=0.00..0.00 rows=1 width=66)"
" -> Seq Scan on newtable_01 (cost=0.00..946235.96 rows=46526896 width=66)"
...
" -> Seq Scan on newtable_86 (cost=0.00..986527.64 rows=44269664 width=66)"
" -> Hash (cost=38.75..38.75 rows=200 width=8)"
" -> HashAggregate (cost=36.75..38.75 rows=200 width=8)"
" -> Seq Scan on subset (cost=0.00..31.40 rows=2140 width=8)"
Execution Cancelled after 766702 ms !

I tried the same with "SET enable_seqscan = OFF" and got an index scan of all tables;

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message melvin6925 2015-05-23 19:15:13 Re: FW: Constraint exclusion in partitions
Previous Message Noah Misch 2015-05-23 17:43:06 Re: [GENERAL] psql weird behaviour with charset encodings