Re: FW: Constraint exclusion in partitions

From: Francisco Olarte <folarte(at)peoplecall(dot)com>
To: Daniel Begin <jfd553(at)hotmail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: FW: Constraint exclusion in partitions
Date: 2015-05-23 10:53:29
Message-ID: CA+bJJbzGvwsq_i62SzryB8G+b75WydE+wvLwfz=fVgQYo9W4pA@mail.gmail.com
Lists: pgsql-general

Hi Daniel

On Fri, May 22, 2015 at 7:21 PM, Daniel Begin <jfd553(at)hotmail(dot)com> wrote:
> I have split a large table (billions of records) into multiple partitions,
> hoping the access would be faster. I used an ID to make partitions check
> (check (id >= 100 AND id < 200)…) and created over 80 tables (children) that
> are now filled with data.
...
> I understand that the following query will use constraint exclusion and will
> run faster…
> a- Select * from parent_table where id >=9999; -- using a constant
> But how constraint exclusion would react with the following queries …
> b- Select * from parent_table where id between 2345 and 6789; -- using
> a range of ids
> c- Select * from parent_table where id in(select ids from
> anothertable); -- using a list of ids from a select

Given you have already partitioned it, why don't you just use explain
[ analyze ] on the queries? E.g., on one of my tables, partitioned
monthly by a timestamp ( with time zone ) field, I get the plan below.
( Even though I never use between, a closed interval, for a
continuous-like type like timestamp, the optimizer clearly shows it is
transforming it into the equivalent AND condition. )

explain select * from carrier_cdrs where setup between
'20150107T123456' and '20150322T222222';

                                                            QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.00..82202.41 rows=2346599 width=74)
   ->  Seq Scan on carrier_cdrs  (cost=0.00..0.00 rows=1 width=184)
         Filter: ((setup >= '2015-01-07 12:34:56+01'::timestamp with time zone) AND (setup <= '2015-03-22 22:22:22+01'::timestamp with time zone))
   ->  Seq Scan on carrier_cdrs_201501  (cost=0.00..30191.10 rows=816551 width=74)
         Filter: ((setup >= '2015-01-07 12:34:56+01'::timestamp with time zone) AND (setup <= '2015-03-22 22:22:22+01'::timestamp with time zone))
   ->  Seq Scan on carrier_cdrs_201502  (cost=0.00..25277.45 rows=872830 width=74)
         Filter: ((setup >= '2015-01-07 12:34:56+01'::timestamp with time zone) AND (setup <= '2015-03-22 22:22:22+01'::timestamp with time zone))
   ->  Seq Scan on carrier_cdrs_201503  (cost=0.00..26733.85 rows=657217 width=74)
         Filter: ((setup >= '2015-01-07 12:34:56+01'::timestamp with time zone) AND (setup <= '2015-03-22 22:22:22+01'::timestamp with time zone))
(9 rows)
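The same check can be done directly on the id-partitioned table; a
minimal sketch, assuming the parent_table name and the CHECK-constraint
ranges quoted above (these names come from Daniel's mail, not a real
schema):

```sql
-- Constraint exclusion must be enabled; the default setting,
-- 'partition', already applies it to inheritance children.
SHOW constraint_exclusion;

-- Query type b: a constant range known at plan time. The plan should
-- list only the children whose CHECK (id >= ... AND id < ...) range
-- overlaps 2345..6789, instead of all 80+ partitions.
EXPLAIN
SELECT * FROM parent_table WHERE id BETWEEN 2345 AND 6789;
```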

> Since I mostly use queries of type b and c, I am wondering if partitioning
> the large table was appropriate and if the queries are going to be longer to
> run…

The problem is going to be all the extra conditions, so I would check
real queries. My bet is that a query EXACTLY like b will use constraint
exclusion, and can potentially be faster if your intervals span just a
few of those 100-id partitions, but why speculate when YOU can measure?
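Query type c is the weak spot: constraint exclusion is decided at plan
time from constants, so id values that only become known at run time
(rows produced by a subselect) cannot prune partitions, and every child
shows up in the plan. A hedged sketch, reusing the table names from
Daniel's mail:

```sql
-- Type c: the planner cannot exclude any partition here, because the
-- matching ids are not known until the subselect executes; expect a
-- scan node for every child in the EXPLAIN output.
EXPLAIN
SELECT * FROM parent_table WHERE id IN (SELECT ids FROM anothertable);

-- A join form does not prune either, but it can let a per-partition
-- index on id do the lookups row by row instead of full scans:
EXPLAIN
SELECT p.* FROM parent_table p JOIN anothertable a ON p.id = a.ids;
```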

Also, consider that if you touch 67-23+1=45 partitions and the DB has
to check other indexed fields, that is a lot of work. As I said, it
depends on your actual data, actual indexes, and actual queries; just
measure it. In my example table I partition the data by the TS, which
is the single indexed field, and my tests showed it was faster that way
( but my queries are normally big scans of date ranges, or more
selective ones with narrow TS conditions, and I measured them ).
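One concrete way to do that measuring, sketched against the same
hypothetical parent_table from the thread: EXPLAIN ANALYZE actually
executes the query and reports per-node row counts and timings, so the
partitioned layout can be compared against a real baseline rather than
cost estimates alone.

```sql
-- Runs the query for real and shows actual time spent in each
-- partition's scan node, plus which partitions were excluded.
EXPLAIN ANALYZE
SELECT * FROM parent_table WHERE id BETWEEN 2345 AND 6789;
```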

Francisco Olarte.
