From: Francisco Olarte <folarte(at)peoplecall(dot)com>
To: Daniel Begin <jfd553(at)hotmail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: FW: Constraint exclusion in partitions
Date: 2015-05-23 10:53:29
Message-ID: CA+bJJbzGvwsq_i62SzryB8G+b75WydE+wvLwfz=fVgQYo9W4pA@mail.gmail.com
Lists: pgsql-general

Hi Daniel
On Fri, May 22, 2015 at 7:21 PM, Daniel Begin <jfd553(at)hotmail(dot)com> wrote:
> I have split a large table (billions of records) into multiple partitions,
> hoping access would be faster. I used the ID for the partition CHECK
> constraints (check (id >= 100 AND id < 200)…) and created over 80 child
> tables that are now filled with data.
...
> I understand that the following query will use constraint exclusion and will
> run faster…
> a- Select * from parent_table where id >=9999; -- using a constant
> But how would constraint exclusion react to the following queries …
> b- Select * from parent_table where id between 2345 and 6789; -- using
> a range of ids
> c- Select * from parent_table where id in(select ids from
> anothertable); -- using a list of ids from a select
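( For reference, this is roughly the kind of layout I understand you
have -- the table and column names below are invented, only the CHECK
pattern comes from your description: )

-- parent table plus inheritance children, one per 100-id range
create table parent_table ( id bigint not null, payload text );

create table parent_table_p23 (
    check ( id >= 2300 and id < 2400 )  -- what constraint exclusion keys off
) inherits ( parent_table );

create table parent_table_p24 (
    check ( id >= 2400 and id < 2500 )
) inherits ( parent_table );

-- each child needs its own index on the partitioning column
create index on parent_table_p23 ( id );
create index on parent_table_p24 ( id );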
Given you have already partitioned it, why don't you just run explain
[ analyze ] on the queries? E.g., on one of my tables, partitioned
monthly by a timestamp ( with time zone ) field, I get the plan below
( even though I normally avoid between, a closed interval, for a
continuous type like timestamp, the plan clearly shows the optimizer
transforming it into the equivalent AND condition ):
explain select * from carrier_cdrs
where setup between '20150107T123456' and '20150322T222222';

                                                            QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.00..82202.41 rows=2346599 width=74)
   ->  Seq Scan on carrier_cdrs  (cost=0.00..0.00 rows=1 width=184)
         Filter: ((setup >= '2015-01-07 12:34:56+01'::timestamp with time zone) AND (setup <= '2015-03-22 22:22:22+01'::timestamp with time zone))
   ->  Seq Scan on carrier_cdrs_201501  (cost=0.00..30191.10 rows=816551 width=74)
         Filter: ((setup >= '2015-01-07 12:34:56+01'::timestamp with time zone) AND (setup <= '2015-03-22 22:22:22+01'::timestamp with time zone))
   ->  Seq Scan on carrier_cdrs_201502  (cost=0.00..25277.45 rows=872830 width=74)
         Filter: ((setup >= '2015-01-07 12:34:56+01'::timestamp with time zone) AND (setup <= '2015-03-22 22:22:22+01'::timestamp with time zone))
   ->  Seq Scan on carrier_cdrs_201503  (cost=0.00..26733.85 rows=657217 width=74)
         Filter: ((setup >= '2015-01-07 12:34:56+01'::timestamp with time zone) AND (setup <= '2015-03-22 22:22:22+01'::timestamp with time zone))
(9 rows)
> Since I mostly use queries of type b and c, I am wondering if partitioning
> the large table was appropriate and if the queries are going to take longer
> to run…
The problem is gonna be all the extra conditions, so I'd test real
queries. My bet is that a query EXACTLY like b will use constraint
exclusion, and can potentially be faster if your interval covers only
around 100 ids, but why speculate when YOU can measure?
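For example, something like this ( using your table names from above,
which I am only guessing at ):

explain
select * from parent_table where id between 2345 and 6789;

explain analyze
select * from parent_table
where id in ( select ids from anothertable );

For b I would expect to see only the relevant children under the Append
node; for c the ids are only known at run time, so I would not expect
the planner to be able to exclude partitions at plan time. Check the
plans.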
Also consider that if you touch 67-23+1=45 partitions and the DB has to
check other indexed fields, that is a lot of work. As I said, it depends
on your actual data, actual indexes, and actual queries, so just measure
it. In my example table I partition the data by the TS, which is the
single indexed field, and my tests showed it was faster that way ( but
my queries are normally big scans of date ranges or more selective ones
with narrow TS conditions, and I measured them ).
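One more thing worth double-checking before you measure is that
constraint exclusion is enabled ( 'partition', the default, restricts
the exclusion logic to inheritance children and UNION ALL subqueries,
which is what you want here ):

show constraint_exclusion;              -- 'partition' is enough
set constraint_exclusion = partition;   -- per-session, if it was off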
Francisco Olarte.