Re: Question about partitioned query behavior

From: "Benjamin Krajmalnik" <kraj(at)servoyant(dot)com>
To: "Ranga Gopalan" <ranga_gopalan(at)hotmail(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Question about partitioned query behavior
Date: 2010-07-02 16:40:03
Message-ID: F4E6A2751A2823418A21D4A160B68988614694@fletch.stackdump.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

In postgresql.conf, what are your settings for constraint_exclusion?

There are 3 settings - on, off, or partition.

Mine are set as follows:

constraint_exclusion = on # on, off, or partition

Under 8.4.4 I had it set to partition, but the behavior was not what I
expected so I set it back to "on" and only the applicable partitions get
processed.

From: pgsql-performance-owner(at)postgresql(dot)org
[mailto:pgsql-performance-owner(at)postgresql(dot)org] On Behalf Of Ranga
Gopalan
Sent: Friday, July 02, 2010 9:29 AM
To: pgsql-performance(at)postgresql(dot)org
Subject: [PERFORM] Question about partitioned query behavior

Hi,

My question is regarding ORDER BY / LIMIT query behavior when using
partitioning.

I have a large table (about 100 columns, several million rows)
partitioned by a column called day (which is the date stored as yyyymmdd
- say 20100502 for May 2nd 2010 etc.). Say the main table is called
FACT_TABLE and each child table is called FACT_TABLE_yyyymmdd (e.g.
FACT_TABLE_20100502, FACT_TABLE_20100503 etc.) and has an appropriate
CHECK constraint created on it to CHECK (day = yyyymmdd).

Postgres Version: PostgreSQL 8.4.2 on x86_64-unknown-linux-gnu,
compiled by GCC gcc (GCC) 3.4.6 20060404 (Red Hat 3.4.6-10), 64-bit

The query pattern I am looking at is (I have tried to simplify the
column names for readability):

SELECT F1 from FACT_TABLE
where day >= 20100502 and day <= 20100507 # selecting for a week
ORDER BY F2 desc
LIMIT 100

This is what is happening:

When I query from the specific day's (child) table, I get what I expect
- a descending Index scan and good performance.

# explain select F1 from FACT_TABLE_20100502 where day = 20100502 order
by F2 desc limit 100;

QUERY PLAN


------------------------------------------------------------------------
------------------------------------------------------------------------
--
Limit (cost=0.00..4.81 rows=100 width=41)
-> Index Scan Backward using F2_20100502 on FACT_TABLE_20100502
(cost=0.00..90355.89 rows=1876985 width=41
)
Filter: (day = 20100502)

BUT:

When I do the same query against the parent table it is much slower -
two things seem to happen - one is that the descending scan of the index
is not done and secondly there seems to be a separate sort/limit at the
end - i.e. all data from all partitions is retrieved and then sorted and
limited - This seems to be much less efficient than doing a descending
scan on each partition and limiting the results and then combining and
reapplying the limit at the end.

explain select F1 from FACT_TABLE where day = 20100502 order by F2 desc
limit 100;

QUERY PLAN


------------------------------------------------------------------------
------------------------------------------------------------------------
---
Limit (cost=20000084948.01..20000084948.01 rows=100 width=41)
-> Sort (cost=20000084948.01..20000084994.93 rows=1876986 width=41)
Sort Key: public.FACT_TABLE.F2
-> Result (cost=10000000000.00..20000084230.64 rows=1876986
width=41)
-> Append (cost=10000000000.00..20000084230.64
rows=1876986 width=41)
-> Seq Scan on FACT_TABLE
(cost=10000000000.00..10000000010.02 rows=1 width=186)
Filter: (day = 20100502)
-> Seq Scan on FACT_TABLE_20100502 FACT_TABLE
(cost=10000000000.00..10000084220.62 rows=1876985 width=4
1)
Filter: (day = 20100502)
(9 rows)

Could anyone please explain why this is happening and what I can do to
get the query to perform well even when querying from the parent table?

Thanks,

Ranga

________________________________

Hotmail is redefining busy with tools for the New Busy. Get more from
your inbox. See how.
<http://www.windowslive.com/campaign/thenewbusy?ocid=PID28326::T:WLMTAGL
:ON:WL:en-US:WM_HMP:042010_2>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Robert Haas 2010-07-02 17:21:47 Re: Low perfomance SUM and Group by large databse
Previous Message Craig James 2010-07-02 16:36:23 Re: Highly Efficient Custom Sorting