Re: Transparent table partitioning in future version of PG?

From: david(at)lang(dot)hm
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, henk de wit <henk53602(at)hotmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Transparent table partitioning in future version of PG?
Date: 2009-05-08 18:20:57
Message-ID: alpine.DEB.1.10.0905081043340.15782@asgard
Lists: pgsql-performance

On Fri, 8 May 2009, Robert Haas wrote:

> On Thu, May 7, 2009 at 10:52 PM, <david(at)lang(dot)hm> wrote:
>>>> Hopefully, notions of partitioning won't be directly tied to chunking of
>>>> data for parallel query access. Most queries access recent data and
>>>> hence only a single partition (or stripe), so partitioning and
>>>> parallelism are frequently exactly orthogonal.
>>>
>>> Yes, I think those things are unrelated.
>>
>> I'm not so sure (warning, I am relatively inexperienced in this area)
>>
>> it sounds like you can take two basic approaches to partition a database
>>
>> 1. The Isolation Plan
> [...]
>> 2. The Load Balancing Plan
>
> Well, even if the table is not partitioned at all, I don't see that it
> should preclude parallel query access. If I've got a 1 GB table that
> needs to be sequentially scanned for rows meeting some restriction
> clause, and I have two CPUs and plenty of I/O bandwidth, ISTM it
> should be possible to have them each scan half of the table and
> combine the results. Now, this is not easy and there are probably
> substantial planner and executor changes required to make it work, but
> I don't know that it would be particularly easier if I had two 500 MB
> partitions instead of a single 1 GB table.
>
> IOW, I don't think you should need to partition if all you want is
> load balancing. Partitioning should be for isolation, and load
> balancing should happen when appropriate, whether there is
> partitioning involved or not.

Actually, I will contradict myself slightly.

With the Isolation Plan there is not necessarily a need to run the query
on each partition in parallel.

If parallel queries are possible, they will benefit Isolation Plan
partitioning, but the biggest win with this plan is just reducing the
number of partitions that need to be queried.
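
To make the "fewer partitions queried" point concrete, here is a minimal
sketch in Python. It is only an illustration of the idea, not how
PostgreSQL decides which partitions to scan; the partition names and the
monthly date ranges are made up for the example.

```python
from datetime import date

# Hypothetical monthly partitions: name -> (first day, first day of next month).
PARTITIONS = {
    "events_2009_03": (date(2009, 3, 1), date(2009, 4, 1)),
    "events_2009_04": (date(2009, 4, 1), date(2009, 5, 1)),
    "events_2009_05": (date(2009, 5, 1), date(2009, 6, 1)),
}

def partitions_to_scan(query_start, query_end):
    """Return partitions whose half-open range [lo, hi) overlaps
    the half-open query range [query_start, query_end)."""
    return sorted(
        name
        for name, (lo, hi) in PARTITIONS.items()
        if lo < query_end and query_start < hi
    )

# A query for the first week of May only has to touch one partition.
print(partitions_to_scan(date(2009, 5, 1), date(2009, 5, 8)))
```

A query over recent data skips the March and April partitions entirely,
with no parallelism involved.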

With the Load Balancing Plan there is no benefit in partitioning unless
you have the ability to run queries on each partition in parallel.

Using a separate back-end process to do a query on a separate partition is
a fairly straightforward, but not trivial, thing to do (there are
complications in merging the result sets, including the need to be able to
do part of a query, merge the results, then use those results for the next
step in the query).
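
A rough sketch of that "partial query, merge, next step" shape, in Python.
Everything here is an assumption for illustration: the partitions are
in-memory lists of made-up (customer, amount) rows, and worker threads
stand in for separate back-end processes. It is not a description of any
PostgreSQL mechanism.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitions, each a list of (customer, amount) rows.
PARTITIONS = [
    [("alice", 10), ("bob", 5)],
    [("alice", 7), ("carol", 3)],
]

def partial_sum(rows):
    """Step 1, run once per partition: per-customer subtotal."""
    totals = {}
    for customer, amount in rows:
        totals[customer] = totals.get(customer, 0) + amount
    return totals

def merge(partials):
    """Step 2: combine the per-partition subtotals into one result set."""
    merged = {}
    for part in partials:
        for customer, subtotal in part.items():
            merged[customer] = merged.get(customer, 0) + subtotal
    return merged

def top_customers(threshold):
    # Worker threads stand in for separate back-end processes here.
    with ThreadPoolExecutor(max_workers=len(PARTITIONS)) as pool:
        partials = list(pool.map(partial_sum, PARTITIONS))
    merged = merge(partials)
    # Step 3: this filter on the total (like HAVING sum(amount) > threshold)
    # can only run on the merged result, not inside any one partition.
    return sorted(c for c, total in merged.items() if total > threshold)

print(top_customers(12))
```

Neither partition alone shows a customer above 12; only the merged totals
do, which is why the final step has to wait for the merge.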

I would also note that there does not seem to be a huge conceptual
difference between doing these parallel queries on one computer and
shipping the queries off to other computers.

However, trying to split the work on a single table runs into all sorts of
'interesting' issues with things needing to be shared between the multiple
processes (they both need to use the same indexes, for example).

So I think that it is much easier for the database engine to efficiently
search two 500G tables instead of one 1T table.

David Lang
