Quick Links

Re: Partitioned table performance

From:	"Jim C(dot) Nasby" <decibel(at)decibel(dot)org>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	pgsql-performance(at)postgresql(dot)org, Greg Stark <gsstark(at)mit(dot)edu>, Stacy White <harsh(at)computer(dot)org>
Subject:	Re: Partitioned table performance
Date:	2004-12-21 23:11:55
Message-ID:	20041221231155.GV18180@decibel.org
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

On Wed, Dec 15, 2004 at 11:56:40AM -0800, Josh Berkus wrote:
> Greg,
>
> > Well Oracle has lots of partitioning intelligence pushed up to the planner
> > to avoid overhead.
> >
> > If you have a query with something like "WHERE date = '2004-01-01'" and
> > date is your partition key (even if it's a range) then Oracle will figure
> > out which partition it will need at planning time.
>
> Hmmm ... well, we're looking at making a spec for Postgres Table Partitioning.
> Maybe you could help?

This is something I've been thinking about doing for
http://stats.distributed.net; is there a formal project for this
somewhere?

On a different note, has anyone looked at the savings you get by
ommitting the partition field from the child tables? ISTM that the
savings would be substantial for narrow tables. Of course that most
likely means doing a union view instead of inheritence, but I'm guessing
here. The table I'm thinking of partitioning is quite narrow (see
below), so I suspect that dropping project_id out would result in a
substantial savings (there's basically nothing that ever queries across
the whole table). With the data distribution, I suspect just breaking
project ID's 205, 5, and 25 into partitioned tables that didn't contain
project_id would save about 450M (4bytes * 95% * 130M).

(the table has ~130M rows)

Table "public.email_contrib"
Column | Type | Modifiers
------------+---------+-----------
project_id | integer | not null
id | integer | not null
date | date | not null
team_id | integer |
work_units | bigint | not null
Indexes:
"email_contrib_pkey" primary key, btree (project_id, id, date)
"email_contrib__pk24" btree (id, date) WHERE (project_id = 24)
"email_contrib__pk25" btree (id, date) WHERE (project_id = 25)
"email_contrib__pk8" btree (id, date) WHERE (project_id = 8)
"email_contrib__project_date" btree (project_id, date)
Foreign-key constraints:
"fk_email_contrib__id" FOREIGN KEY (id) REFERENCES stats_participant(id) ON UPDATE CASCADE
"fk_email_contrib__team_id" FOREIGN KEY (team_id) REFERENCES stats_team(team) ON UPDATE CASCADE

stats=# select * from pg_stats where tablename='email_contrib' and
attname='project_id';
schemaname | tablename | attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation
------------+---------------+------------+-----------+-----------+------------+-------------------+---------------------------------------------------------+------------------+-------------
public | email_contrib | project_id | 0 | 4 | 6 | {205,5,25,8,24,3} | {0.461133,0.4455,0.0444333,0.0418667,0.0049,0.00216667} | | 0.703936
--
Jim C. Nasby, Database Consultant decibel(at)decibel(dot)org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

In response to

Re: Partitioned table performance at 2004-12-15 19:56:40 from Josh Berkus

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Thomas Wegner	2004-12-22 00:25:45	Re: Speed in V8.0
Previous Message	Thomas Wegner	2004-12-21 23:03:18	Speed in V8.0