| From: | "Jim C(dot) Nasby" <decibel(at)decibel(dot)org> | 
|---|---|
| To: | Josh Berkus <josh(at)agliodbs(dot)com> | 
| Cc: | pgsql-performance(at)postgresql(dot)org, Greg Stark <gsstark(at)mit(dot)edu>, Stacy White <harsh(at)computer(dot)org> | 
| Subject: | Re: Partitioned table performance | 
| Date: | 2004-12-21 23:11:55 | 
| Message-ID: | 20041221231155.GV18180@decibel.org | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-performance | 
On Wed, Dec 15, 2004 at 11:56:40AM -0800, Josh Berkus wrote:
> Greg,
> 
> > Well Oracle has lots of partitioning intelligence pushed up to the planner
> > to avoid overhead.
> >
> > If you have a query with something like "WHERE date = '2004-01-01'" and
> > date is your partition key (even if it's a range) then Oracle will figure
> > out which partition it will need at planning time.
> 
> Hmmm ... well, we're looking at making a spec for Postgres Table Partitioning.   
> Maybe you could help?
This is something I've been thinking about doing for
http://stats.distributed.net; is there a formal project for this
somewhere?
On a different note, has anyone looked at the savings you get by
ommitting the partition field from the child tables? ISTM that the
savings would be substantial for narrow tables. Of course that most
likely means doing a union view instead of inheritence, but I'm guessing
here. The table I'm thinking of partitioning is quite narrow (see
below), so I suspect that dropping project_id out would result in a
substantial savings (there's basically nothing that ever queries across
the whole table). With the data distribution, I suspect just breaking
project ID's 205, 5, and 25 into partitioned tables that didn't contain
project_id would save about 450M (4bytes * 95% * 130M).
(the table has ~130M rows)
   Table "public.email_contrib"
   Column   |  Type   | Modifiers 
------------+---------+-----------
 project_id | integer | not null
 id         | integer | not null
 date       | date    | not null
 team_id    | integer | 
 work_units | bigint  | not null
Indexes:
    "email_contrib_pkey" primary key, btree (project_id, id, date)
    "email_contrib__pk24" btree (id, date) WHERE (project_id = 24)
    "email_contrib__pk25" btree (id, date) WHERE (project_id = 25)
    "email_contrib__pk8" btree (id, date) WHERE (project_id = 8)
    "email_contrib__project_date" btree (project_id, date)
Foreign-key constraints:
    "fk_email_contrib__id" FOREIGN KEY (id) REFERENCES stats_participant(id) ON UPDATE CASCADE
    "fk_email_contrib__team_id" FOREIGN KEY (team_id) REFERENCES stats_team(team) ON UPDATE CASCADE
stats=# select * from pg_stats where tablename='email_contrib' and
attname='project_id';
 schemaname |   tablename   |  attname   | null_frac | avg_width | n_distinct | most_common_vals  |                    most_common_freqs | histogram_bounds | correlation 
 ------------+---------------+------------+-----------+-----------+------------+-------------------+---------------------------------------------------------+------------------+-------------
  public     | email_contrib | project_id |         0 |         4 | 6 | {205,5,25,8,24,3} | {0.461133,0.4455,0.0444333,0.0418667,0.0049,0.00216667} | |    0.703936
-- 
Jim C. Nasby, Database Consultant               decibel(at)decibel(dot)org 
Give your computer some brain candy! www.distributed.net Team #1828
Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Thomas Wegner | 2004-12-22 00:25:45 | Re: Speed in V8.0 | 
| Previous Message | Thomas Wegner | 2004-12-21 23:03:18 | Speed in V8.0 |