Re: effective SELECT from child tables

From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, g(at)pervasive(dot)com, mark(at)mark(dot)mielke(dot)cc, Ilia Kantor <ilia(at)obnovlenie(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: effective SELECT from child tables
Date: 2005-10-04 14:29:40
Message-ID: 20051004142940.GG40138@pervasive.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Oct 01, 2005 at 07:59:11PM +0100, Simon Riggs wrote:
> On Sat, 2005-10-01 at 10:57 -0500, Jim C. Nasby wrote:
> > To clarify, this is a hard-coded implementation of what I'm asking for:
> > http://cvs.distributed.net/viewcvs.cgi/stats-sql/logdb/ in a nutshell:
> >
> > CREATE TABLE log_other (
> > project_id smallint NOT NULL
> > ...
> > )
> >
> > CREATE TABLE log_8 (
> > -- No project_id
> > ...
> > )
> > CREATE TABLE log_24, log_25, log_5...
> > CREATE VIEW log AS
> > SELECT * FROM log_other
> > UNION ALL SELECT 8 AS project_id, * FROM log_8
> > ...
> >
> > So the end result is that for cases where project_id is 5, 8, 24, or 25,
> > the data will be stored in tables that don't have the project_id.
> >
> > If I were to use this on the main table for
> > http://stats.distributed.net, which has ~130M rows, I would be able to
> > save 130M*4 bytes (4 instead of 2 due to alignment), or 520MB. The logdb
> > will have many times that number of rows, so the savings will be even
> > larger.
> >
> > Note that this technique wouldn't help at all for something like date
> > partitioning, because you have to store the date in the partitioned
> > table.
>
> Jim,
>
> Your idea was noted before and actually; I mentioned it to show that I
> listen and take note of ideas from any source.
>
> For everybody, I would note that the current behaviour is exactly the
> way that List Partitioning works on other systems.
>
> The cost of this technique is only paid if you choose to partition on
> something that you would not otherwise have included in your table. In
> many cases, you'll choose a column that would have been in the table if
> you created one big table so the additional cost is zero.

Well, the idea is to be more space efficient than if one big table was
used. This is unique to this class of partitioning problems.

> In your example, I would expect to see project_id in a superclass table
> and so there would be no cost.

Superclass table?

> The idea is neat, but IMHO the potential saving of this idea is not big
> enough for me to prioritise that very highly over other items at this
> time.

Certainly. I only chimed in with a specific example so people could
better understand what the idea was. I know it's on the list and might
be addressed at some point. In the mean time it's not too horrible to
hard-code a solution.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2005-10-04 14:29:48 Re: Vacuum and Transactions
Previous Message Jim C. Nasby 2005-10-04 14:17:33 Re: \d on database with a lot of tables is slow