Re: select count() out of memory

From: "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com>
To: "Thomas Finneid" <tfinneid(at)student(dot)matnat(dot)uio(dot)no>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Gregory Stark" <stark(at)enterprisedb(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: select count() out of memory
Date: 2007-10-25 21:42:40
Message-ID: dcc563d10710251442k2d688544lfaf5af155b2f6884@mail.gmail.com
Lists: pgsql-general

On 10/25/07, Thomas Finneid <tfinneid(at)student(dot)matnat(dot)uio(dot)no> wrote:
>
> Tom Lane wrote:
>
> > You are making a common beginner error, which is to suppose that N
> > little tables are better than one big one. They are not.
>
> Well, that depends on how you define better. For my purposes, it is better.
>
> > What you're
> > effectively doing is replacing the upper levels of a big table's indexes
> > with lookups in the system catalogs, which in point of fact is a
> > terrible tradeoff from a performance standpoint.
>
> Only if you assume I use all data in all tables all the time. But as I
> have explained in other replies recently, most of the time only data
> from the newest child table is used.
>
> I did the performance tests before deciding on the design, and having
> it all in one large table would not perform within requirements at all.
> The reason was that the indexes for the ever-growing table would take
> longer and longer to update at each insert.
>
> When I use partitions, or child tables, I can use COPY to insert the
> data into the new child table and then add the indexes to that single
> table only. That was, by far, the fastest solution. (This workflow is
> sketched just after the quoted text below.)
>
> > From a database-theory standpoint, if all this data is alike then you
> > should have it all in one big table.
>
> Then, what is the point of partitions, if you cannot use them to
> separate logically similar data into different partitions when one has
> a need to do so? Of course I could have put it in a single table, had
> it not been for the performance. I could have used a discrete
> timestamp to separate the data, but why? Partitions are more practical.
>
> > There are certain practical cases
> > where it's worth partitioning, but not at the level of granularity that
> > you are proposing.
>
> If it's practical to use partitions, granularity does not come into
> the equation.
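
If I follow, the per-period child table workflow you describe is roughly
this (a minimal sketch; the table, column, and file names here are
invented for illustration):

create table mytable (ts timestamp, id int, val numeric);

-- one child table per period: load it with COPY first,
-- then build the index on just that child table
create table mytable_20071025 () inherits (mytable);
copy mytable_20071025 from '/path/to/batch.dat';
create index mytable_20071025_id on mytable_20071025 (id);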

It may well be that one big table and partial indexes would do what
you want. Did you explore partial indexes against one big table?
That can be quite handy.

e.g.:

create table mybigtable (ts timestamp, id int primary key,
    row1 numeric, ..... rown numeric);

-- populate with a year's worth of data, i.e. 100M rows or something like that

-- a half-open range is safer than BETWEEN here: it can't lose rows
-- falling in the last fraction of a second before midnight
create index mybigtable_20071025 on mybigtable (id)
    where ts >= '2007-10-25 00:00:00' and ts < '2007-10-26 00:00:00';

Repeat as needed. Now, when you want something from the table, you can
just ask for it with a timestamp range that matches an index predicate,
and the query will hit the partial index, and then the table, all
pretty fast.
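
For example, assuming the index above, a query like this should be able
to use the partial index, because its WHERE clause matches the index
predicate exactly:

select count(*)
from mybigtable
where ts >= '2007-10-25 00:00:00' and ts < '2007-10-26 00:00:00';

EXPLAIN should show a scan on mybigtable_20071025 rather than a
sequential scan of the whole table.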

Worth a look I guess.
