Re: How Big is Too Big for Tables?

From: Vincenzo Romano <vincenzo(dot)romano(at)notorand(dot)it>
To: P Kishor <punk(dot)kish(at)gmail(dot)com>
Cc: jd(at)commandprompt(dot)com, Bill Thoen <bthoen(at)gisnet(dot)com>, pgsql-general General <pgsql-general(at)postgresql(dot)org>
Subject: Re: How Big is Too Big for Tables?
Date: 2010-07-28 17:44:33
Message-ID: AANLkTimjiQqdH3O1zJh_k46iSYt6jLs3EMPy3TPcqdbz@mail.gmail.com
Lists: pgsql-general

2010/7/28 P Kishor <punk(dot)kish(at)gmail(dot)com>:
...
> Two. Partitioning is not the perfect solution. My database will
> ultimately have about 13 million rows per day (it is daily data) for
> about 25 years. So, I need either --
>
> - One big table with 25 * 365 * 13 million rows. Completely undoable.
> - 25 yearly tables with 365 * 13 million rows each. Still a huge
> chore, very slow queries.
> - 25 * 365 tables with 13 million rows each. More doable, but
> partitioning doesn't work.
>
> Three. At least, in my case, the overhead is too much. My data are
> single bytes, but the smallest data type in Pg is smallint (2 bytes).
> That, plus the per row overhead adds to a fair amount of overhead.
>
> I haven't yet given up on storing this specific dataset in Pg, but am
> reconsidering. It is all readonly data, so flat files might be better
> for me.
>
> In other words, Pg is great, but do tests, benchmark, research before
> committing to a strategy. Of course, since you are storing geometries,
> Pg is a natural choice for you. My data are not geometries, so I can
> explore alternatives for it, while keeping my geographic data in Pg.

That reminds me of an old inquiry of mine on this list about "enterprise
grade" (or whatever you want to call it) solutions.
By that I mean "really lots of rows" or, alternatively, "really lots of
tables in the hierarchy" or, again, "really lots of partial indexes".
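
Just to put a rough number on "really lots of rows" using the figures
above (very much back of the envelope; ignoring key columns, indexes
and alignment padding):

  13e6 rows/day * 365 days * 25 years  ~= 1.2e11 rows
  per-row cost ~= 23-byte tuple header (padded to 24) + 4-byte line
  pointer + 2-byte smallint ~= 30 bytes
  => on the order of 3.5 TB of heap, versus ~120 GB of raw single bytes.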

Partitioning is probably not going to work, because coping with
thousands of tables in a hierarchy runs into some "linear" algorithm
inside the query planner, even with constraint exclusion.
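
For clarity, this is the kind of setup I mean (table and column names
are just made up for the example), with the inheritance-based
partitioning of 8.x/9.0: one child table per day, each carrying the
CHECK constraint that constraint exclusion relies on:

  CREATE TABLE measurements (
      obs_date date     NOT NULL,
      cell_id  integer  NOT NULL,
      value    smallint
  );

  -- one child per day; constraint exclusion can only prune a child
  -- if it carries a suitable CHECK constraint
  CREATE TABLE measurements_2010_07_28 (
      CHECK (obs_date = DATE '2010-07-28')
  ) INHERITS (measurements);

  -- the default since 8.4: prune children whose CHECK contradicts
  -- the query's WHERE clause
  SET constraint_exclusion = partition;

With 25 * 365 ~= 9000 children the planner still has to examine every
child's constraint at plan time, which is where the "linear" cost
shows up.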

Maybe a "multilevel" hierarchy (say, partitioning by month (12) at the
first level *and* by day (28, 29, 30 or 31) at the second) would do the
trick, but then the DDL would be quite painful, even with a PL/pgSQL
helper function.
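
A sketch of what such a helper could look like (hypothetical names
again; it assumes the monthly parents measurements_YYYY_MM already
exist, and omits indexes and error handling):

  CREATE OR REPLACE FUNCTION make_daily_children(y int, m int)
  RETURNS void AS $$
  DECLARE
      d date := (y::text || '-' || lpad(m::text, 2, '0') || '-01')::date;
  BEGIN
      WHILE extract(month FROM d) = m LOOP
          -- one child per day, hanging off the monthly parent
          EXECUTE 'CREATE TABLE measurements_' || to_char(d, 'YYYY_MM_DD')
               || ' (CHECK (obs_date = DATE ' || quote_literal(d::text) || '))'
               || ' INHERITS (measurements_' || to_char(d, 'YYYY_MM') || ')';
          d := d + 1;
      END LOOP;
  END;
  $$ LANGUAGE plpgsql;

  -- e.g. SELECT make_daily_children(2010, 7);

And then the same kind of loop for the monthly level, plus a trigger
(or the loader itself) to route rows to the right child, which is why
I call the DDL painful.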

The "linearity" of index selection also killed performance in the
"really lots of partial indexes" approach.
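
That approach was essentially one big table with one partial index per
day, something like (hypothetical names):

  CREATE INDEX measurements_2010_07_28_idx
      ON measurements (cell_id)
      WHERE obs_date = DATE '2010-07-28';

With thousands of such indexes the planner again has to walk the whole
list and test each index predicate against the query's WHERE clause,
so planning time grows with the number of indexes.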

--
NotOrAnd Information Technologies
Vincenzo Romano
--
NON QVIETIS MARIBVS NAVTA PERITVS
