From: | Curt Sampson <cjs(at)cynic(dot)net> |
---|---|
To: | Curtis Faith <curtis(at)galtcapital(dot)com> |
Cc: | 'Josh Berkus' <josh(at)agliodbs(dot)com>, 'Noah Silverman' <noah(at)allresearch(dot)com>, pgsql-performance(at)postgresql(dot)org |
Subject: | Re: One large v. many small |
Date: | 2003-02-01 05:07:00 |
Message-ID: | Pine.NEB.4.51.0302011359390.610@angelic.cynic.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general pgsql-performance |
On Fri, 31 Jan 2003, Curtis Faith wrote:
> Depending on the way the records are accessed and the cache size,
> the exact opposite could be true. The index pages will most likely
> rarely be in memory when you have 3000 different tables. Meaning
> that each search will require at least three or four index page
> retrievals plus the tuple page.
Assuming you're using indexes at all. If you're tending to use table
scans, this doesn't matter.
From Noah's description it seemed he was--he said that a particular data
item couldn't be the primary key, presumably because he couldn't index
it reasonably. But this just my guess, not a fact.
> Combine a multi-part index (on both client and foo, which order
> would depend on the access required) that is clustered once a week
> or so using the admittedly non-optimal PostgreSQL CLUSTER command
> and I'll bet you can get equivalent or better performance...
I would say that, just after a CLUSTER, you're likely to see better
performance because this would have the effect, on a FFS or similar
filesystem where you've got plenty of free space, of physically
clustering data that would not have been clustered in the case of a lot
of small tables that see a lot of appending evenly over all of them over
the course of time.
So the tradeoff there is really, can you afford the time for the CLUSTER?
(In a system where you have a lot of maintenance time, probably. Though if
it's a huge table, this might need an entire weekend. In a system that needs
to be up 24/7, probably not, unless you have lots of spare I/O capacity.)
Just out of curiousity, how does CLUSTER deal with updates to the table while
the CLUSTER command is running?
> I don't think there is any substitute for just trying it out. It
> shouldn't be that hard to create a bunch of SQL statements that
> concatenate the tables into one large one.
I entirely agree! There are too many unknowns here to do more than
speculate on this list.
But thanks for enlightening me on situations where one big table perform
better.
cjs
--
Curt Sampson <cjs(at)cynic(dot)net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC
From | Date | Subject | |
---|---|---|---|
Next Message | Matthew V. | 2003-02-01 05:33:35 | Re: paradigm sanity check needed [long] |
Previous Message | Andrew McMillan | 2003-02-01 04:43:48 | Re: [NOVICE] For each record in SELECT |
From | Date | Subject | |
---|---|---|---|
Next Message | Sean Chittenden | 2003-02-01 05:09:35 | Re: [PERFORM] not using index for select min(...) |
Previous Message | Tom Lane | 2003-02-01 04:35:32 | Re: [PERFORM] not using index for select min(...) |