From: James Rogers <jamesr(at)best(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Index/Function organized table layout
Date: 2003-10-01 06:31:26
Message-ID: BB9FC2CE.54C3%jamesr@best.com
Lists: pgsql-hackers
Hi,
I am running a large and mature mission-critical PostgreSQL implementation
with multiple tables that are growing at a rate of several million rows per
month (the rate of addition is growing as well). It is a 24x7 operation, so
downtime is a party foul and apparent performance has to be reasonably good
all the time.
The problem: My working set is typically several million rows (and growing)
at any one time, which has a tendency to thrash the buffers mercilessly.
Records are inserted in an order that does not reflect how they are typically
retrieved, so a typical range query has to hit many, many blocks to collect
all of its rows. CLUSTER isn't cutting it (more below).
Definitely sub-optimal.
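For illustration, the access pattern looks roughly like this (the table and
column names here are made up, not our actual schema):

    -- Retrieval is almost always a range scan on one key, but rows arrive in
    -- an order that is effectively random with respect to that key:
    SELECT *
    FROM event_log
    WHERE account_id = 1234
      AND event_time BETWEEN '2003-09-01' AND '2003-09-30';

Because consecutive rows for a given key are scattered across the heap, a
query like this touches nearly as many blocks as it returns rows, even though
the index finds the tuple pointers quickly.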
Now, I've actually hacked commercial MVCC engines back in the day, and am
comfortable playing around in database internals. I have an "itch to
scratch" for improving the scalability of Really Large Tables by explicitly
allowing control of table layouts as an optional property of the tables. I
would like some feedback as to whether this is practical given the current
architecture; it appears that it is, but I'm not intimately familiar with it
(yet).
1.) B-tree organized tables. The primary key index also contains the entire
row. Upside: Very fast retrieval for indexed queries over large data sets
and very efficient buffer usage. Downside: in practice, the primary key
index ends up being the only index on the table, and the locking behaves
more like an index page than a data page. I've actually used this feature a lot for very large
tables in Oracle implementations, and you can get substantial speed
improvements versus normal indexed tables if used correctly on large tables.
Vastly more convenient for a 24x7 operation than using the CLUSTER command,
which creates nasty contention, doesn't automatically order the table on
insert/update (i.e. it has to be run regularly), and basically ends up with
an index that is redundant.
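For reference, this is roughly what the Oracle feature I'm describing looks
like, contrasted with the closest thing PostgreSQL offers today (the schema
is just an example, and the CLUSTER line assumes a comparable PostgreSQL
table whose primary key index is named event_log_pkey):

    -- Oracle: the table *is* the primary key B-tree; rows are kept in key order.
    CREATE TABLE event_log (
        account_id  INTEGER,
        event_time  DATE,
        payload     VARCHAR2(200),
        PRIMARY KEY (account_id, event_time)
    ) ORGANIZATION INDEX;

    -- PostgreSQL: CLUSTER rewrites the heap in index order once, but new rows
    -- revert to insertion order, so it has to be re-run and it locks the table.
    CLUSTER event_log_pkey ON event_log;

The key difference is that the Oracle form maintains the physical ordering on
every insert/update, while CLUSTER is a one-shot sort that goes stale.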
2.) Function/Hash organized tables (aka Partitioning). Partitioning tuples
across blocks according to a user provided function is the second target on
my short list. Doing this well (i.e. being able to take individual
partitions offline or set them read-only) takes a hell of a lot of work,
but having a table that internally clusters its records according to a
user-provided hash function is a good start. This isn't a beginner feature
(stupidly designed transactions can be quite costly on partitioned tables),
but it is a very nice tool to have. True partitioning would be
enterprise-grade gold, but that target may be a bit high for now.
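To be concrete about what I mean, a strawman syntax might look something like
this (none of it exists today; it is purely illustrative, and the function
name is made up):

    -- Hypothetical: place tuples into block groups by a user-provided function,
    -- so rows that hash to the same value land in the same region of the heap.
    CREATE TABLE event_log (
        account_id  INTEGER,
        event_time  TIMESTAMP,
        payload     TEXT
    ) ORGANIZE BY FUNCTION (hash_account(account_id));

The exact syntax is beside the point; what matters is that the storage layer
would use the function's output to decide block placement instead of simply
appending to the end of the heap.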
Both of these things really are attempts to address the same basic problem,
which is optimizing the number of buffers a given query uses by making the
table's layout reflect typical queries. Given the size of our typical
working sets, optimizing layout greatly improves performance by reducing
buffer thrashing.
I would very much like to work on improving this type of feature (automatic
table layout optimization), but I thought I would check with the people
currently hacking on the system first, to see if there was a showstopper or
if someone is already working on this.
Cheers,
-James Rogers
jamesr(at)best(dot)com