Index/Function organized table layout

From: James Rogers <jamesr(at)best(dot)com>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: Index/Function organized table layout
Date: 2003-10-01 06:31:26
Message-ID: BB9FC2CE.54C3%jamesr@best.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I am running a large and mature mission-critical PostgreSQL implementation
with multiple tables that are growing at a rate of several million rows per
month (the rate of addition is growing as well). It is a 24x7 operation, so
downtime is a party foul and apparent performance has to be reasonably good
all the time.

The problem: My working set is typically several million rows (and growing)
at any one time, which has a tendency to thrash the buffers mercilessly.
Records are inserted in an order that does not reflect typical retrieval
such that a typical query has to hit many, many blocks to collect all the
rows in a given range query. CLUSTER isn't cutting it (more below).
Definitely sub-optimal.

Now, I've actually hacked commercial MVCC engines back in the day, and am
comfortable playing around in database internals. I have an "itch to
scratch" for improving the scalability of Really Large Tables by explicitly
allowing control of table layouts as an optional property of the tables. I
would like some feedback as to whether this is practical given the current
architecture; it appears that it is, but I'm not intimately familiar with it
(yet).

1.) B-tree organized tables. The primary key index also contains the entire
row. Upside: Very fast retrieval for indexed queries over large data sets
and very efficient buffer usage. Downside: In practice, the primary key
index can be the only index on the table, and the locking is more like an
index than data page. I've actually used this feature a lot for very large
tables in Oracle implementations, and you can get substantial speed
improvements versus normal indexed tables if used correctly on large tables.
Vastly more convenient for a 24x7 operation than using the CLUSTER command,
which creates nasty contention, doesn't automatically order the table on
insert/update (i.e. it has to be run regularly), and basically ends up with
an index that is redundant.

2.) Function/Hash organized tables (aka Partitioning). Partitioning tuples
across blocks according to a user provided function is the second target on
my short list. Doing this well (i.e. being able to take individual
partitions offline or setting them read-only) takes a hell of a lot of work,
but having a table that internally clustered its records according to a user
provided hash function is a good start. This isn't a beginner feature
(stupidly designed transactions can be quite costly on partitioned tables),
but it is a very nice tool to have. True partitioning would be
enterprise-grade gold, but that target may be a bit high for now.

Both of these things really are attempts to address the same basic problem,
which is optimizing the number of buffers a given query uses by making the
tables layout reflect typical queries. Given the size of our typical
working sets, optimizing layout greatly improves performance by reducing
buffer thrashing.

I would very much like to work on improving this type of feature (automatic
table layout optimization), but I thought I would check with the people
currently hacking on the system first, to see if there was a showstopper or
if someone is already working on this.

Cheers,

-James Rogers
jamesr(at)best(dot)com

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Seun Osewa 2003-10-01 07:57:20 Re: Possible Commit Syntax Change for Improved TPS
Previous Message Joshua D. Drake 2003-10-01 04:54:07 Re: Wednesday beta postponed till Thursday