Re: Database Kernels and O_DIRECT

From: Christopher Browne <cbbrowne(at)libertyrms(dot)info>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Database Kernels and O_DIRECT
Date: 2003-10-16 14:31:45
Message-ID: 60wub5uucu.fsf@dev6.int.libertyrms.info
Lists: pgsql-hackers

andrew(at)dunslane(dot)net (Andrew Dunstan) writes:
> Tom Lane wrote:
>>James Rogers <jamesr(at)best(dot)com> writes:
>>>If we suddenly wanted to optimize Postgres for performance the way
>>>Oracle does, we would be a lot more keen on the O_DIRECT approach.
>>This isn't ever going to happen, for the simple reason that we don't
>> have Oracle's manpower.
>>
> [snip - long and sensible elaboration of above statement]
>
> I have wondered (somewhat fruitlessly) for several years about the
> possibilities of special purpose lightweight file systems that could
> relax some of the assumptions and checks used in general purpose file
> systems. Such a thing might provide most of the benefits of a
> "database kernel" without imposing anything extra on the database
> application layer.
>
> Just a thought - I have no resources to make any attack on such a project.

There is a project that is exactly relevant to this, namely Hans
Reiser's "ReiserFS," on Linux.

http://www.namesys.com/whitepaper.html

In Version 4, they will be exporting an API that allows userspace
applications to control the use of transactional filesystem updates.

If someone were to directly build a database on top of this, one might
wind up with some sort of "ReiserSQL," which would be relatively
analogous to the "database kernel" approach.

Of course, the task would be large, and it would likely take _years_
for it to stabilize to the point of being much more than a "neat
hack."

The other neat approach that would be more relevant to PostgreSQL
would be to create a filesystem that stored data in pure blocks, with
pretty large block sizes, and low overhead for saving directory
metadata. There isn't too terribly much interest in {a,o,m}time...
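
To make the "pure blocks" idea a little more concrete, here is a rough
sketch, not a proposal for how PostgreSQL does or should do it. It only
emulates the layout on top of an ordinary file rather than implementing
a filesystem, and it does not use O_DIRECT. The 128 kB block size and
the helper names (open_block_file, write_block, read_block) are
assumptions made up for illustration. Data is addressed purely by block
number, so there is no per-object directory metadata, and O_NOATIME is
requested where the platform offers it.

```python
import os

# Assumed "pretty large" block size: 128 kB (sixteen 8 kB pages).
BLOCK_SIZE = 8192 * 16


def open_block_file(path):
    """Open (or create) a flat block file, skipping atime updates if possible."""
    flags = os.O_RDWR | os.O_CREAT
    noatime = getattr(os, "O_NOATIME", 0)  # Linux-only flag; 0 elsewhere
    try:
        return os.open(path, flags | noatime, 0o600)
    except OSError:
        # The kernel or filesystem refused O_NOATIME; retry without it.
        return os.open(path, flags, 0o600)


def write_block(fd, block_no, payload):
    """Store payload in block block_no, zero-padded to a full block."""
    if len(payload) > BLOCK_SIZE:
        raise ValueError("payload larger than one block")
    data = payload.ljust(BLOCK_SIZE, b"\0")
    written = os.pwrite(fd, data, block_no * BLOCK_SIZE)
    if written != BLOCK_SIZE:
        raise OSError("short write")


def read_block(fd, block_no):
    """Return the full contents of block block_no."""
    return os.pread(fd, BLOCK_SIZE, block_no * BLOCK_SIZE)
```

The only "metadata" here is the block number times the block size;
everything else (free-space tracking, crash safety) is left out and
would be the hard part.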
--
output = reverse("ofni.smrytrebil" "@" "enworbbc")
<http://dev6.int.libertyrms.com/>
Christopher Browne
(416) 646 3304 x124 (land)
