Re: Perfomance Tuning

From: Christopher Browne <cbbrowne(at)acm(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Perfomance Tuning
Date: 2003-08-14 14:57:33
Message-ID: 604r0ki8wy.fsf@dev6.int.libertyrms.info
Lists: pgsql-performance

threshar(at)torgo(dot)978(dot)org (Jeff) writes:
> On Wed, 13 Aug 2003, Christopher Browne wrote:
>> You raise a good point vis-a-vis the thought of spawning multiple
>> readers; that could conceivably be a useful approach to improve
>> performance for very large queries. If you could "stripe" the tables
>> in some manner so they could be doled out to multiple worker
>> processes, that could indeed provide some benefits. If there are
>> three workers, they might round-robin to grab successive pages from
>> the table to do their work, and then end with a merge step.
>
> The way Informix does this is twofold:
> 1. It handles the raw disks, so it knows where table data is.

The thing is, there is no guarantee of a permanent, _massive_
difference in performance between "raw" devices and "cooked"
(filesystem-based) storage.

Traditionally, "handling raw disks" was a big deal because the DBMS
could then decide where to stick the data, possibly down to specifying
what sector of what track of what spindle. There are four reasons why
this is no longer such a big deal:

1. Disk drives lie to you. They don't necessarily provide
information that even _resembles_ their true geometry. So the
best you can get is to be sure that "this block was on drive 4,
that block was on drive 7."

2. On a big system, you're more than likely using hardware RAID,
where there's further caching, and where the disk array may
not be honest with the DBMS about which physical drives hold the data.

3. The other traditional benefit to "raw" disks was that they
allowed the DBMS to be _certain_ that data was committed in
some particular order. But points 1 and 2 give that certainty
regrettable opportunities to prove false. (Given the
degree to which disk drives lie about things, I have to be a
bit skeptical of some of the BSD FFS claims, which at least
appear to assume that they _do_ control the disk drive...
This is NOT a reason, by the way, to consider FFS to be, in
any way, "bad," but rather a reminder that some of the guarantees
may get stolen by your disk drive...)

4. Today's filesystems _aren't_ Grandpa's UFS. We've got better
stuff than we had back in the Ultrix days.
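To make points 1 and 2 concrete: behind a striped array, the most the software side can reason about is a logical mapping from block number to drive. The sketch below assumes plain RAID-0 with a fixed stripe size (a hypothetical layout; real controllers may differ and add their own caching), and the function name is illustrative only. It shows that the answer is just "drive N, logical offset M", with nothing about tracks, sectors or spindle position.

```python
def raid0_location(logical_block: int, num_drives: int, stripe_blocks: int) -> tuple[int, int]:
    """Map a logical block number to (drive index, block offset on that drive).

    Assumes a simple, hypothetical RAID-0 layout: runs of `stripe_blocks`
    consecutive blocks are dealt out to the drives in turn.
    """
    if logical_block < 0:
        raise ValueError("logical_block must be non-negative")
    if num_drives <= 0 or stripe_blocks <= 0:
        raise ValueError("num_drives and stripe_blocks must be positive")
    stripe, within = divmod(logical_block, stripe_blocks)  # which stripe unit, and where inside it
    drive = stripe % num_drives                             # stripe units rotate across drives
    row = stripe // num_drives                              # full rotations that came before
    # The offset is still a *logical* block on that drive; the drive itself
    # decides where it physically lands (point 1).
    return drive, row * stripe_blocks + within
```

With four drives and 16-block stripes, block 16 lands at the start of drive 1 and block 70 at offset 22 on drive 0; that is "this block was on drive 4" and no more.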

> 2. It can "partition" tables in a number of ways: round robin,
> concatenation or expression. Expression is nifty: it allows you to use a
> basic "where" clause to decide where to put data, e.g.
> create table foo (
> a int,
> b int,
> c int ) fragment on c > 0 and c < 100 in dbspace1, c >= 100 and c < 200 in
> dbspace2;
>
> that kind of thing.

I remember thinking this was rather neat when I first saw it.

The "fragment on" part was most interesting at the time, when everyone
else (including filesystem makers) was decrying fragmentation as the
ultimate evil. In effect, Informix was saying that they would
_improve_ performance through fragmentation... Sort of like the rash
claim that performance can be improved _without_ resorting to a
threading-based model...
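Expression-based fragmentation boils down to routing each row by evaluating the fragment expressions in turn. What follows is a minimal Python sketch of that idea, not Informix's implementation: `Fragment`, `route` and `insert_all` are hypothetical names, the storage areas are plain in-memory lists, and the second range of the quoted example is assumed to be `100 <= c < 200` so that `c = 100` is not left without a home. Rejecting rows that match no expression is a choice made for the sketch.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

Row = dict[str, int]


@dataclass(frozen=True)
class Fragment:
    name: str                          # hypothetical storage-area name, e.g. "dbspace1"
    predicate: Callable[[Row], bool]   # the "where"-style expression for this fragment


# The quoted example's two fragments (second range assumed to be 100 <= c < 200).
FRAGMENTS = [
    Fragment("dbspace1", lambda r: 0 < r["c"] < 100),
    Fragment("dbspace2", lambda r: 100 <= r["c"] < 200),
]


def route(row: Row, fragments: list[Fragment] = FRAGMENTS) -> str:
    """Return the name of the first fragment whose expression accepts the row."""
    for frag in fragments:
        if frag.predicate(row):
            return frag.name
    raise ValueError(f"no fragment accepts row {row!r}")


def insert_all(rows: Iterable[Row], fragments: list[Fragment] = FRAGMENTS) -> dict[str, list[Row]]:
    """Distribute rows into per-fragment buckets, as an INSERT would."""
    storage: dict[str, list[Row]] = {f.name: [] for f in fragments}
    for row in rows:
        storage[route(row, fragments)].append(row)
    return storage
```

One plausible payoff of such a layout is that a query restricted to, say, `c < 50` only ever needs to touch `dbspace1`, which is presumably where the performance gain from "fragmentation" comes from.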

> And yeah, I would not expect to see it for a long time... Without
> threading it would be rather difficult to implement... but who knows
> what the future will bring us.

The typical assumption is that threading is a magical talisman that
will bring all sorts of benefits. There have been enough cases where
PostgreSQL has demonstrated stunning improvements _without_ threading
that I am very skeptical that it is truly necessary.
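The striping scheme discussed at the start of this thread (workers round-robin over successive pages, then a merge step) does not by itself require threads. This is a minimal sketch under stated assumptions: a "page" is just a list of integers, the per-worker work is a filter plus sort, and `stripe_pages`, `scan_stripe` and `striped_scan` are hypothetical names. The executor is pluggable: the default `map` runs the stripes one after another, while passing the `map` method of a `concurrent.futures.ProcessPoolExecutor` would run each stripe in a separate process.

```python
import heapq
from typing import Callable, Iterable, Sequence

Page = list[int]  # hypothetical: a page is modelled as a list of integer values


def stripe_pages(num_pages: int, num_workers: int) -> list[list[int]]:
    """Deal page numbers out round-robin: worker w gets pages w, w+N, w+2N, ..."""
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    return [list(range(w, num_pages, num_workers)) for w in range(num_workers)]


def scan_stripe(task: tuple[Sequence[Page], list[int], int]) -> list[int]:
    """One worker's job: keep values above the threshold on its pages, sorted."""
    pages, page_ids, threshold = task
    return sorted(v for pid in page_ids for v in pages[pid] if v > threshold)


def striped_scan(
    pages: Sequence[Page],
    threshold: int,
    num_workers: int = 3,
    map_fn: Callable[..., Iterable[list[int]]] = map,
) -> list[int]:
    """Split the scan across workers, then merge their sorted partial results."""
    tasks = [(pages, ids, threshold) for ids in stripe_pages(len(pages), num_workers)]
    partials = list(map_fn(scan_stripe, tasks))
    # Merge step: each partial result is already sorted, so a k-way merge suffices.
    return list(heapq.merge(*partials))
```

To use separate processes, call it as `striped_scan(pages, 4, map_fn=pool.map)` inside a `with ProcessPoolExecutor() as pool:` block, kept under an `if __name__ == "__main__":` guard so that process start methods which re-import the main module behave. Note that this naive version ships every page to every worker; a real system would hand each worker only its own page numbers and let it read them itself.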
--
output = reverse("gro.gultn" "@" "enworbbc")
http://www3.sympatico.ca/cbbrowne/sap.html
Rules of the Evil Overlord #204. "I will hire an entire squad of blind
guards. Not only is this in keeping with my status as an equal
opportunity employer, but it will come in handy when the hero becomes
invisible or douses my only light source."
<http://www.eviloverlord.com/>
