Re: Optimizer confusion?

From: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)hub(dot)org
Subject: Re: Optimizer confusion?
Date: 2000-08-13 12:41:19
Message-ID: 3.0.5.32.20000813224119.02203d70@mail.rhyme.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

At 13:45 12/08/00 -0400, Tom Lane wrote:
>> But this (I think) just highlights the
>> fact that the index is sorted by date, and the rows were added in date
>> order. As a result (for this table, in this query), the index scan get's a
>> much better cache-hit rate, so the actual IO cost is low.
>
>> Does that sound reasonable?
>
>Quite. The cost estimates are based on the assumption that the tuples
>visited by an indexscan are scattered randomly throughout the table.

Interestingly, while testing a truly random index on a table with 4M rows,
the index estimates are actually way too optimistic (contrary to my other
example), even for a small retrieval. I'm still playing, but I'll send some
figures soon.

>
>> Does the optimizer know if I have used clustering?
>
>The killer implementation problem here is keeping track of how much the
>table ordering has been altered since the last CLUSTER command. We have
>talked about using an assumption of "once clustered, always clustered",

This *might* be appropriate to set as an index attribute of some kind, most
particularly for time-series indexes etc (as you suggest).

>I have not yet done anything about this, mainly because I'm unwilling to
>encourage people to use CLUSTER, since it's so far from being ready for
>prime time (see TODO list). Once we've done something about table
>versioning, we can rewrite CLUSTER so that it's actually reasonable to
>use on a regular basis, and at that point it'd make sense to make the
>optimizer CLUSTER-aware.

There might be a way to side-step the issue here. I assume that the index
nodes contain a pointer to a record in a file, which has some kind of file
position. By comparing the file positions on one leaf node, and then
averaging the node cluster values, you might be able to get a pretty good
idea of the *real* clustering.

Does this sound worthwhile?

It has the advantage of working for all tables, and is presumably updated
by Vacuum.

>> I don't suppose I can get the backend to tell me how many logical IOs and
>> how much CPU it used?
>
>Yes you can. Run psql with
> PGOPTIONS="-s"
>and look in the postmaster log. There's also -tparse, -tplan,
>-texec if you'd rather see the query time broken down by stages.

Thanks for this; I see almost no file IO, but lots of paging; is this a
feature of the way Linux does file buffering?

----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.B.N. 75 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 0500 83 82 82 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2000-08-13 13:42:15 Re: Locking
Previous Message Ian West 2000-08-13 06:16:17 Re: Locking

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2000-08-13 14:02:58 Re: Optimizer confusion?
Previous Message Tom Lane 2000-08-12 17:45:46 Re: [HACKERS] Optimizer confusion?