From: | Philip Warner <pjw(at)rhyme(dot)com(dot)au> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers(at)hub(dot)org |
Subject: | Re: Optimizer confusion? |
Date: | 2000-08-13 12:41:19 |
Message-ID: | 3.0.5.32.20000813224119.02203d70@mail.rhyme.com.au |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general pgsql-hackers |
At 13:45 12/08/00 -0400, Tom Lane wrote:
>> But this (I think) just highlights the
>> fact that the index is sorted by date, and the rows were added in date
>> order. As a result (for this table, in this query), the index scan get's a
>> much better cache-hit rate, so the actual IO cost is low.
>
>> Does that sound reasonable?
>
>Quite. The cost estimates are based on the assumption that the tuples
>visited by an indexscan are scattered randomly throughout the table.
Interestingly, while testing a truly random index on a table with 4M rows,
the index estimates are actually way too optimistic (contrary to my other
example), even for a small retrieval. I'm still playing, but I'll send some
figures soon.
>
>> Does the optimizer know if I have used clustering?
>
>The killer implementation problem here is keeping track of how much the
>table ordering has been altered since the last CLUSTER command. We have
>talked about using an assumption of "once clustered, always clustered",
This *might* be appropriate to set as an index attribute of some kind, most
particularly for time-series indexes etc (as you suggest).
>I have not yet done anything about this, mainly because I'm unwilling to
>encourage people to use CLUSTER, since it's so far from being ready for
>prime time (see TODO list). Once we've done something about table
>versioning, we can rewrite CLUSTER so that it's actually reasonable to
>use on a regular basis, and at that point it'd make sense to make the
>optimizer CLUSTER-aware.
There might be a way to side-step the issue here. I assume that the index
nodes contain a pointer to a record in a file, which has some kind of file
position. By comparing the file positions on one leaf node, and then
averaging the node cluster values, you might be able to get a pretty good
idea of the *real* clustering.
Does this sound worthwhile?
It has the advantage of working for all tables, and is presumably updated
by Vacuum.
>> I don't suppose I can get the backend to tell me how many logical IOs and
>> how much CPU it used?
>
>Yes you can. Run psql with
> PGOPTIONS="-s"
>and look in the postmaster log. There's also -tparse, -tplan,
>-texec if you'd rather see the query time broken down by stages.
Thanks for this; I see almost no file IO, but lots of paging; is this a
feature of the way Linux does file buffering?
----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.B.N. 75 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 0500 83 82 82 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2000-08-13 13:42:15 | Re: Locking |
Previous Message | Ian West | 2000-08-13 06:16:17 | Re: Locking |
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2000-08-13 14:02:58 | Re: Optimizer confusion? |
Previous Message | Tom Lane | 2000-08-12 17:45:46 | Re: [HACKERS] Optimizer confusion? |