From: Hannu Krosing <hannu(at)sid(dot)tm(dot)ee>
To: Justin Clift <justin(at)postgresql(dot)org>
Cc: mlw <markw(at)mohawksoft(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, matthew(at)zeut(dot)net, Alex Pilosov <alex(at)pilosoft(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Super Optimizing Postgres
Date: 2001-11-18 06:39:11
Message-ID: 3BF7578F.7000108@sid.tm.ee
Lists: pgsql-hackers
Justin Clift wrote:
>>In my example, two computers with exactly the same hardware, except one has a
>>5400 RPM IDE drive, the other has a 10,000 RPM IDE drive. These machines should
>>not use the same settings; it is obvious that a sequential scan block read on
>>one will be faster than the other.
>>
>
>If we're going to do this bit properly, then we'll have to take into
>consideration that many database objects will need their own individual
>statistics. For example, let's say we have a database with a bunch of
>10k rpm SCSI drives which the tables are on, and the system also has one
>or more 15k rpm SCSI drives (let's say Seagate Cheetah II drives) on
>which the indices have been placed. With the 10k rpm drives, the tables
>needing the fastest throughput or having the highest usage are put on
>the outer edges of the disk media, and the rest of the tables are placed
>in the available space.
>
>On this theoretical system, we will be better off measuring the
>performance of each table and index in turn then generating and storing
>costs for each one which are as "accurate as possible at this point in
>time".
>
That would mean that these statistic values must be stored in pg_class
and not be SET variables at all. This will probably have the added
benefit that some caching effects of small/big tables will be accounted
for automatically, so you don't have to do that in the optimizer.
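To make the idea concrete, here is a minimal sketch of what per-relation costs stored in pg_class (rather than a single global SET variable) could look like to the planner. The column names rel_seq_cost and rel_random_cost are invented for illustration; they do not exist in the real catalog:

```python
# Hypothetical sketch: per-relation page costs instead of one global SET
# variable. The "columns" rel_seq_cost / rel_random_cost are invented for
# illustration and are not part of the real pg_class.

# Stand-in for pg_class rows: relation name -> measured per-page costs.
PG_CLASS_COSTS = {
    # index on a fast 15k rpm drive
    "orders_pkey": {"rel_seq_cost": 0.6, "rel_random_cost": 2.0},
    # table on a slower 10k rpm drive
    "orders": {"rel_seq_cost": 1.0, "rel_random_cost": 4.0},
}

GLOBAL_SEQ_COST = 1.0       # today's single global default
GLOBAL_RANDOM_COST = 4.0

def seq_scan_cost(rel: str, pages: int) -> float:
    """Cost a sequential scan with the relation's own measured cost,
    falling back to the global default when none was stored."""
    per_page = PG_CLASS_COSTS.get(rel, {}).get("rel_seq_cost", GLOBAL_SEQ_COST)
    return pages * per_page

def index_scan_cost(rel: str, pages_fetched: int) -> float:
    """Same idea for random page fetches during an index scan."""
    per_page = PG_CLASS_COSTS.get(rel, {}).get("rel_random_cost",
                                               GLOBAL_RANDOM_COST)
    return pages_fetched * per_page
```

With the two relations on different spindles, the same 100-page scan comes out cheaper on the faster drive, which is exactly the behavior a single SET variable cannot express.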
> A model like this would probably have these costs re-calculated
>each time the ANALYZE command is run to ensure their accuracy through
>database growth and changes.
>
Then the ANALYZE should be run on both tables and indexes. AFAIK we
currently analyze only real data.
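An extended ANALYZE of this kind might, for each table and index, time a few sequential and random page reads and normalize the results the way seq_page_cost is treated as the unit cost. A hypothetical sketch (the function and the sample timings are assumptions, not anything Postgres does today):

```python
# Hypothetical sketch of what a per-relation ANALYZE could derive: time
# sequential and random page reads, then normalize so the sequential
# cost is 1.0 (mirroring seq_page_cost being the unit of planner cost).

def derive_page_costs(seq_ms_per_page: float,
                      random_ms_per_page: float) -> tuple:
    """Turn measured per-page read times into planner cost units."""
    if seq_ms_per_page <= 0:
        raise ValueError("sequential read time must be positive")
    return (1.0, random_ms_per_page / seq_ms_per_page)

# A 5400 rpm and a 10000 rpm drive would simply yield different measured
# times, and hence different stored costs, with no manual SET needed.
```

The point is that the 5400 vs 10000 rpm case from earlier in the thread falls out automatically: slower seeks produce a larger measured ratio, which becomes that relation's stored random-page cost.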
>
>I think this would be decently accurate, and RAID systems would be
>accurately analysed. Don't know how to take into account large cache
>sizes though. :)
>
Maybe some volatile statistics on how much of a table "may be" cached in
the disk/fs caches, assuming that we currently know how much of each is
in shared memory.
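Such a cached-fraction statistic could feed the cost model as a simple blend: pages believed to be cached cost almost nothing, pages that must come from disk cost the full random-page price. A sketch under that assumption (the function and the 0.1 cached-page cost are illustrative, not an existing Postgres formula):

```python
# Illustrative blend of cached vs. on-disk page cost, weighted by the
# estimated fraction of the relation already sitting in disk/fs caches.
# Not an existing Postgres formula; just the idea from the text.

def effective_page_cost(random_page_cost: float,
                        cached_page_cost: float,
                        cached_fraction: float) -> float:
    """Expected per-page cost given an estimated cached fraction."""
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be in [0, 1]")
    return (cached_fraction * cached_page_cost
            + (1.0 - cached_fraction) * random_page_cost)

# A small, hot table (fraction near 1.0) approaches the cheap cached
# cost; a huge, cold table (fraction near 0.0) pays full disk price.
```

This is also where the "small/big table" effect mentioned above shows up for free: the small table's high cached fraction drives its effective cost toward the cached price without any optimizer special-casing.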
-----------
Hannu