Re: RFC: planner statistics in 7.2

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: RFC: planner statistics in 7.2
Date: 2001-04-20 00:48:57
Message-ID: 23581.987727737@sss.pgh.pa.us
Lists: pgsql-hackers

Philip Warner <pjw(at)rhyme(dot)com(dot)au> writes:
> At 18:37 19/04/01 -0400, Tom Lane wrote:
>> (2) Statistics should be computed on the basis of a random sample of the
>> target table, rather than a complete scan. According to the literature
>> I've looked at, sampling a few thousand tuples is sufficient to give good
>> statistics even for extremely large tables; so it should be possible to
>> run ANALYZE in a short amount of time regardless of the table size.

> This sounds great; can the same be done for clustering? I.e., pick a random
> sample of index nodes, look at the record pointers, and so determine how
> well clustered the table is?

My intention was to use the same tuples sampled for the data histograms
to estimate how well sorted the data is. However, it's not immediately
clear that that'll give a trustworthy estimate; I'm still studying it ...
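
As a rough illustration of the idea (not a description of what ANALYZE does
or will do), one possible approach is to draw a fixed-size random sample of
row positions, keep the sample in physical order, and measure how closely the
sampled values' sort order tracks that physical order. The sketch below uses
a rank correlation for this; the function names, the in-memory list standing
in for a table column, and the default sample size of 3000 are all
assumptions made for the example.

```python
import random


def sample_positions(n_rows, sample_size, rng):
    """Pick distinct row positions uniformly at random, returned in physical order."""
    k = min(sample_size, n_rows)
    return sorted(rng.sample(range(n_rows), k))


def rank_correlation(values):
    """Correlation between sample order (0..n-1) and the rank of each value.

    `values` must already be in physical order. Returns a number in [-1, 1]:
    near 1 for ascending data, near -1 for descending, near 0 for unordered.
    Ties are ranked in physical order, so a constant column counts as sorted.
    """
    n = len(values)
    if n < 2:
        return 0.0
    order = sorted(range(n), key=lambda i: values[i])  # stable sort
    ranks = [0] * n
    for rank, i in enumerate(order):
        ranks[i] = rank
    mean = (n - 1) / 2.0  # mean of 0..n-1, shared by both sequences
    cov = sum((i - mean) * (r - mean) for i, r in enumerate(ranks))
    var = sum((i - mean) ** 2 for i in range(n))
    return cov / var


def estimate_sortedness(column, sample_size=3000, seed=0):
    """Estimate how well sorted `column` is from a random sample of its rows."""
    rng = random.Random(seed)
    positions = sample_positions(len(column), sample_size, rng)
    return rank_correlation([column[p] for p in positions])
```

For example, estimate_sortedness(list(range(100000))) comes out at 1.0 (up to
rounding), while a shuffled copy of the same list gives a value close to 0.
Whether a sample of this size is trustworthy for partially ordered tables is
exactly the open question above; the sketch does not settle it.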

>> ALTER TABLE tab SET COLUMN col STATS COUNT n

> Sounds fine - user-selectability at the column level seems a good idea.
> Would there be any value in not making it part of a normal SQLxx statement,
> and adding an 'ALTER STATISTICS' command? E.g.:

> ALTER STATISTICS FOR tab[.column] COLLECT n
> ALTER STATISTICS FOR tab SAMPLE m

Is that more standard than the other syntax?

regards, tom lane
