Quick Links

Re: best statistic target for boolean columns

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Gregory Stark <gsstark(at)mit(dot)edu>
Cc:	PostgreSQL Performance List <pgsql-performance(at)postgresql(dot)org>
Subject:	Re: best statistic target for boolean columns
Date:	2004-09-27 19:26:32
Message-ID:	3481.1096313192@sss.pgh.pa.us
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Gregory Stark <gsstark(at)mit(dot)edu> writes:
> No, actually the stats table keeps the n most common values and their
> frequency (usually in percentage). So really a target of 2 ought to be enough
> for boolean values. In fact that's all I see in pg_statistic; I'm assuming
> there's a full histogram somewhere but I don't see it. Where would it be?

It's not going to be there. The histogram only covers values that are
not in the most-frequent-values list, and therefore it won't exist for a
column that is completely describable by most-frequent-values.

> However the target also dictates how large a sample of the table to take. A
> target of two represents a very small sample. So the estimations could be
> quite far off.

Right. The real point of stats target for such columns is that it
determines how many rows to sample, and thereby indirectly implies
the accuracy of the statistics. For a heavily skewed boolean column
you'd want a high target so that the number of occurrences of the
infrequent value would be estimated accurately.

It's also worth noting that the number of rows sampled is driven by the
largest per-column stats target in the table, and so reducing stats
target to 2 for a boolean column will save *zero* effort unless all the
columns in the table are booleans.

regards, tom lane

In response to

Re: best statistic target for boolean columns at 2004-09-27 19:13:45 from Gregory Stark

Responses

Re: best statistic target for boolean columns at 2004-09-27 22:42:06 from Gaetano Mendola

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Aaron Werman	2004-09-27 20:11:53	Re: Caching of Queries
Previous Message	Jim C. Nasby	2004-09-27 19:18:36	Re: Caching of Queries