From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | "Gordon A(dot) Runkle" <gar(at)integrated-dynamics(dot)com> |
Cc: | pgsql-hackers(at)postgreSQL(dot)org |
Subject: | Re: Odd statistics behaviour in 7.2 |
Date: | 2002-02-13 21:34:27 |
Message-ID: | 782.1013636067@sss.pgh.pa.us |
Lists: | pgsql-hackers |
"Gordon A. Runkle" <gar(at)integrated-dynamics(dot)com> writes:
> Would it be fair to say that the correct workaround for now would
> be to use ALTER TABLE SET STATISTICS on columns of interest which have
> this near-unique characteristic?
Yeah, that's probably the best we can do until we can think of a better
estimation equation.
> Does ALTER TABLE SET STATISTICS only increase the histogram size, or
> does it also cause more rows to be sampled?
Both. The Chaudhuri paper I referred to has some math purporting to
prove that the required sample size is directly proportional to the
histogram size, for fixed relative error in the histogram boundaries.
So I made the same parameter control both.
Actually the sample size is driven by the largest SET STATISTICS value
for any column of the table. So you can pick which one you think a
larger histogram would be most useful for; it doesn't have to be the
same column that's got the bad-number-of-distinct-values problem.
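To make the proportionality concrete, here is a minimal Python sketch of how the sample size could follow from the per-column settings. The function name, the column names, and the multiplier of 300 rows per unit of statistics value are assumptions made up for illustration; only the "proportional to histogram size" and "largest value for any column wins" behaviour comes from the description above.

```python
# Hypothetical multiplier: rows sampled per unit of SET STATISTICS value.
ROWS_PER_TARGET = 300  # assumption, for illustration only


def sample_rows(column_targets: dict) -> int:
    """Return the number of rows to sample for one table.

    The sample size is proportional to the largest per-column
    statistics value, so raising any single column's value raises
    the sample size used for every column of the table.
    """
    if not column_targets:
        return 0
    return ROWS_PER_TARGET * max(column_targets.values())


# Hypothetical table with three columns at the same setting.
targets = {"order_id": 10, "ship_date": 10, "customer_ref": 10}
baseline = sample_rows(targets)

# Raise the value on one column only, e.g. one used in range queries.
targets["ship_date"] = 100
enlarged = sample_rows(targets)

print(baseline, enlarged)
```

Under these assumptions, raising the value on one column by a factor of ten enlarges the sample for the whole table by the same factor, which is why the column chosen need not be the one with the bad distinct-values estimate.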
Which columns, if any, do you do range queries on? Those would be the
ones where a bigger histogram would be useful.
regards, tom lane