
Re: Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics

From:	Zeugswetter Andreas OSB sIT <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at>
To:	Nathan Boley <npboley(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics
Date:	2008-06-10 09:01:02
Message-ID:	6DAFE8F5425AB84DB3FCA4537D829A561BBF8B9246@M0164.s-mxs.net
Lists:	pgsql-hackers

> Obviously we run into problems when
> a) we have a poor estimate for ndistinct - but then we have
> worse problems
> b) our length measure doesn't correspond well with ndistinct
> in an interval

One more problem with low ndistinct values is that the condition might very well
match no rows at all, yet Idea 1 will grossly overestimate the number of hits.

E.g. a char(2) field has a histogram bin from 'a1' to 'b1' whose ndistinct is 2, because
the only values actually present in the bin are 'a1' and 'a2'. A query for 'a3' now gets a
bogus estimate of nrowsperbin / 2, although it matches no rows.
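A small sketch of the failure case above. It assumes that Idea 1 (from the earlier proposal, not quoted here) estimates an equality match inside a histogram bin as rows_in_bin / ndistinct_in_bin. The names Bucket and idea1_estimate, the inclusive bin bounds, the plain string ordering used instead of a real collation, and the 60/40 row split are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Bucket:
    lo: str         # lower bound of the histogram bin (treated as inclusive here)
    hi: str         # upper bound of the histogram bin (treated as inclusive here)
    nrows: int      # rows that fell into this bin
    ndistinct: int  # distinct values observed in this bin


def idea1_estimate(bucket: Bucket, value: str) -> float:
    """Row estimate for `col = value` under the assumed Idea 1:
    every distinct value in the bin is taken to be equally frequent."""
    if not (bucket.lo <= value <= bucket.hi):
        return 0.0
    return bucket.nrows / bucket.ndistinct


# Hypothetical column contents: only 'a1' and 'a2' occur in the bin.
rows = ["a1"] * 60 + ["a2"] * 40
actual = Counter(rows)
bucket = Bucket(lo="a1", hi="b1", nrows=len(rows), ndistinct=len(actual))

print(idea1_estimate(bucket, "a3"), actual["a3"])  # prints: 50.0 0
```

'a3' sorts inside the bin, so it gets nrowsperbin / 2 = 50 rows even though the true count is 0.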

I think for low ndistinct values we will want to know the exact
values and their counts rather than a bin. So I think we will want additional stats rows
that represent "value 'a1' stats".
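One possible reading of the suggestion above, as a sketch only: when a bin's ndistinct is at or below some cutoff, keep the exact per-value counts and let a value that is not listed estimate to zero. Otherwise fall back to the uniform rows-per-distinct-value guess. The cutoff of 4, and the names BinStats, build_bin and estimate_eq, are hypothetical and do not correspond to anything in the PostgreSQL source.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional

LOW_NDISTINCT_THRESHOLD = 4  # hypothetical cutoff for "low ndistinct"


@dataclass
class BinStats:
    lo: str
    hi: str
    nrows: int
    ndistinct: int
    # Exact per-value counts ("value 'a1' stats"), kept only for low ndistinct.
    exact_counts: Optional[Dict[str, int]] = None


def build_bin(lo: str, hi: str, values: List[str]) -> BinStats:
    counts = Counter(values)
    exact = dict(counts) if len(counts) <= LOW_NDISTINCT_THRESHOLD else None
    return BinStats(lo, hi, len(values), len(counts), exact)


def estimate_eq(b: BinStats, value: str) -> float:
    if not (b.lo <= value <= b.hi):
        return 0.0
    if b.exact_counts is not None:
        # Exact stats available: a value that is not listed matches no rows.
        return float(b.exact_counts.get(value, 0))
    # Too many distinct values to list: assume uniform frequency within the bin.
    return b.nrows / b.ndistinct


low = build_bin("a1", "b1", ["a1"] * 60 + ["a2"] * 40)
high = build_bin("c0", "c9", ["c0", "c1", "c2", "c3", "c4", "c5"] * 10)
print(estimate_eq(low, "a3"), estimate_eq(low, "a1"), estimate_eq(high, "c7"))
```

With exact counts, the 'a3' query from the example above estimates 0 rows instead of 50, and 'a1' gets its true 60. The bin with 6 distinct values still uses the bin-average guess of 10.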

Andreas

In response to

Re: Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics at 2008-06-09 17:51:07 from Nathan Boley

Responses

Re: Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics at 2008-06-10 10:13:55 from Gregory Stark
