Quick Links

Re: [PERFORM] Bad n_distinct estimation; hacks suggested?

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Gurmeet Manku <manku(at)cs(dot)stanford(dot)edu>
Cc:	<pgsql-hackers(at)postgresql(dot)org>,
Subject:	Re: [PERFORM] Bad n_distinct estimation; hacks suggested?
Date:	2005-04-26 23:03:07
Message-ID:	87u0lt5efo.fsf@stark.xeocode.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers pgsql-performance

This one looks *really* good.

http://www.aladdin.cs.cmu.edu/papers/pdfs/y2001/dist_sampl.pdf

It does require a single full table scan but it works in O(n) time and
constant space and it guarantees the confidence intervals for the estimates it
provides like the histograms do for regular range scans.

It can even keep enough data to provide estimates for n_distinct when
unrelated predicates are applied. I'm not sure Postgres would want to do this
though; this seems like it's part of the cross-column correlation story more
than the n_distinct story. It seems to require keeping an entire copy of the
sampled record in the stats tables which would be prohibitive quickly in wide
tables (it would be O(n^2) storage in the number of columns) .

It also seems like a lot of work to implement. Nothing particular that would
be impossible, but it does require storing a moderately complex data
structure. Perhaps Postgres's new support for data structures will make this
easier.

--
greg

In response to

Re: [HACKERS] Bad n_distinct estimation; hacks suggested? at 2005-04-26 22:00:48 from Gurmeet Manku

Responses

Re: [PERFORM] Bad n_distinct estimation; hacks suggested? at 2005-04-26 23:10:11 from Rod Taylor

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Rod Taylor	2005-04-26 23:10:11	Re: [PERFORM] Bad n_distinct estimation; hacks suggested?
Previous Message	Dave Held	2005-04-26 22:43:14	Re: [HACKERS] Bad n_distinct estimation; hacks suggested?

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Rod Taylor	2005-04-26 23:10:11	Re: [PERFORM] Bad n_distinct estimation; hacks suggested?
Previous Message	Dave Held	2005-04-26 22:43:14	Re: [HACKERS] Bad n_distinct estimation; hacks suggested?