Re: [PERFORM] Bad n_distinct estimation; hacks suggested?

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Mischa Sandberg <mischa(dot)sandberg(at)telus(dot)net>
Cc: Markus Schaber <schabi(at)logix-tt(dot)com>, pgsql-perform <pgsql-performance(at)postgresql(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PERFORM] Bad n_distinct estimation; hacks suggested?
Date: 2005-05-03 21:43:44
Message-ID: 200505031443.44859.josh@agliodbs.com
Lists: pgsql-hackers pgsql-performance

Mischa,

> Okay, although given the track record of page-based sampling for
> n-distinct, it's a bit like looking for your keys under the streetlight,
> rather than in the alley where you dropped them :-)

Bad analogy, but funny.

The issue with page-based vs. pure random sampling is that sampling, for
example, 10% of rows purely at random would actually mean loading 50% of
pages. With 20% of rows, you might as well scan the whole table.
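
To make the arithmetic behind that concrete, here is a rough sketch, not
anything taken from the PostgreSQL source. It assumes every row is picked
independently with the same probability and that every page holds the same
number of rows; the function name and the rows-per-page values are made up
for illustration. Under those assumptions a page is loaded unless all of its
rows are skipped, so the expected fraction of pages touched is
1 - (1 - p)^k for row fraction p and k rows per page.

```python
def fraction_of_pages_touched(row_fraction: float, rows_per_page: int) -> float:
    """Expected fraction of pages containing at least one sampled row.

    Assumption (not from the original post): rows are sampled independently
    with probability `row_fraction`, and every page holds `rows_per_page` rows.
    """
    if not 0.0 <= row_fraction <= 1.0:
        raise ValueError("row_fraction must be between 0 and 1")
    if rows_per_page < 1:
        raise ValueError("rows_per_page must be at least 1")
    # A page is skipped only if every one of its rows is skipped.
    return 1.0 - (1.0 - row_fraction) ** rows_per_page


if __name__ == "__main__":
    # Hypothetical page densities; real values depend on row width.
    for rows_per_page in (7, 20, 100):
        for row_fraction in (0.10, 0.20):
            touched = fraction_of_pages_touched(row_fraction, rows_per_page)
            print(f"{rows_per_page:>3} rows/page, {row_fraction:.0%} of rows "
                  f"-> {touched:.1%} of pages")
```

With about 7 rows per page this model gives roughly 52% of pages for a 10%
row sample and roughly 79% for a 20% sample, which is in line with the
figures above. Denser pages push both numbers toward 100%.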

Unless, of course, we use indexes for sampling, which seems like a *really
good* idea to me ....

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
