From: | marcin mank <marcin(dot)mank(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: cross column correlation revisted |
Date: | 2010-07-14 21:57:56 |
Message-ID: | AANLkTimWL5LeO4Iioj-i4HkBZgIWtX6F588n7KdFQoem@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Jul 14, 2010 at 5:13 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> 2010/7/14 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
>> If the combination of columns is actually interesting, there might well
>> be an index in place, or the DBA might be willing to create it.
>
> Indexes aren't free, though, nor even close to it.
>
> Still, I think we should figure out the underlying mechanism first and
> then design the interface afterwards. One idea I had was a way to say
> "compute the MCVs and histogram buckets for this table WHERE
> <predicate>". If you can prove predicate for a particular query, you
> can use the more refined statistics in place of the full-table
> statistics. This is fine for the breast cancer case, but not so
> useful for the zip code/street name case (which seems to be the really
> tough one).
>
One way of dealing with the zipcode problem is estimating NDST =
count(distinct row(zipcode, street)) - i.e. multi-column ndistinct.
Then the planner doesn't have to assume that the selectivity of an
equality condition involving both zipcode and street is the product of
the respective selectivities. As a first cut it can assume that it
will get count(*) / NDST rows, but there are ways to improve it.
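
To make the difference concrete, here is a minimal sketch in Python (not planner code) that compares the two estimates on a toy table. The table contents and the function names estimate_independent and estimate_ndistinct are hypothetical, chosen only for illustration; the table assumes every street belongs to exactly one zipcode, which is the correlated case under discussion.

```python
# Toy table (hypothetical data): 10 zipcodes, 20 streets per zipcode,
# 5 rows per (zipcode, street) pair. Each street exists in only one
# zipcode, so the two columns are strongly correlated.
rows = [
    (f"zip{z}", f"street{z}_{s}")
    for z in range(10)
    for s in range(20)
    for _ in range(5)
]


def estimate_independent(rows, zipcode, street):
    """Row estimate assuming the two columns are independent:
    count(*) * sel(zipcode) * sel(street)."""
    n = len(rows)
    sel_zip = sum(1 for r in rows if r[0] == zipcode) / n
    sel_street = sum(1 for r in rows if r[1] == street) / n
    return n * sel_zip * sel_street


def estimate_ndistinct(rows):
    """First-cut row estimate from multi-column ndistinct:
    count(*) / NDST, where NDST = count(distinct (zipcode, street))."""
    ndst = len(set(rows))
    return len(rows) / ndst


# Actual number of rows matching one (zipcode, street) pair.
actual = sum(1 for r in rows if r == ("zip3", "street3_7"))

print("independent:", estimate_independent(rows, "zip3", "street3_7"))
print("ndistinct:  ", estimate_ndistinct(rows))
print("actual:     ", actual)
```

On this data the independence assumption underestimates the row count by roughly a factor of ten, while count(*) / NDST matches the actual count. It matches exactly only because the toy data is uniform across pairs; with skewed data the first-cut estimate would be an average over all pairs rather than the count for any specific one.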
Greetings
Marcin Mańk