From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: limit in subquery causes poor selectivity estimation |
Date: | 2011-09-06 02:25:03 |
Message-ID: | CA+Tgmoagrs=FQmdr04tiYt2-Kwu6MyFccx5cmcdR_kHL7fky5g@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, Sep 2, 2011 at 12:45 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> column values). But GROUP BY or DISTINCT would entirely invalidate the
> column frequency statistics, which makes me think that ignoring the
> pg_statistic entry might be the thing to do. Comments?
There's a possible problem there in that you may have trouble getting
a good join selectivity estimate in cases like:
SELECT ... FROM foo LEFT JOIN (SELECT x, SUM(1) FROM bar GROUP BY 1) AS bar
ON foo.x = bar.x
My guess is that in practice, the number of rows in foo that find a
join partner here is going to be much higher than what a stats-less
join selectivity estimation is likely to come up with. You typically
don't write a query like this in the first place if you don't expect
to find matches, although I'm sure it's been done. In some cases you
might even have a foreign key relationship to work with.
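To make the concern concrete, here is a minimal sketch using toy tables in SQLite through Python's sqlite3 module. The table contents, the helper names match_fraction and naive_estimate, and the assumed_ndistinct figure of 200 are all illustrative assumptions; naive_estimate is only a stand-in for "what a stats-less estimate might look like", not PostgreSQL's actual calculation.

```python
import sqlite3


def match_fraction(n_bar_keys=50, bar_dups=4, n_foo=1000):
    """Fraction of foo rows that find a partner in the grouped bar subquery."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bar (x INTEGER)")
    conn.execute("CREATE TABLE foo (x INTEGER)")
    # bar holds every key several times; GROUP BY collapses them to one row per key.
    conn.executemany(
        "INSERT INTO bar VALUES (?)",
        [(k,) for k in range(n_bar_keys) for _ in range(bar_dups)],
    )
    # Every foo.x refers to an existing bar.x, as a foreign key would guarantee.
    conn.executemany(
        "INSERT INTO foo VALUES (?)",
        [(i % n_bar_keys,) for i in range(n_foo)],
    )
    # COUNT(b.x) counts only rows where the left join found a partner.
    matched, total = conn.execute(
        """
        SELECT COUNT(b.x), COUNT(*)
        FROM foo LEFT JOIN (SELECT x, SUM(1) AS n FROM bar GROUP BY x) AS b
        ON foo.x = b.x
        """
    ).fetchone()
    conn.close()
    return matched / total


def naive_estimate(n_groups, assumed_ndistinct=200):
    """Hypothetical stats-less guess: per-pair selectivity of 1/assumed_ndistinct."""
    return min(1.0, n_groups / assumed_ndistinct)


if __name__ == "__main__":
    print("actual fraction of foo rows with a partner:", match_fraction())
    print("naive stats-less guess:", naive_estimate(50))
```

With these toy numbers every foo row finds a partner, while the naive guess says only a quarter do. The size of the gap depends entirely on the assumed default, but it points the same way as the argument above.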
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company