From: | Simon Riggs <simon(at)2ndQuadrant(dot)com> |
---|---|
To: | Greg Stark <stark(at)mit(dot)edu> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Incorrect assumptions with low LIMITs |
Date: | 2012-03-17 12:48:20 |
Message-ID: | CA+U5nML34qcciWUZdMRJjTs_=iZGtKpQUxeo_X4Y5YW=r4+N5w@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sat, Mar 17, 2012 at 11:33 AM, Greg Stark <stark(at)mit(dot)edu> wrote:
> On Sat, Mar 17, 2012 at 9:34 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> My wish was to register this as both a common and significant bug,
>
> It has definitely come up here before many times.
>
> However at root the problem is part of the general class of not
> understanding how two different columns are related. Postgres is
> assuming they're entirely independent and therefore all the values of
> x are uniformly distributed over values of y.
>
> To plan this better in your case it would have to know that blah_id <=
> 72572020 is not equally likely for user_id = ANY
> ('{....list....}'::integer[]) as it is for the table as a whole.
That is not the root cause in this case, though I agree that is also a
problem. The first plan is more complex because of an ORDER BY that
favours the index scan, but that's actually irrelevant to the case;
the use of blah_id is actually the user using the fact that things are
allocated in date order to avoid doing a date sort.
The problems are as I described them
(1) no account made for sparsity, and other factors leading to an
overestimate of rows (N)
(2) inappropriate assumption of the effect of LIMIT m, which causes a
costly SeqScan to appear better than an IndexScan for low m/N, when in
fact that is seldom the case.
Overestimating N in (1) inverts the problem, so that an overestimate
isn't the safe thing at all.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Bruce Momjian | 2012-03-17 13:46:14 | github notification spam |
Previous Message | Greg Stark | 2012-03-17 11:33:45 | Re: Incorrect assumptions with low LIMITs |