| From: | Andrea Arcangeli <andrea(at)cpushare(dot)com> | 
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> | 
| Cc: | pgsql-performance(at)postgresql(dot)org | 
| Subject: | Re: NOT LIKE much faster than LIKE? | 
| Date: | 2006-01-10 02:23:03 | 
| Message-ID: | 20060110022303.GA20168@opteron.random | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-performance | 
On Mon, Jan 09, 2006 at 09:04:48PM -0500, Tom Lane wrote:
> Andrea Arcangeli <andrea(at)cpushare(dot)com> writes:
> > It just makes no sense to me that the planner takes a difference
> > decision based on a "not".
> 
> Why in the world would you think that?  In general a NOT will change the
> selectivity of the WHERE condition tremendously.  If the planner weren't
> sensitive to that, *that* would be a bug.  The only case where it's
> irrelevant is if the selectivity of the base condition is exactly 50%,
> which is not a very reasonable default guess for LIKE.
How do you know that "LIKE" will have a selectivity above 50% in the
first place? I think 50% should be the default unless the selectively is
measured at runtime against the data being queried.
If you don't know the data, I think it's a bug that LIKE is assumed to
have a selectivity above 50%. You can't know that, only the author of
the code can know that and that's why I talked about hints. It'd be fine
to give hints like:
UNLIKELY string LIKE '%% PREEMPT %%'
or:
LIKELY string NOT LIKE '%% PREEMPT %%'
Then you could assume that very little data will be returned or a lot of
data will be returned. 
If you don't get hints NOT LIKE or LIKE should be assumed to have the
same selectivity.
> It sounds to me that the problem is misestimation of the selectivity
> of the LIKE pattern --- the planner is going to think that
> LIKE '%% PREEMPT %%' is fairly selective because of the rather long
> match text, when in reality it's probably not so selective on your
> data.  But we don't keep any statistics that would allow the actual
True, there's a lot of data that matches %% PREEMPT %% (even if less
than the NOT case).
> number of matching rows to be estimated well.  You might want to think
> about changing your data representation so that the pattern-match can be
> replaced by a boolean column, or some such, so that the existing
> statistics code can make a more reasonable estimate how many rows are
> selected.
I see. I can certainly fix it by stopping using LIKE. But IMHO this
remains a bug, since until the statistics about the numberof matching
rows isn't estimated well, you should not make assumptions on LIKE/NOT
LIKE. I think you can change the code in a way that I won't have to
touch anything, and this will lead to fewer surprises in the future IMHO.
Thanks!
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Christopher Kings-Lynne | 2006-01-10 02:29:05 | Re: NOT LIKE much faster than LIKE? | 
| Previous Message | Tom Lane | 2006-01-10 02:04:48 | Re: NOT LIKE much faster than LIKE? |