Re: NOT LIKE much faster than LIKE?

From: Andrea Arcangeli <andrea(at)cpushare(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: NOT LIKE much faster than LIKE?
Date: 2006-01-10 02:23:03
Message-ID: 20060110022303.GA20168@opteron.random
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Mon, Jan 09, 2006 at 09:04:48PM -0500, Tom Lane wrote:
> Andrea Arcangeli <andrea(at)cpushare(dot)com> writes:
> > It just makes no sense to me that the planner takes a difference
> > decision based on a "not".
>
> Why in the world would you think that? In general a NOT will change the
> selectivity of the WHERE condition tremendously. If the planner weren't
> sensitive to that, *that* would be a bug. The only case where it's
> irrelevant is if the selectivity of the base condition is exactly 50%,
> which is not a very reasonable default guess for LIKE.

How do you know that "LIKE" will have a selectivity above 50% in the
first place? I think 50% should be the default unless the selectively is
measured at runtime against the data being queried.

If you don't know the data, I think it's a bug that LIKE is assumed to
have a selectivity above 50%. You can't know that, only the author of
the code can know that and that's why I talked about hints. It'd be fine
to give hints like:

UNLIKELY string LIKE '%% PREEMPT %%'

or:

LIKELY string NOT LIKE '%% PREEMPT %%'

Then you could assume that very little data will be returned or a lot of
data will be returned.

If you don't get hints NOT LIKE or LIKE should be assumed to have the
same selectivity.

> It sounds to me that the problem is misestimation of the selectivity
> of the LIKE pattern --- the planner is going to think that
> LIKE '%% PREEMPT %%' is fairly selective because of the rather long
> match text, when in reality it's probably not so selective on your
> data. But we don't keep any statistics that would allow the actual

True, there's a lot of data that matches %% PREEMPT %% (even if less
than the NOT case).

> number of matching rows to be estimated well. You might want to think
> about changing your data representation so that the pattern-match can be
> replaced by a boolean column, or some such, so that the existing
> statistics code can make a more reasonable estimate how many rows are
> selected.

I see. I can certainly fix it by stopping using LIKE. But IMHO this
remains a bug, since until the statistics about the numberof matching
rows isn't estimated well, you should not make assumptions on LIKE/NOT
LIKE. I think you can change the code in a way that I won't have to
touch anything, and this will lead to fewer surprises in the future IMHO.

Thanks!

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Christopher Kings-Lynne 2006-01-10 02:29:05 Re: NOT LIKE much faster than LIKE?
Previous Message Tom Lane 2006-01-10 02:04:48 Re: NOT LIKE much faster than LIKE?