Re: NOT LIKE much faster than LIKE?

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Matteo Beccati <php(at)beccati(dot)com>, Andrea Arcangeli <andrea(at)cpushare(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: NOT LIKE much faster than LIKE?
Date: 2006-01-10 23:36:45
Message-ID: 1136936205.21025.507.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Tue, 2006-01-10 at 17:21 -0500, Tom Lane wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > I think its OK to use the MCV, but I have a problem with the current
> > heuristics: they only work for randomly generated strings, since the
> > selectivity goes down geometrically with length.
>
> We could certainly use a less aggressive curve for that. You got a
> specific proposal?

I read some research not too long ago that showed a frequency curve of
words by syllable length. I'll dig that out tomorrow.

> > I would favour the idea of dynamic sampling using a block sampling
> > approach; that was a natural extension of improving ANALYZE also.
>
> One thing at a time please. Obtaining better statistics is one issue,
> but the one at hand here is what to do given particular statistics.

I meant use the same sampling approach as I was proposing for ANALYZE,
but do this at plan time for the query. That way we can apply the
function directly to the sampled rows and estimate selectivity.

I specifically didn't mention that in the Ndistinct discussion because I
didn't want to confuse the subject further, but the underlying block
sampling method would be identical, so the code is already almost
there...we just need to eval the RestrictInfo against the sampled
tuples.

Best Regards, Simon Riggs

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Mark Lewis 2006-01-11 00:28:06 Re: help tuning queries on large database
Previous Message Tom Lane 2006-01-10 22:21:25 Re: NOT LIKE much faster than LIKE?