From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Simon Riggs <simon(at)2ndquadrant(dot)com> |
Cc: | Matteo Beccati <php(at)beccati(dot)com>, Andrea Arcangeli <andrea(at)cpushare(dot)com>, pgsql-performance(at)postgresql(dot)org |
Subject: | Re: NOT LIKE much faster than LIKE? |
Date: | 2006-01-10 22:21:25 |
Message-ID: | 6398.1136931685@sss.pgh.pa.us |
Lists: | pgsql-performance |
Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> I think it's OK to use the MCV, but I have a problem with the current
> heuristics: they only work for randomly generated strings, since the
> selectivity goes down geometrically with length.
We could certainly use a less aggressive curve for that. You got a
specific proposal?
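
To make "goes down geometrically with length" concrete, here is a minimal sketch of such a curve. It assumes the estimate is multiplied by a constant factor for each literal character in the pattern; the function name and the factors 0.05 and 0.20 are illustrative assumptions, not values taken from the PostgreSQL source or from this thread. A larger per-character factor is one simple way to get a "less aggressive" curve.

```python
def geometric_selectivity(n_fixed_chars: int, per_char_sel: float) -> float:
    """Selectivity that shrinks by a constant factor per literal character.

    per_char_sel is a hypothetical tuning constant in (0, 1].
    """
    if n_fixed_chars < 0:
        raise ValueError("n_fixed_chars must be non-negative")
    if not 0.0 < per_char_sel <= 1.0:
        raise ValueError("per_char_sel must be in (0, 1]")
    return per_char_sel ** n_fixed_chars


if __name__ == "__main__":
    # Compare a steep curve with a gentler one as the pattern grows.
    for n in (1, 3, 5, 10):
        steep = geometric_selectivity(n, 0.05)   # assumed "aggressive" factor
        gentle = geometric_selectivity(n, 0.20)  # assumed "less aggressive" factor
        print(f"{n:2d} chars: steep={steep:.2e}  gentle={gentle:.2e}")
```

With either factor the estimate collapses quickly for long patterns, which is fine for random strings but underestimates matches on real data where long substrings recur.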
>> After finishing that work it occurred to me that we could go a step
>> further: if the MCV list accounts for a substantial fraction of the
>> population, we could assume that the MCV list is representative of the
>> whole population, and extrapolate the pattern's selectivity over the MCV
>> list to the whole population instead of using the existing heuristics at
>> all. In a situation like Andrea's example this would win big, although
>> you can certainly imagine cases where it would lose too.
> I don't think that can be inferred with any confidence, unless a large
> proportion of the MCV list were itself selected. Otherwise it might
> match only a single MCV that just happens to have a high proportion,
> then we assume all others have the same proportion.
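
The extrapolation idea, and the objection to it, can be sketched as follows. This is an illustration under stated assumptions, not the planner's code: the MCV list is modeled as (value, frequency) pairs with frequencies expressed as fractions of all rows, `like_to_regex` and `extrapolate_from_mcv` are hypothetical helpers, and LIKE escape characters are ignored.

```python
import re


def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a LIKE pattern to a regex (% -> .*, _ -> .); escapes ignored."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def extrapolate_from_mcv(pattern: str, mcv: list) -> float:
    """Assume the MCV list is representative of the whole column.

    Returns the matching share of MCV frequency, applied to all rows.
    """
    rx = like_to_regex(pattern)
    covered = sum(freq for _, freq in mcv)
    if covered <= 0:
        raise ValueError("MCV list must cover a positive fraction of rows")
    matched = sum(freq for value, freq in mcv if rx.fullmatch(value))
    return matched / covered


if __name__ == "__main__":
    # Made-up statistics: three MCVs covering half of the table.
    mcv = [
        ("foo.example.com", 0.30),
        ("bar.example.org", 0.10),
        ("baz.example.net", 0.10),
    ]
    print(extrapolate_from_mcv("%foo%", mcv))
```

In this made-up example only one MCV matches `%foo%`, yet because it carries 0.30 of the 0.50 covered, the extrapolated selectivity for the whole table comes out at about 0.6. That is the case described above: a single heavy MCV drives the estimate for every other value.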
Well, of course it can't be inferred "with confidence". Sometimes
you'll win and sometimes you'll lose. The question is, is this a
better heuristic than what we use otherwise? The current estimate
for non-anchored patterns is really pretty crummy, and even with a
less aggressive length-vs-selectivity curve it's not going to be great.
Another possibility is to merge the two estimates somehow.
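
One hypothetical way to merge the two estimates, offered only as a sketch: interpolate linearly between the MCV-extrapolated selectivity and the length-based heuristic, weighting by the fraction of the table the MCV list covers. The function name and the weighting rule are assumptions made for illustration; nothing in this message specifies how the merge would be done.

```python
def merge_estimates(mcv_extrapolated: float, heuristic: float,
                    mcv_coverage: float) -> float:
    """Blend two selectivity estimates, trusting the MCV-based one more
    as the MCV list covers more of the table (hypothetical rule)."""
    for name, value in (("mcv_extrapolated", mcv_extrapolated),
                        ("heuristic", heuristic),
                        ("mcv_coverage", mcv_coverage)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1]")
    weight = mcv_coverage
    return weight * mcv_extrapolated + (1.0 - weight) * heuristic


if __name__ == "__main__":
    # MCV list covers half the table; the two estimates disagree widely.
    print(merge_estimates(mcv_extrapolated=0.6, heuristic=0.01, mcv_coverage=0.5))
```

With no MCV coverage this falls back to the heuristic alone, and with full coverage it uses the MCV result alone; anything in between is a compromise that limits the damage when either estimate is badly off.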
> I would favour the idea of dynamic sampling using a block sampling
> approach; that was a natural extension of improving ANALYZE also.
One thing at a time please. Obtaining better statistics is one issue,
but the one at hand here is what to do given particular statistics.
regards, tom lane