Re: LIKE search and performance

From: Mark Lewis <mark(dot)lewis(at)mir3(dot)com>
To: James Mansion <james(at)mansionfamily(dot)plus(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Alexander Staubo <alex(at)purefiction(dot)net>, Andy <frum(at)ar-sd(dot)net>, pgsql-performance(at)postgresql(dot)org
Subject: Re: LIKE search and performance
Date: 2007-05-24 21:02:40
Message-ID: 1180040560.31471.262.camel@archimedes
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Thu, 2007-05-24 at 21:54 +0100, James Mansion wrote:
> > If Sybase is still like SQL Server (or the other way around), it *may*
> > end up scanning the index *IFF* the index is a clustered index. If it's
> > a normal index, it will do a sequential scan on the table.
> >
> >
> Are you sure its not covered? Have to check at work - but I'm off next
> week so it'll have to wait.
>
> > It's not a win on PostgreSQL, because of our MVCC implementation. We
> > need to scan *both* index *and* data pages if we go down that route, in
> > which case it's a lot faster to just scan the data pages alone.
> >
> >
> Why do you need to go to all the data pages - doesn't the index
> structure contain all the keys so
> you prefilter and then check to see if the *matched* items are still in
> view? I'll be first to admit I
> know zip about Postgres, but it seems odd - doesn't the index contain
> copies of the key values?.
>
> I suspect that I mis-spoke with 'leaf'. I really just mean 'all index
> pages with data', since the scan
> does not even need to be in index order, just a good way to get at the
> data in a compact way.

PG could scan the index looking for matches first and only load the
actual rows if it found a match, but that could only be a possible win
if there were very few matches, because the difference in cost between a
full index scan and a sequential scan would need to be greater than the
cost of randomly fetching all of the matching data rows from the table
to look up the visibility information.

So yes it would be possible, but the odds of it being faster than a
sequential scan are small enough to make it not very useful.

And since it's basically impossible to know the selectivity of this kind
of where condition, I doubt the planner would ever realistically want to
choose that plan anyway because of its poor worst-case behavior.

-- Mark

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Craig James 2007-05-24 21:23:48 Re: LIKE search and performance
Previous Message James Mansion 2007-05-24 20:54:34 Re: LIKE search and performance