Re: Query with "like" is really slow

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: Christian Schröder <cs(at)deriva(dot)de>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Query with "like" is really slow
Date: 2007-09-07 12:08:59
Message-ID: 87642m8yp0.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Christian Schröder <cs(at)deriva(dot)de> writes:

...
> -> Seq Scan on table2 (cost=0.00..186.64 rows=2 width=4) (actual time=0.052..2.259 rows=42 loops=1)
> Filter: (c ~~ '1131%'::text)
...
> -> Seq Scan on table2 (cost=0.00..200.89 rows=14 width=4) (actual time=0.084..3.419 rows=42 loops=1)
> Filter: ("substring"((c)::text, 1, 4) = 1131'::text)
...

> My question is: Why do I have to optimize my query (use "substring" instead
> of "like") instead of having the database do this for me? Or is there a
> difference between both queries which I cannot see?

The only difference is that the optimizer understands LIKE better than it does
substring and so it tries harder to come up with a good estimate of how many
rows will match. In this case it seems its estimate is actually better (by
pure luck) with the substring() call. But it's still not very good.

Have these tables been analyzed recently? If so try raising the statistics
target on the "c" column. If the number of rows estimated goes up from 2 to
the 14 it's estimating with substring() then you'll get the better plan.
Hopefully it would be even better than that though.

> What can I do to get better results when using "like"? Do I have to add
> indices?

You could add an index on "c" (using the varchar_pattern_ops opclass if you're
not in C locale) but I don't know how much it would help. Most of the time is
being spent in the index lookups as you suspected below. So speeding up this
side of the join might not help much. On the other hand it might let the
optimizer do a sequential scan on the other side and use index lookups on this
side instead.

Note that the optimizer is unable to use an index for the substring() approach
so i really think you're better off getting the estimates better for LIKE.

> And last question: I do not really understand the first query plan. The actual
> time for the outer nested loop is 532673.631 ms. As far as I have understood
> the docs this includes the child nodes. But I cannot find the time-consuming
> child node. I only see two child nodes: The inner nested loop (which took
> 31.692 ms) and the index scan (which took 243.643 ms). Or do I have to multiply
> the 243.643 ms with 1627 (number of loops)? But even then I get 396407.161 ms,
> which is still far away from the 532673.631 ms in the parent node.

The nested loop still has to do some work. Actually it's quite possible that
that extra overhead in nested loop is largely gettimeofday() calls for the
explain analyze. Does the query take less time run without explain analyze
than it does run with it?

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Max Zorloff 2007-09-07 12:10:31 Re: Connection pooling
Previous Message Ron Johnson 2007-09-07 12:07:44 Re: Column as arrays.. more efficient than columns?