Quick Links

Estimation problem with a LIKE clause containing a /

From:	"Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com>
To:	pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject:	Estimation problem with a LIKE clause containing a /
Date:	2007-11-07 12:53:16
Message-ID:	1d4e0c10711070453q214f089cpd144cbb5193d3c4f@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers pgsql-performance

Hi all,

While studying a query taking forever after an ANALYZE on a never
analyzed database (a bad estimate causes a nested loop on a lot of
tuples), I found the following problem:
- without any stats (I removed the line from pg_statistic):
ccm_prod_20071106=# explain analyze select * from cms_items where
ancestors LIKE '1062/%';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------
Seq Scan on cms_items (cost=0.00..689.26 rows=114 width=587) (actual
time=0.008..21.692 rows=11326 loops=1)
Filter: ((ancestors)::text ~~ '1062/%'::text)
Total runtime: 31.097 ms
-> the estimate is bad (it's expected) but it's sufficient to prevent
the nested loop so it's my current workaround

- after analyzing the cms_items table (statistics is set to 10 but
it's exactly the same for 100):
ccm_prod_20071106=# explain analyze select * from cms_items where
ancestors LIKE '1062/%';
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Seq Scan on cms_items (cost=0.00..689.26 rows=*1* width=103) (actual
time=0.010..22.024 rows=11326 loops=1)
Filter: ((ancestors)::text ~~ '1062/%'::text)
Total runtime: 31.341 ms
-> this estimate leads PostgreSQL to choose a nested loop which is
executed more than 11k times and causes the query to take forever.

- if I remove the / from the LIKE clause (which I can't as ancestors
is more or less a path):
ccm_prod_20071106=# explain analyze select * from cms_items where
ancestors LIKE '1062%';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Seq Scan on cms_items (cost=0.00..689.26 rows=*9097* width=103)
(actual time=0.043..25.251 rows=11326 loops=1)
Filter: ((ancestors)::text ~~ '1062%'::text)
Total runtime: 34.778 ms

Which is a really good estimate.

Is it something expected?

The histogram does contain values beginning with '1062/' (5 out of 10)
and the cms_items table has ~ 22k rows.

Version is PostgreSQL 8.1.8 on i686-redhat-linux-gnu, compiled by GCC
gcc (GCC) 3.4.6 20060404 (Red Hat 3.4.6-3). I checked the release
notes between 8.1.8 and 8.1.10 and I didn't find anything relevant to
fix this problem.

Thanks for any help.

Regards,

--
Guillaume

Responses

Re: Estimation problem with a LIKE clause containing a / at 2007-11-07 13:25:40 from Alexander Staubo
Re: Estimation problem with a LIKE clause containing a / at 2007-11-07 16:34:52 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Peter Eisentraut	2007-11-07 13:24:14	Re: pg_resetxlog output clarification
Previous Message	Stefan Kaltenbrunner	2007-11-07 12:42:14	Re: tribble.postgresql.org - planned maintenance downtime

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Alexander Staubo	2007-11-07 13:25:40	Re: Estimation problem with a LIKE clause containing a /
Previous Message	Jens-Wolfhard Schicke	2007-11-06 20:48:12	Subpar Execution Plan