From: | "Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com> |
---|---|
To: | pgsql-performance <pgsql-performance(at)postgresql(dot)org> |
Subject: | Estimation problem with a LIKE clause containing a / |
Date: | 2007-11-07 12:53:16 |
Message-ID: | 1d4e0c10711070453q214f089cpd144cbb5193d3c4f@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers pgsql-performance |
Hi all,
While studying a query taking forever after an ANALYZE on a never
analyzed database (a bad estimate causes a nested loop on a lot of
tuples), I found the following problem:
- without any stats (I removed the line from pg_statistic):
ccm_prod_20071106=# explain analyze select * from cms_items where
ancestors LIKE '1062/%';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------
Seq Scan on cms_items (cost=0.00..689.26 rows=114 width=587) (actual
time=0.008..21.692 rows=11326 loops=1)
Filter: ((ancestors)::text ~~ '1062/%'::text)
Total runtime: 31.097 ms
-> the estimate is bad (it's expected) but it's sufficient to prevent
the nested loop so it's my current workaround
- after analyzing the cms_items table (statistics is set to 10 but
it's exactly the same for 100):
ccm_prod_20071106=# explain analyze select * from cms_items where
ancestors LIKE '1062/%';
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Seq Scan on cms_items (cost=0.00..689.26 rows=*1* width=103) (actual
time=0.010..22.024 rows=11326 loops=1)
Filter: ((ancestors)::text ~~ '1062/%'::text)
Total runtime: 31.341 ms
-> this estimate leads PostgreSQL to choose a nested loop which is
executed more than 11k times and causes the query to take forever.
- if I remove the / from the LIKE clause (which I can't as ancestors
is more or less a path):
ccm_prod_20071106=# explain analyze select * from cms_items where
ancestors LIKE '1062%';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Seq Scan on cms_items (cost=0.00..689.26 rows=*9097* width=103)
(actual time=0.043..25.251 rows=11326 loops=1)
Filter: ((ancestors)::text ~~ '1062%'::text)
Total runtime: 34.778 ms
Which is a really good estimate.
Is it something expected?
The histogram does contain values beginning with '1062/' (5 out of 10)
and the cms_items table has ~ 22k rows.
Version is PostgreSQL 8.1.8 on i686-redhat-linux-gnu, compiled by GCC
gcc (GCC) 3.4.6 20060404 (Red Hat 3.4.6-3). I checked the release
notes between 8.1.8 and 8.1.10 and I didn't find anything relevant to
fix this problem.
Thanks for any help.
Regards,
--
Guillaume
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Eisentraut | 2007-11-07 13:24:14 | Re: pg_resetxlog output clarification |
Previous Message | Stefan Kaltenbrunner | 2007-11-07 12:42:14 | Re: tribble.postgresql.org - planned maintenance downtime |
From | Date | Subject | |
---|---|---|---|
Next Message | Alexander Staubo | 2007-11-07 13:25:40 | Re: Estimation problem with a LIKE clause containing a / |
Previous Message | Jens-Wolfhard Schicke | 2007-11-06 20:48:12 | Subpar Execution Plan |