Quick Links

Indexed leading substring searches - worked, now doesn't

From:	Wes <wespvp(at)syntegra(dot)com>
To:	Postgresql-General <pgsql-general(at)postgresql(dot)org>
Subject:	Indexed leading substring searches - worked, now doesn't
Date:	2005-02-03 19:20:00
Message-ID:	BE27D580.5E34%wespvp@syntegra.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

I know my leading substring searches used to be done via indexes. We
specifically tested that. Since the last time I tested it, the database has
probably been reloaded to fix a corruption problem. Now all I can get is
sequential leading substring searches. In the examples below, the database
was vacuumed last night.

The database is very large (currently about 100 GB, and will be 100's of
gigabytes), so performance is important. This particular table in the
example has only about 2.8 million rows.

PostgreSQL version is 7.4.5. Prior to the reload, it was 7.4.1.

The locale is showing up as en_US.iso885915. As far as I know, it was
always this (default RedHat install), so I don't understand why it worked
before. Did something change between 7.4.1 and 7.4.5? I supposed it's
possible that I specified locale=C on the original database and don't
remember that...

I'm not going to have to "initdb --locale=C" and am I? I looked at index
classes, and that doesn't appear to be something I want to do, due to
performance. What kind of performance hit do you actually take by using an
index class?

Wes

Pg_controldata shows:

Maximum length of locale name: 128
LC_COLLATE: en_US.iso885915
LC_CTYPE: en_US.iso885915

narc=> select count(*) from addresses;
count
---------
2829640
(1 row)

narc=> explain select * from addresses where address = 'blah';
QUERY PLAN
----------------------------------------------------------------------------
----------
Index Scan using addresses_i_address on addresses (cost=0.00..2.81 rows=1
width=40)
Index Cond: ((address)::text = 'blah'::text)
(2 rows)

narc=> explain select * from addresses where address like 'blah%';
QUERY PLAN
--------------------------------------------------------------
Seq Scan on addresses (cost=0.00..61244.68 rows=2 width=40)
Filter: ((address)::text ~~ 'blah%'::text)
(2 rows)

narc=> explain analyze select * from addresses where address like 'blah%';
QUERY PLAN
----------------------------------------------------------------------------
----------------------------------
Seq Scan on addresses (cost=0.00..61244.68 rows=2 width=40) (actual
time=1445.386..8913.435 rows=6 loops=1)
Filter: ((address)::text ~~ 'blah%'::text)
Total runtime: 8913.504 ms
(3 rows)

Something else doesn't make sense.. I did the same query a few minutes ago
and the cost was totally different:

narc=> explain select * from addresses where address like 'blah%';
QUERY PLAN
--------------------------------------------------------------------------
Seq Scan on addresses (cost=100000000.00..100061244.67 rows=2 width=40)
Filter: ((address)::text ~~ 'blah%'::text)
(2 rows)

Responses

Re: Indexed leading substring searches - worked, now doesn't at 2005-02-03 20:29:53 from Tom Lane

Browse pgsql-general by date

	From	Date	Subject
Next Message	Karl Denninger	2005-02-03 19:25:23	Re: Upgrade from 7.4 -> 8.0.1 - problem with dump/restore
Previous Message	Michael Fuhr	2005-02-03 19:15:11	Re: Upgrade from 7.4 -> 8.0.1 - problem with dump/restore