From: | Stuart Woolford <stuartw(at)newmail(dot)net> |
---|---|
To: | pgsql-general(at)postgreSQL(dot)org, hackers(at)postgreSQL(dot)org |
Subject: | more] indexed regex select optimisations? |
Date: | 1999-11-07 23:50:41 |
Message-ID: | 99110813015600.00960@test.macmillan.co.nz |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general pgsql-hackers |
Well, I've improved my regex text searches to actually use the indexes properly
now for the basic case, but I have found another 'problem' (or feature, call it
what you will ;) - to demonstrate:
with locale turned on (the default RPMS are like this):
the following takes a LONG time to run on 1.6 million records:
-------------------------------------
explain select isbn, count from inv_word_i where
word~'^foo'
order by count
Sort (cost=35148.70 rows=353 width=16)
-> Index Scan using i3 on inv_word_i (cost=35148.70 rows=353 width=16)
-------------------------------------
the following runs instantly, and does (nearly) the same thing:
-------------------------------------
explain select isbn, count from inv_word_i where
word>='foo' and word<'fop'
order by count
Sort (cost=11716.57 rows=183852 width=16)
-> Index Scan using i3 on inv_word_i (cost=11716.57 rows=183852 width=16)
-------------------------------------
but what about the following? :
-------------------------------------
explain select isbn , sum(count) from inv_word_i where
(word>='window' and word<'windox')
or
(word>='idiot' and word<'idiou')
group by isbn
order by sum(count) desc
Sort (cost=70068.84 rows=605525 width=16)
-> Aggregate (cost=70068.84 rows=605525 width=16)
-> Group (cost=70068.84 rows=605525 width=16)
-> Sort (cost=70068.84 rows=605525 width=16)
-> Seq Scan on inv_word_i (cost=70068.84 rows=605525 width=16)
-------------------------------------
this is the fastest way I've found so far to do a multi-word search (window and
idiot as the root words in this case), you note it does NOT use the indexes,
but falls back to a linear scan?!? it takes well over 30 seconds (much much too
long)
I've tried a LOT of different combinations, and have yet to find a way of
getting the system to use the indexes correctly to do what I want, the closest
I've ffound is using a select intersect select method to find all docs
containing both word (what I really want, although the query above is a ranked
or query), but it gets slow as soon as I select more than one field for the
results (I need to line isbn in this case to another database in the final
application)
I assume there is some reason the system falls back to a linear scan in this
case? it seems two index lookups would be much much more efficient..
am I missing something again?
--
------------------------------------------------------------
Stuart Woolford, stuartw(at)newmail(dot)net
Unix Consultant.
Software Developer.
Supra Club of New Zealand.
------------------------------------------------------------
From | Date | Subject | |
---|---|---|---|
Next Message | Matthias Teege | 1999-11-08 08:15:38 | Compiling problems |
Previous Message | Peter Eisentraut | 1999-11-07 16:44:10 | Re: [GENERAL] users in Postgresql |
From | Date | Subject | |
---|---|---|---|
Next Message | Bruce Momjian | 1999-11-08 06:06:15 | new Psql \pset border |
Previous Message | Thomas Lockhart | 1999-11-07 23:37:45 | Re: [HACKERS] psql and 6.5.3 |