From: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
---|---|
To: | Jasen Betts <jasen(at)xnet(dot)co(dot)nz> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: best practise/pattern for large OR / LIKE searches |
Date: | 2009-08-30 13:17:30 |
Message-ID: | 162867790908300617u794ba134k2dfa0c7ebad32c8b@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hello
> regex is compiled to a finite state machine and then the datanumber
> column is scanned in a single pass (for each row)
>
>> Searches are currently taking to long and we would like to optimize
>> them, but before we dive into our own solution we
>> where wondering if there already common solutions for this...
>
> try regex first if that's too slow you may need to write a
> dictionary function that splits datanuimber into it's components
> and use full text index/search. (this will slow down updates as they will do
> upto 20 inserts into the index)
>
> searches should then be optimally fast
>
I did some tests:
1) I fill test table
insert into test SELECT
array_to_string(array_agg(array_to_string(ARRAY(select
substring('01234567890' from (random()*10)::int + 1 for 1) from
generate_series(1,(random()*10+5)::int + i - i)),'')),',') as b from
generate_series(1,100000) g(i) group by (random()*1000)::int;
2. I tested searching of 5 or 13 values. I did tests on 8.4 and 8.1
8.1
using like 190ms(440ms*)
using regexp 115ms(259ms*)
* for 13 values - so there regexp is faster than like
on 8.4
using like 80ms(151ms)
using regexp 131ms(267ms)
so like is faster then regexp on 8.4.
fulltext test (8.4)
420ms(470ms) -- without index
14ms(26ms) -- with GiST index
1ms(2ms) -- with Gin index
some samples of test queries:
select * from test where to_tsvector('simple',a) @@
to_tsquery('simple','296426496|7707431116555|98173598191|302598|53174827|02292064629|188631468777|4756243248|920473|16602317|76613513|78640|9176323');
select * from test where a ~
'296426496|7707431116555|98173598191|302598|53174827|02292064629|188631468777|4756243248|920473|16602317|76613513|78640|9176323';
select * from test where a like '%296426496%' or a like '%
7707431116555%' or a like '%98173598191%' or a like '%302598%' or a
like '%53174827%' or a like '%02292064629%' or a like '%188631468777%'
or a like '%4756243248%' or a like '%920473%' or a like '%16602317%'
or a like '%76613513%' or a like '%78640%' or a like '%9176323%';
regards
Pavel Stehule
From | Date | Subject | |
---|---|---|---|
Next Message | Pavel Stehule | 2009-08-30 13:22:16 | Re: Trouble using TG_TABLE_NAME in BEFORE INSERT OR UPDATE trigger |
Previous Message | Stephen Cuppett | 2009-08-30 13:11:53 | Re: Trouble using TG_TABLE_NAME in BEFORE INSERT OR UPDATE trigger |