From: | hamann(dot)w(at)t-online(dot)de |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: looking for a faster way to do that |
Date: | 2011-09-23 07:45:51 |
Message-ID: | 4E7C392F.mailGNP11KAXQ@amadeus3.local |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Alban Hertroys <haramrae(at)gmail(dot)com> wrote:
>> What is the output of explain?
>>
>> You say 'the other table', so presumably we're dealing with a foreign key
>> here. Is there an index on that column?
Albe Laurenz wrote:
>> Is the index used for "where code ~ '^ABC3563'"?
>>
>> If not, then the result is fast only because the table is scanned only once,
>> and it's just the factor of 3000 that's killing you.
>>
>> The second query (where code ~ wantcode) can never use an index because
>> the pattern "wantcode" is unknown at query planning time.
>>
>> Yours,
>> Laurenz Albe
Here I created a subset (just number and code matching a certain prefix)
\d items
Table "pg_temp_1.items"
Column | Type | Modifiers
--------+-----------------------+-----------
num | integer |
code | character varying(40) |
create index itemsc on items (code);
select count(*) from items;
count
-------
9614
A single anchored query
select * from items where code ~ '^ABC';
does indeed use the index to retrieve data.
Next I copied a file of wanted codes
create temp table n (wantcode text);
\copy n from /tmp/rmartin.tmp
the file contains plain names, i.e. unanchored matches
explain analyze select num, n.wantcode from items, n where items.code ~ n.wantcode;
Nested Loop (cost=20.00..216502.14 rows=48070 width=36) (actual time=148.479..336280.488 rows=2871 loops=1)
Join Filter: (("outer".code)::text ~ "inner".wantcode)
-> Seq Scan on items (cost=0.00..167.14 rows=9614 width=42) (actual time=0.048..38.666 rows=9614 loops=1)
-> Materialize (cost=20.00..30.00 rows=1000 width=32) (actual time=0.001..1.049 rows=815 loops=9614)
-> Seq Scan on n (cost=0.00..20.00 rows=1000 width=32) (actual time=0.003..1.839 rows=815 loops=1)
Total runtime: 336286.692 ms
An exact match "where items.code = n.wantcode" on the same data completes in 40 ms
BTW: indexing the second table does not affect the query plan or the runtime, it just shows
actual row count rather than estimate.
This is, of course, bad; an anchored match could be faster and also is more appropriate
to the scenario. So I change the contents of the second table
update n set wantcode = textcat('^', wantcode);
and try again, with similar results
Nested Loop (cost=14.15..176478.01 rows=39178 width=36) (actual time=125.114..308831.697 rows=2871 loops=1)
Join Filter: (("outer".code)::text ~ "inner".wantcode)
-> Seq Scan on items (cost=0.00..167.14 rows=9614 width=42) (actual time=0.061..2034.572 rows=9614 loops=1)
-> Materialize (cost=14.15..22.30 rows=815 width=32) (actual time=0.001..1.095 rows=815 loops=9614)
-> Seq Scan on n (cost=0.00..14.15 rows=815 width=32) (actual time=0.114..1.893 rows=815 loops=1)
Total runtime: 308837.746 ms
I am aware that this is unlikely to work fast (the planner would perhaps need a hint in the query
rather than in the data column to choose an anchored match algorithm (in case there is
such an algo, of course)
So I wonder whether there might be a different approach to this problem rather than
pattern matching.
I recall I had a similar problem before with a "contacts" column possibly containing one or more
email addresses. Here searches would also be number of people times number of requests
performance. I finally ended up with a @@ match (contrib/tsquery) and a supporting GIST index,
but that only supports exact match, not prefix
Regards
Wolfgang Hamann
From | Date | Subject | |
---|---|---|---|
Next Message | Andrew Rose | 2011-09-23 09:47:15 | Relative performance of prefix and suffix string matching |
Previous Message | Mike Christensen | 2011-09-23 07:03:20 | Re: Materialized views in Oracle |