From: | hamann(dot)w(at)t-online(dot)de |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Optimizing query? |
Date: | 2013-01-31 18:29:28 |
Message-ID: | wolfgang-1130131192928.A037318@amadeus3.local |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Pavel Stehlule wrote:
>> >> Hi,
>> >>
>> >> I am trying to match items from 2 tables based on a common string.
>> >> One is a big table which has one column with entries like XY123, ABC44, =
>> etc
>> >> The table has an index on that column.
>> >> The second table is, typically, much smaller
>> >>
>> >> select .... from tab1, tab2 where tab1.code =3D tab2.code;
>> >>
>> >> This works fine and fast.
>> >> Now, as a variant, I have some entries like XY423A, XY423B, GF55A, GF55D=
>> in the
>> >> big table and want them to match XY423, GF55 in the second table
>> >>
>> >> Variants I have tried
>> >>
>> >> select .... from tab1, tab2 where tab1.code ~ (tab2.code||'($|[A-Z])');
>> >> select .... from tab1, tab2 where tab1.code ~ ('^'||tab2.code||'($|[A-Z=
>> ])');
>> >>
>> >
>> > Have you tried the substring function?
>> >
>> > select .... from tab1, tab2 where substring(tab1.code from 1 for 5) =3D
>> > tab2.code
>> >
Hi Pavel, it was just by chance that a fixed size substring would match the
data at hand. It is more common to have a digit/letter (or vice versa) boundary
or a hyphen there
>> >
>> >> both take an enormous time. In the better case that I can subset (e.g. a=
>> ll candidates in table 2
>> >> share initial "AX") I get back to manageable times by adding
>> >> and tab1.code ~ '^AX'
>> >> into the recipe. Actual runtime with about a million entries in tab1 and=
>> 800 entries in tab2
>> >> is about 40 seconds.
>>
>> any join where result is related to some function result can be very
>> slow, because estimation will be out and any repeated function
>> evaluation is just expensive.
>>
I see the problem since obviously every the ~ operator with a non-constant
pattern is constantly recompiling the pattern.
I wonder whether it would be possible to invent a prefix-match operator that approaches
the performance of string equality. I noted in the past (not sure whether anything
has changed in regex matching) that a constant leading part of regex would improve
performance, i.e. use an index scan to select possible candidates.
>> You can try use a functional index.
>>
>> create index on tab2 ((substring(tab1.code from 1 for 5))
>>
What kind of trick is that - mixing two tables into a functional index?
What would the exact syntax be for that?
Regards
Wolfgang Hamann
From | Date | Subject | |
---|---|---|---|
Next Message | David Fetter | 2013-01-31 18:49:33 | Re: Pg & Tcl - is it dying out? |
Previous Message | Adrian Klaver | 2013-01-31 18:20:16 | Re: COPY table to file missing quotation marks |