From: | Stuart Woolford <stuartw(at)newmail(dot)net> |
---|---|
To: | "Ross J(dot) Reedstrom" <reedstrm(at)wallace(dot)ece(dot)rice(dot)edu>, pgsql-general(at)postgresql(dot)org |
Subject: | Re: [GENERAL] indexed regex select optimisation missing? |
Date: | 1999-11-04 23:09:19 |
Message-ID: | 99110512143000.01188@test.macmillan.co.nz |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general pgsql-hackers |
On Fri, 05 Nov 1999, you wrote:
> Ah, your description just tripped a memory for me from the hackers list:
>
> The behavior you describe has to do with the implementation of using an
> index for regex matching, in the presence of the USE_LOCALE configuration
> option.
>
> Internally, the condition: WHERE word~'^alongword' is converted in the
> parser(!) to:
>
> WHERE word >= 'alongword' AND word < 'alongword\377'
>
> since the index needs inequalities to be used, not matches. Now, the
> problem is the hack of tacking an octal \377 on the string to create
> the lexagraphically 'just bigger' value assumes ASCI sort order. If
> USE_LOCALE is defined, this is dropped, since we don't have a good fix
> yet, and slow correct behavior is better than fast, incorrect behavior.
ah, now this makes sense, I'm using the RPMs, and I bet they have lexical
enabled by default (damb! perhaps another set should be produced without this
option? it makes a BIG difference)
> > So, you have two options: if you don't need locale support,
recompile > without it. Otherwise, hand code your anchored matches as the pair
of > conditionals above Hmm, is there syntax for adding an arbitrary value to
> a string constant in the SQL? I suppose you could use: word < 'alongwore',
> i.e. hand increment the last character, so it's larger than any match.
I've tried a test using ">='window' and <'windox'", and it works perfectly, and
very very fast, so I think we have found your culprit.
>
> Your point is correct, the developers are aware of it as a theoretical
> problem, at least. Always helps to hear a real world case, though. I
> believe it's on the TODO list as is, otherwise, pester Bruce. ;-)
>
> Reviewing my email logs from June, most of the work on this has to do with
> people who needs locales, and potentially multibyte character sets. Tom
> Lane is of the opinion that this particular optimization needs to be moved
> out of the parser, and deeper into the planner or optimizer/rewriter,
> so a good fix may be some ways out.
Hmm, perhaps a 'good' initial fix would be to produce another set of RPMs,
and/or add it to the FAQ in the 4.x section about the slow queries that say
indexes are used for this type of search. using the >= AND < trick does seem to
work, but is a little non-obvious (and hard to code in some situations, it will
make quite a difference to how I need to implement my searching system)
>
> Ross
thank you very very much for your assistance on this, it is greatly appreciated!
--
------------------------------------------------------------
Stuart Woolford, stuartw(at)newmail(dot)net
Unix Consultant.
Software Developer.
Supra Club of New Zealand.
------------------------------------------------------------
From | Date | Subject | |
---|---|---|---|
Next Message | Stuart Woolford | 1999-11-04 23:59:03 | Re: [GENERAL] indexed regex select optimisation missing? |
Previous Message | Ross J. Reedstrom | 1999-11-04 22:06:21 | Re: [GENERAL] indexed regex select optimisation missing? |
From | Date | Subject | |
---|---|---|---|
Next Message | Don Baccus | 1999-11-04 23:34:35 | Re: [HACKERS] New version of psql |
Previous Message | Bruce Momjian | 1999-11-04 23:05:27 | Re: [HACKERS] New version of psql |