From: | "Ross J(dot) Reedstrom" <reedstrm(at)wallace(dot)ece(dot)rice(dot)edu> |
---|---|
To: | Stuart Woolford <stuartw(at)newmail(dot)net> |
Cc: | pgsql-general(at)postgresql(dot)org, Lamar Owen <lamar(dot)owen(at)wgcr(dot)org>, hackers(at)postgresql(dot)org |
Subject: | Re: [GENERAL] indexed regex select optimisation missing? |
Date: | 1999-11-05 15:36:55 |
Message-ID: | 19991105093655.A19204@wallace.ece.rice.edu |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general pgsql-hackers |
Stuart -
I'm forwarding a version of your last message to the hackers list, and
to Lamar Owen, who's the keeper of the RPMs. The short verson, for those
who haven't followed this thread over on GENERAL, is that Stuart is being
bitten by the USE_LOCALE affect on the makeIndexable() function in the
parser: anchored regex searches on a large table (a glossary, I believe)
take a long time, proportional to sort position of the anchoring text:
i.e. searching for '^zoo' is quick, '^apple' is very slow.
I seems to recall the packagers here (Lamar and Oliver) asking if defining
USE_LOCALE for the general RPM or deb would cause any problems for other
users, who don't need locale info. Here's a real world example.
The discussion about this was last June, and shifted focus into the
multi-byte problem, as far as I can tell. Bruce, some version of this
is on the TODO list, right?
Ross
On Fri, Nov 05, 1999 at 12:09:19PM +1300, Stuart Woolford wrote:
> On Fri, 05 Nov 1999, you wrote:
> > Ah, your description just tripped a memory for me from the hackers list:
> >
> > The behavior you describe has to do with the implementation of using an
> > index for regex matching, in the presence of the USE_LOCALE configuration
> > option.
> >
> > Internally, the condition: WHERE word~'^alongword' is converted in the
> > parser(!) to:
> >
> > WHERE word >= 'alongword' AND word < 'alongword\377'
> >
> > since the index needs inequalities to be used, not matches. Now, the
> > problem is the hack of tacking an octal \377 on the string to create
> > the lexagraphically 'just bigger' value assumes ASCI sort order. If
> > USE_LOCALE is defined, this is dropped, since we don't have a good fix
> > yet, and slow correct behavior is better than fast, incorrect behavior.
>
> ah, now this makes sense, I'm using the RPMs, and I bet they have lexical
> enabled by default (damb! perhaps another set should be produced without this
> option? it makes a BIG difference)
>
> > > So, you have two options: if you don't need locale support,
> recompile > without it. Otherwise, hand code your anchored matches as the pair
> of > conditionals above Hmm, is there syntax for adding an arbitrary value to
> > a string constant in the SQL? I suppose you could use: word < 'alongwore',
> > i.e. hand increment the last character, so it's larger than any match.
>
> I've tried a test using ">='window' and <'windox'", and it works perfectly, and
> very very fast, so I think we have found your culprit.
>
> >
> > Your point is correct, the developers are aware of it as a theoretical
> > problem, at least. Always helps to hear a real world case, though. I
> > believe it's on the TODO list as is, otherwise, pester Bruce. ;-)
> >
> > Reviewing my email logs from June, most of the work on this has to do with
> > people who needs locales, and potentially multibyte character sets. Tom
> > Lane is of the opinion that this particular optimization needs to be moved
> > out of the parser, and deeper into the planner or optimizer/rewriter,
> > so a good fix may be some ways out.
>
> Hmm, perhaps a 'good' initial fix would be to produce another set of RPMs,
> and/or add it to the FAQ in the 4.x section about the slow queries that say
> indexes are used for this type of search. using the >= AND < trick does seem to
> work, but is a little non-obvious (and hard to code in some situations, it will
> make quite a difference to how I need to implement my searching system)
>
> >
> > Ross
>
> thank you very very much for your assistance on this, it is greatly appreciated!
>
> --
> ------------------------------------------------------------
> Stuart Woolford, stuartw(at)newmail(dot)net
> Unix Consultant.
> Software Developer.
> Supra Club of New Zealand.
> ------------------------------------------------------------
--
Ross J. Reedstrom, Ph.D., <reedstrm(at)rice(dot)edu>
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St., Houston, TX 77005
From | Date | Subject | |
---|---|---|---|
Next Message | Gene Sokolov | 1999-11-05 15:38:18 | Re: [GENERAL] Stored Procedures |
Previous Message | Moray McConnachie | 1999-11-05 15:31:06 | Re: [GENERAL] Stored Procedures |
From | Date | Subject | |
---|---|---|---|
Next Message | Lamar Owen | 1999-11-05 16:33:36 | Re: [GENERAL] indexed regex select optimisation missing? |
Previous Message | Oleg Broytmann | 1999-11-05 14:32:49 | Re: [HACKERS] Encoding in UDF's |