From: | Stuart Woolford <stuartw(at)newmail(dot)net> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Ross J(dot) Reedstrom" <reedstrm(at)wallace(dot)ece(dot)rice(dot)edu> |
Cc: | pgsql-general(at)postgreSQL(dot)org, Lamar Owen <lamar(dot)owen(at)wgcr(dot)org>, hackers(at)postgreSQL(dot)org |
Subject: | Re: [HACKERS] Re: [GENERAL] indexed regex select optimisation missing? |
Date: | 1999-11-06 00:05:14 |
Message-ID: | 99110613104200.00731@test.macmillan.co.nz |
Lists: | pgsql-general pgsql-hackers |
Firstly, damn, you guys are good! Please accept my strongest compliments for the
response time on this issue.
On Sat, 06 Nov 1999, Tom Lane wrote:
> "Ross J. Reedstrom" <reedstrm(at)wallace(dot)ece(dot)rice(dot)edu> writes:
> > Reviewing my email logs from June, most of the work on this has to do with
> > people who need locales, and potentially multibyte character sets. Tom
> > Lane is of the opinion that this particular optimization needs to be moved
> > out of the parser, and deeper into the planner or optimizer/rewriter,
> > so a good fix may be some ways out.
>
> Actually, that part is already done: addition of the index-enabling
> comparisons is gone from the parser and is now done in the optimizer,
> which has a whole bunch of benefits (one being that the comparison
> clauses don't get added to the query unless they are actually used
> with an index!).
>
> But the underlying LOCALE problem still remains: I don't know a good
> character-set-independent method for generating a "just a little bit
> larger" string to use as the righthand limit. If anyone out there is
> an expert on foreign and multibyte character sets, some help would
> be appreciated. Basically, given that we know the LIKE or regex
> pattern can only match values beginning with FOO, we want to generate
> string comparisons that select out the range of values that begin with
> FOO (or, at worst, a slightly larger range). In USASCII locale it's not
> hard: you can do
> field >= 'FOO' AND field < 'FOP'
> but it's not immediately obvious how to make this idea work reliably
> in the presence of odd collation orders or multibyte characters...
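[Editorial sketch, not part of the original message.] The USASCII case Tom describes can be illustrated roughly as follows. This is a minimal Python sketch, assuming plain code-point ordering (which matches byte order for ASCII) and a sorted list standing in for an index; `ascii_prefix_range` and `range_scan` are hypothetical names, not PostgreSQL functions.

```python
from bisect import bisect_left


def ascii_prefix_range(prefix):
    """Return (lower, upper) such that lower <= s < upper for every s starting with prefix.

    Assumption: ASCII data compared in plain code-point order. This is exactly
    the case that is easy; it does not address locale collation or multibyte sets.
    """
    if not prefix or not prefix.isascii():
        raise ValueError("sketch handles non-empty ASCII prefixes only")
    last = ord(prefix[-1])
    if last >= 0x7F:
        raise ValueError("last character cannot be incremented within ASCII")
    # 'FOO' -> 'FOP': bump the final character by one code point.
    return prefix, prefix[:-1] + chr(last + 1)


def range_scan(sorted_values, prefix):
    """Emulate: field >= lower AND field < upper, using binary search on a sorted list."""
    lower, upper = ascii_prefix_range(prefix)
    start = bisect_left(sorted_values, lower)
    stop = bisect_left(sorted_values, upper)
    return sorted_values[start:stop]


values = sorted(["FON", "FOO", "FOOBAR", "FOOD", "FOP", "FOZ", "BAR"])
print(range_scan(values, "FOO"))
```

The sketch deliberately refuses non-ASCII prefixes, because "the next character" is precisely what is not well defined once collation order differs from code-point order.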
How about something along the lines of:
field >= 'FOO' and field ~ '^FOO.*'
i.e., terminate the scan once a value fails to match the static left-hand side
followed by anything (although I have the feeling this does not fit into your
execution system), with a simple regex-type check added to the scan
validation code?
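[Editorial sketch, not part of the original message.] The idea above, seek to the lower bound and then stop at the first value that no longer matches the fixed prefix, might look like this. It is a minimal Python sketch under the assumption that all values sharing the prefix sort contiguously starting at the lower bound, which holds for code-point ordering but may not hold under some locale collation rules. `prefix_scan` is a hypothetical name, and `str.startswith` stands in for the anchored regex check.

```python
from bisect import bisect_left
from itertools import islice, takewhile


def prefix_scan(sorted_values, prefix):
    """Seek to the first value >= prefix, then read until the prefix check fails.

    No upper-bound string is ever constructed; the scan stops itself.
    Assumption: matching values are contiguous in the sort order.
    """
    start = bisect_left(sorted_values, prefix)
    candidates = islice(sorted_values, start, None)
    return list(takewhile(lambda v: v.startswith(prefix), candidates))


values = sorted(["FON", "FOO", "FOOBAR", "FOOD", "FOP", "FOZ", "BAR"])
print(prefix_scan(values, "FOO"))
```

The appeal of this shape is that it avoids generating a "just a little bit larger" string at all; the cost is that the scan needs a stop condition that is not an ordinary comparison against a constant.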
>
> BTW: the \377 hack is actually wrong for USASCII too, since it'll
> exclude a data value like 'FOO\377x' which should be included.
That's why I pointed out that in my particular case, I only have alpha and
numeric data in the database, so it is safe; it's certainly no general solution.
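[Editorial sketch, not part of the original message.] The gap Tom points out in the \377 hack can be checked with a few lines of Python. The exact form of the hack is not shown in this message, so the sketch assumes it was an inclusive range from the prefix up to the prefix followed by a single \377 byte, compared byte-wise; `in_377_range` is a hypothetical helper.

```python
def in_377_range(value, prefix):
    """Emulate: field >= prefix AND field <= prefix || '\\377', with byte-wise comparison.

    Assumption: this is the shape of the hack under discussion.
    """
    return prefix <= value <= prefix + b"\xff"


# Alphanumeric data is covered, which is why the hack is safe for such a database.
print(in_377_range(b"FOOBAR", b"FOO"))
print(in_377_range(b"FOO123", b"FOO"))

# A value that starts with FOO but sorts after 'FOO\377' is wrongly excluded.
print(in_377_range(b"FOO\xffx", b"FOO"))
```

The last value begins with FOO yet falls outside the range, because any string that extends 'FOO\377' sorts after it.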
--
------------------------------------------------------------
Stuart Woolford, stuartw(at)newmail(dot)net
Unix Consultant.
Software Developer.
Supra Club of New Zealand.
------------------------------------------------------------