From: | will trillich <will(at)serensoft(dot)com> |
---|---|
To: | pgsql <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Regular Expression INTRO [long] [OT] |
Date: | 2002-04-10 21:38:54 |
Message-ID: | 20020410163854.A5944@serensoft.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Tue, Apr 09, 2002 at 06:05:43PM -0600, will trillich wrote:
[snip]
> With LIKE, your search values can use the percent (%)
> to tell PostgreSQL that anything can occupy that spot
> -- one character, twelve symbols or zero digits -- and
> still satisfy the search.
>
> On Unix or Linux, this is basically the same as the
> ASTERISK (*) at the command line, when dealing with
> file names:
>
> # list all files whose names begin with '.bash'
> ls .bash*
>
> # list all files containing 'out' anywhere in the name
> ls *out*
>
> # list all file names ending with '.pl'
> ls *.pl
>
> # list file starting with 'proj', ending with '.c'
> ls proj*.c
i'm torn between leaving in a full set of shell-glob samples to
pair up with postgres-like samples, and using only one (as
recommended by Holger Klawitter -- thanks!)...
> With PostgreSQL using the LIKE operator, use the
> percent, instead:
>
> -- list all customers within the 47610 postal code
> SELECT * FROM cust WHERE zip LIKE '47610%';
>
> -- display customers who have 'Corp' in their names
> SELECT * FROM cust WHERE name LIKE '%Corp%';
>
> -- show customers whose names end in 'LLC'
> SELECT * FROM cust WHERE name LIKE '%LLC';
>
> -- documents beginning with 'We', ending with 'rica'
> SELECT * FROM doc WHERE contents LIKE 'We%rica';
>
> Wherever the '%' appears (using the LIKE operator)
> Postgres allows anything at all to match -- from a
> lengthy string of text, to one single character, to
> a zero-length string -- i.e. nothing at all.
>
> ...ILIKE 'A%Z'
> -- matches 'AZ' and 'A to Z' and 'ABC2468XYZ' and 'A(at)$*Z'
whoops -- this shoulda been LIKE, not ILIKE (at least not yet)
[snip]
> So what's with the BACKSLASH in front of the dot? Well,
> just as LIKE has the UNDERSCORE (_) to denote "any single
> character", REGULAR EXPRESSIONS use the DOT (.) for
> that very same purpose. So we have to "escape" the
> dot to alter its normal interpretation, using the.
using the what? um, "...using the backslash." oops!
[snip]
> And for very common cases, there are handy abbreviations:
>
> 'x{1,1}' 'x' one (the default)
> 'r{0,1}' 'r?' zero or one, i.e. 'x' is optional
should be "'r' is optional". (thanks, Holger)
> 'B{0,}' 'B*' zero or more
> 'z{1,}' 'z+' one or more
may as well flesh these out, to be parallel with the others:
'B{0,}' 'B*' zero or more B's
'z{1,}' 'z+' one or more z's
[snip]
> Note that we've prefixed the previous pattern with
>
> '[^0-9]'
>
> because, within the square brackets of a character class,
> the CARAT (^) means "anything EXCEPT..."
>
> So now, 00000-0000 will NOT match; this is what we're
> after.
maybe i could be more clear, by saying:
So now, our pattern will NOT match something like
00000-0000, and since we don't want it to, we're making
progress.
[snip]
> And, just as in the LIKE operator, unless regular expressions
> ARE anchored at the beginning of a field, you'll defeat any
> index you have for that field. Indexes help alphabetize by
> comparing the beginning of your fields to each other; unless
> you're looking at the beginning of the field, your index can't
> be used.
some of you asked if regexes are actually able to use the index.
well, here's the answer: if they're anchored at the front, YES!
EXPLAIN
repo=# explain
repo-# select * from _table_1015197075 where str ~ '^f';
NOTICE: QUERY PLAN:
Index Scan using _table_ix on _table_1015197075 (cost=0.00..2.01 rows=1 width=24)
EXPLAIN
repo=# explain
repo-# select * from _table_1015197075 where str ~ 'f';
NOTICE: QUERY PLAN:
Seq Scan on _table_1015197075 (cost=100000000.00..100000001.03 rows=1 width=24)
to be honest, i "SET ENABLE_SEQSCAN = false" and created a
quickie table with about fifty rows, with an index on the "str"
column.
with '^f' it DID use the index; without the carat, 'f', it
absolutely COULD NOT use the index... imagine looking for words
in the dictionary that /contain/ the letter 'f', right? (no
mystery here -- but whether the '^f' COULD use the index was the
question, of course -- and it did).
cool. (i suspected, but wasn't certain.)
[snip]
> There are ways to anchor your searches to word boundaries,
> as well -- not just beginning-of-field and end-of-field. See
> your documentation for details.
rewording is in order here -- how about:
There are ways to anchor your searches to word boundaries,
as well -- you're not restricted to testing only for
beginning-of-field or end-of-field. (In fact, that's a more
likely solution to the problem than what I devised here.)
See your documentation for information on [[:<:]] and
[[:>:]].
[snip]
> As for [1] email stuff, it can be (zero or more of):
>
> '[\\-\\.]'
>
> hyphen or dot, followed by
>
> '[a-z0-9_]'
>
> alphanumerics (or underscore)
>
> '+'
>
> one or more times.
how about a little added explanation for clarity?
This means an address (before the @domain.com) can be
alphanumerics only, or if it contains a dot or hyphen that
more alphanumerics must follow the dot or hyphen. So 'me'
and 'my.self' and 'albert.einstein-newton-john.jr' are all
acceptable to this part of the pattern.
> And as for [2] site stuff, it can be (one or more of):
>
> '[\\-\\.]'
>
> hyphen or dot, followed by
>
> '[a-z0-9_]'
>
> alphanumerics (or underscore)
>
> '+'
>
> one or more times.
more clarity -- something like:
So the site stuff means that the domain (@some.where) MUST
contain at least one dot or hyphen, each of which must be
followed by alphanumerics. These would NOT match:
'me..you' -- no alphanumerics after first dot
'oops-' -- no alphanumerics after hyphen
'-eesh' -- need alphanumerics before hyphen
To be strict, that last one WILL match the 'site stuff'
fragment of the pattern, but it'll fail in the whole pattern,
because after the '@' we require SOME alphanumerics.
feel free to post comments/feedback to pgsql-user...
--
"We will fight them on the beaches, we will fight them on the
sons of beaches" -- Miguel Churchill, Winston's bastard Mexican
brother.
--lifted from http://www.astray.com/acmemail/stable/documentation.xml
will(at)serensoft(dot)com
http://sourceforge.net/projects/newbiedoc -- we need your brain!
http://www.dontUthink.com/ -- your brain needs us!
From | Date | Subject | |
---|---|---|---|
Next Message | Damon Cokenias | 2002-04-10 21:40:15 | View INSERTs not returning OID in Postgres 7.2.1 |
Previous Message | Bruce Momjian | 2002-04-10 21:37:56 | Re: refcursor returned by pl/psql to jdbc |