Re: back references using regex

From: Matthew Peter <survivedsushi(at)yahoo(dot)com>
To: Michael Fuhr <mike(at)fuhr(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: back references using regex
Date: 2005-09-10 05:11:46
Message-ID: 20050910051146.77214.qmail@web35205.mail.mud.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

One other thing, when I wrote back I actually used
34.31.29.20 (random), not 12.00.00.34 like i showed in
the example, which is why i said it didn't work on
digits.

SELECT substring('34.31.29.20' FROM $$((\w+)\.\2)$$);
substring
-----------

(1 row)

little did i know writing it with 12.00.00.34 would
return 00.00... so yes, that did suprise me.
Apparently only using the identical values returns a
value. so it's saying x+ one more of the same value
separated by a period... where shouldn't it be any
"letter, number or underscore" followed by any
"letter, number or underscore"?

--- Michael Fuhr <mike(at)fuhr(dot)org> wrote:

> On Thu, Sep 08, 2005 at 01:52:35PM -0700, Matthew
> Peter wrote:
> > It's not a complex regex as I have wrote one that
> does
> > what I want, yet not at the database level. The
> docs
> > didn't help clarify anything. I'm still not clear
> on
> > how it determines where the back reference comes
> from
> > in the previous example you gave. And why digits
> > wouldn't work.
>
> Back references work as they usually do in regular
> expressions:
> they refer to the matched value of a previous
> parenthesized
> subexpression. If you have multiple open
> parentheses then you
> need to refer to the correct subexpression to get
> what you want.
> Example:
>
> CREATE TABLE foo (t text);
> INSERT INTO foo VALUES ('abc.foo.foo.xyz');
> INSERT INTO foo VALUES ('12.00.00.34');
> INSERT INTO foo VALUES ('abc.def.ghi');
>
> SELECT t FROM foo WHERE t ~ $$(\w+)\.\1$$;
> t
> -----------------
> abc.foo.foo.xyz
> 12.00.00.34
> (2 rows)
>
> In the above query the regular expression has only
> one set of
> parentheses, so the back reference refers to \1.
> The result set
> contains the two rows that have one or more word
> characters followed
> by a dot followed by the same set of characters.
>
> In the following query, note the difference between
> the regular
> expression in the select list and the one in the
> where clause:
>
> SELECT substring(t FROM $$((\w+)\.\2)$$)
> FROM foo
> WHERE t ~ $$(\w+)\.\1$$;
> substring
> -----------
> foo.foo
> 00.00
> (2 rows)
>
> In the regular expression in the select list, we use
> the outermost
> set of parentheses for grouping, so the back
> reference needs to
> refer to the subexpression that begins with the
> second open
> parenthesis (i.e., we must use \2). In the regular
> expression
> in the where clause, we have only one set of
> parentheses so the
> back reference is \1.
>
> Regarding digits, you didn't post any output in your
> example, so
> we don't know if it really doesn't work or if it
> just doesn't do
> what you were expecting. Here's what I get from
> your examples:
>
> SELECT substring('12.00.00.34' FROM
> $$((\d+)\.\2)$$);
> substring
> -----------
> 00.00
> (1 row)
>
> SELECT substring('12.00.00.34' FROM
> $$((\w+)\.\2)$$);
> substring
> -----------
> 00.00
> (1 row)
>
> SELECT substring('abc.foo.foo.xyz' FROM
> $$((\w+)\.\2)$$);
> substring
> -----------
> foo.foo
> (1 row)
>
> Do you get different results, or do these results
> surprise you?
> They all appear to be correct.
>
> > What I basically want to do is have a slice
> function
> > like Python, where I can slice out items from a
> \s, \.
> > or \n\n separated list. Where I'll just change the
> > delimiter for the query that it applies.
> >
> > Where I could start it at a certain point and end
> it
> > at another. Like slicing out paragraph 3-6
> (delimiter
> > \n\n) or the 2nd-6th sentence in a article
> (delimiter
> > \.). That is what I am trying to do.
>
> You can use split_part() to get a single item or
> string_to_array()
> to build an array from which you can extract
> multiple items.
>
> CREATE TABLE foo (t text);
> INSERT INTO foo VALUES
> ('one.two.three.four.five.six.');
>
> SELECT (string_to_array(t, '.'))[3:5] FROM foo;
> string_to_array
> -------------------
> {three,four,five}
> (1 row)
>
> SELECT array_to_string((string_to_array(t,
> '.'))[3:5], '.') FROM foo;
> array_to_string
> -----------------
> three.four.five
> (1 row)
>
> Is that what you're looking for?
>
> --
> Michael Fuhr
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
>



______________________________________________________
Click here to donate to the Hurricane Katrina relief effort.
http://store.yahoo.com/redcross-donate3/

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Janning Vygen 2005-09-10 09:23:50 ERROR: type "temp_gc" already exists
Previous Message Tatsuo Ishii 2005-09-10 03:08:25 Re: RAID0 and pg_xlog