From: | Andreas Wenk <a(dot)wenk(at)netzmeister-st-pauli(dot)de> |
---|---|
To: | Jan-Erik <jan-erik(dot)larka(at)os2world(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: split string by special characters |
Date: | 2009-07-24 21:22:55 |
Message-ID: | 4A6A262F.3050506@netzmeister-st-pauli.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Jan-Erik wrote:
> I wonder if you could please help me out to extract a character string
> to an array or better yet, a table.
>
> I'd like to split strings of text up into words and delimiters (but
> not delete the delimiters). The delimiters are defined as comma,
> space, dot, singe/double quotation mark, question mark etc.¹ in a
> separate table (delimiters) depending on what rules apply for the
> input.
>
> regexp_split_to_array/table seem quite suitable but I have
> difficulties to form the right expression with it, apart from that it
> remove the delimiters as well.
>
> Example:
> This is just a text that contain special characters such as , (comma),
> "(", ")" (left and right parenthesis) as well as "?" question mark.
> How do I split it up with PostgreSQL?
>
> Expected result:
> {This, " ", is, " ", just, " ", a, ..., PostgreSQL, "?" }
> __________________
> ¹) Also later on tags such as <html> and at other times something
> else depending on the circumstances.
>
> //Jan-Erik
>
Hi,
I was thinking about that and in my opinion the approach to let the
database do that is the wrong direction. Sure you can do a lot with
regexp_split_to_table or regexp_split_to_array but they are kind of
limited compared to a programming language using regular expressions. If
I had to try to get your jobdone, I would try regexp_matches() like:
SELECT regexp_matches('This is just a text, that contain special
characters such as, (comma),"(", ")" (left and right parenthesis) as
well as "?" question, mark.How do I split it up with PostgreSQL?',
E'(\\w*.)\\s+','g');
regexp_matches
----------------
{This}
{is}
{just}
{a}
{"text,"}
{that}
{contain}
{special}
{characters}
{such}
{"as,"}
{","}
{"\""}
{left}
{and}
{right}
{parenthesis)}
{as}
{well}
{as}
{"\""}
{"question,"}
{How}
{do}
{I}
{split}
{it}
{up}
{with}
(29 rows)
So, you have the ability to catch the seperators like ','. But for now,
teh example just catches the comma. But you want to catch a lot of other
seperators as well. I suggest you do that within the logic of your
coding language because I don't think this will be an easy way to walk
;-). This is no database job in my opinion.
Cheers
Andy
From | Date | Subject | |
---|---|---|---|
Next Message | Brian A. Seklecki | 2009-07-24 23:02:18 | Disable databse listing for non-superuser (\l) ? |
Previous Message | Peter Hunsberger | 2009-07-24 21:13:00 | Find difference between two Text fields |