Re: split string by special characters

From: Andreas Wenk <a(dot)wenk(at)netzmeister-st-pauli(dot)de>
To: Jan-Erik <jan-erik(dot)larka(at)os2world(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: split string by special characters
Date: 2009-07-25 14:01:29
Message-ID: 4A6B1039.4040205@netzmeister-st-pauli.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Jan-Erik schrieb:
> On 24 Juli, 23:22, a(dot)w(dot)(dot)(dot)(at)netzmeister-st-pauli(dot)de (Andreas Wenk)
> wrote:
>> Hi,
>>
>> I was thinking about that and in my opinion the approach to let the
>> database do that is the wrong direction. Sure you can do a lot with
>> regexp_split_to_table or regexp_split_to_array but they are kind of
>
> Yes, I see. You're quite right, the split was intended to do give me
> everything in processed chunks it in some easy way as the last part of
> the interpretation of the text.
>
>> limited compared to a programming language using regular expressions. If
>> I had to try to get your jobdone, I would try regexp_matches() like:
>>
>> SELECT regexp_matches('This is just a text, that contain special
>> characters such as, (comma),"(", ")" (left and right parenthesis) as
>> well as "?" question, mark.How do I split it up with PostgreSQL?',
>> E'(\\w*.)\\s+','g');
>>
>> regexp_matches
>> ----------------
>> {This}
>> {is}
>> {just}
>> {a}
>> {"text,"}
>> {that}
>> {contain}
>> {special}
>> {characters}
>> {such}
>> {"as,"}
>> {","}
>> {"\""}
>> {left}
>> {and}
>> {right}
>> {parenthesis)}
>> {as}
>> {well}
>> {as}
>> {"\""}
>> {"question,"}
>> {How}
>> {do}
>> {I}
>> {split}
>> {it}
>> {up}
>> {with}
>> (29 rows)
>>
>> So, you have the ability to catch the seperators like ','. But for now,
>> teh example just catches the comma. But you want to catch a lot of other
>
> Yes, but then I ran into the problems with separators that regexp
> consider as part of the expression and how to dynamically build the
> right expression in some unified way for each language.
>
>> seperators as well. I suggest you do that within the logic of your
>> coding language because I don't think this will be an easy way to walk
>
> Guess you're right, because I didn't know how to handle it with the
> regexp-approach.
> I sat down yesterday and wrote a function that does the job for me in
> PL/pgSQL, I'm not quite finished, but can see the light at the end of
> the tunnel.

just in case you are running into a black performance hole - you could
try to write it in C as a user defined function. Actually for me it
would be a real big challenge ;-)

> The basic approach I'm working with now is to let it find the position
> of each delimiter combination within the text, then sort the resulting
> array to get it ordered and extract each part.
> It won't be fast as lightning, but sufficient for now and as it seem,
> allow me to parse text from various files written in different
> languages (e.g. programming) just by specifying the delimiters.
>
>> ;-). This is no database job in my opinion.
>
> I didn't intend to try it either before I spotted some of those
> functions... :-)
> Then figured it would be nice to do it within the db-engine as all the
> data is present there.
> I wrote code outside the db-engine some time ago, but then other
> aspects made it less desirable to use.
>> Cheers
>>
>> Andy
>>
>
> Thank you Andy for the code example and your advice.
> I really appreciate that you took your time to show me how and explain
> why.
> //Jan-Erik

hey you're welcome. When you're done it would be really great to see the
resulting function ;-)

Cheers

Andy

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Merlin Moncure 2009-07-25 14:10:44 Re: Very slow joins
Previous Message Andreas Wenk 2009-07-25 13:53:15 Re: Disable databse listing for non-superuser (\l) ?