From: | Andreas Wenk <a(dot)wenk(at)netzmeister-st-pauli(dot)de> |
---|---|
To: | Jan-Erik <jan-erik(dot)larka(at)os2world(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: split string by special characters |
Date: | 2009-07-25 14:01:29 |
Message-ID: | 4A6B1039.4040205@netzmeister-st-pauli.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Jan-Erik schrieb:
> On 24 Juli, 23:22, a(dot)w(dot)(dot)(dot)(at)netzmeister-st-pauli(dot)de (Andreas Wenk)
> wrote:
>> Hi,
>>
>> I was thinking about that and in my opinion the approach to let the
>> database do that is the wrong direction. Sure you can do a lot with
>> regexp_split_to_table or regexp_split_to_array but they are kind of
>
> Yes, I see. You're quite right, the split was intended to do give me
> everything in processed chunks it in some easy way as the last part of
> the interpretation of the text.
>
>> limited compared to a programming language using regular expressions. If
>> I had to try to get your jobdone, I would try regexp_matches() like:
>>
>> SELECT regexp_matches('This is just a text, that contain special
>> characters such as, (comma),"(", ")" (left and right parenthesis) as
>> well as "?" question, mark.How do I split it up with PostgreSQL?',
>> E'(\\w*.)\\s+','g');
>>
>> regexp_matches
>> ----------------
>> {This}
>> {is}
>> {just}
>> {a}
>> {"text,"}
>> {that}
>> {contain}
>> {special}
>> {characters}
>> {such}
>> {"as,"}
>> {","}
>> {"\""}
>> {left}
>> {and}
>> {right}
>> {parenthesis)}
>> {as}
>> {well}
>> {as}
>> {"\""}
>> {"question,"}
>> {How}
>> {do}
>> {I}
>> {split}
>> {it}
>> {up}
>> {with}
>> (29 rows)
>>
>> So, you have the ability to catch the seperators like ','. But for now,
>> teh example just catches the comma. But you want to catch a lot of other
>
> Yes, but then I ran into the problems with separators that regexp
> consider as part of the expression and how to dynamically build the
> right expression in some unified way for each language.
>
>> seperators as well. I suggest you do that within the logic of your
>> coding language because I don't think this will be an easy way to walk
>
> Guess you're right, because I didn't know how to handle it with the
> regexp-approach.
> I sat down yesterday and wrote a function that does the job for me in
> PL/pgSQL, I'm not quite finished, but can see the light at the end of
> the tunnel.
just in case you are running into a black performance hole - you could
try to write it in C as a user defined function. Actually for me it
would be a real big challenge ;-)
> The basic approach I'm working with now is to let it find the position
> of each delimiter combination within the text, then sort the resulting
> array to get it ordered and extract each part.
> It won't be fast as lightning, but sufficient for now and as it seem,
> allow me to parse text from various files written in different
> languages (e.g. programming) just by specifying the delimiters.
>
>> ;-). This is no database job in my opinion.
>
> I didn't intend to try it either before I spotted some of those
> functions... :-)
> Then figured it would be nice to do it within the db-engine as all the
> data is present there.
> I wrote code outside the db-engine some time ago, but then other
> aspects made it less desirable to use.
>> Cheers
>>
>> Andy
>>
>
> Thank you Andy for the code example and your advice.
> I really appreciate that you took your time to show me how and explain
> why.
> //Jan-Erik
hey you're welcome. When you're done it would be really great to see the
resulting function ;-)
Cheers
Andy
From | Date | Subject | |
---|---|---|---|
Next Message | Merlin Moncure | 2009-07-25 14:10:44 | Re: Very slow joins |
Previous Message | Andreas Wenk | 2009-07-25 13:53:15 | Re: Disable databse listing for non-superuser (\l) ? |