Re: split string by special characters

From: Jan-Erik <jan-erik(dot)larka(at)os2world(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: split string by special characters
Date: 2009-07-29 05:06:46
Message-ID: 1c319957-477d-43d7-8041-498479ef36b4@d4g2000yqa.googlegroups.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Current design of function
---text_to_array( sen text, col collection )---
DECLARE
delimiter_pos int[] := '{}';
return_val text[] := '{}';
delimiters_chr RECORD;
remaining_text text;
delimiter_curr text;
i int := 0;
j int := 0;
prev_pos int := 1;
curr_pos int := 0;
delimiter_len int;
BEGIN
FOR delimiters_chr IN SELECT upper( chars.chars ) AS chars FROM
chars, word WHERE chars.chars_id = word.chars_id AND
word.classification_id = clss.classification_id LOOP
delimiter_len := character_length( delimiters_chr.chars ); --
Length of the current delimiter we're looking at
IF prev_pos > 0 THEN -- Don't add text if it hasn't been altered
(previous delimiter not found)
remaining_text := upper( sen ); -- Temporary copy of text to
search through
END IF;
prev_pos := 0;
curr_pos := position( delimiters_chr.chars IN sen ); -- Delimiter
position in text (sentence)

WHILE curr_pos > 0 LOOP -- Found delimiter in text
delimiter_pos := array_append( delimiter_pos, curr_pos +
prev_pos ); -- Add position of delimiter
delimiter_pos := array_append( delimiter_pos, curr_pos +
prev_pos + delimiter_len ); -- Add position where delimiter ends
remaining_text := substring( remaining_text FROM curr_pos +
delimiter_len ); -- Look only at text still
prev_pos := curr_pos + prev_pos; -- Need to keep track of how
much text has been stripped off
curr_pos := position( delimiters_chr.chars IN remaining_text );
-- Calculate the next occurance
END LOOP;
END LOOP;
remaining_text := sen; -- Temporary copy of text to extract text
from
SELECT DISTINCT * INTO delimiter_pos FROM array_append( array_prepend
( 1, sort( delimiter_pos ) ), character_length( sen ) + 1 ); -- Add
first character pos and last to the sorted array.
curr_pos := array_upper( delimiter_pos, 1 ); -- Calculate the size
of the array
WHILE j < curr_pos LOOP -- Look through the array
j := i + 1;
return_val := array_append( return_val, substring
( remaining_text FROM delimiter_pos[i] FOR (delimiter_pos[j] -
delimiter_pos[i]) ) ); -- Add the parts to a new array with text
i := i + 1;
END LOOP;
RETURN return_val; -- Return the result
END;
----
Need an additional function as well, to sort the array.
The delimiters has to be placed in another table.

Please note that it can't handle variations to a delimiter such as
<table style="..."> due to the present design as I couldn't use
regexp. <table> (case doesn't matter) without any other extras, work
fine though. I initially hoped it would be possible to allow matchings
such as <td(.*)> as well as pairs <td>(.*)</td> and variations
thereof, such as <td(.*)>(.*)</td>, but that's something I need a bit
of help with (hint, hint).

Written to run on eComStation v2.0 (successor of OS/2)
//Jan-Erik

In response to

Browse pgsql-general by date

  From Date Subject
Next Message tomrevam 2009-07-29 05:19:34 Re: synchronous_commit=off doesn't always return immediately
Previous Message Scott Marlowe 2009-07-29 01:52:22 Re: Idle processes chewing up CPU?