Re: counting words in a text or char varying field

From: "Greg Sabino Mullane" <greg(at)turnstep(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: counting words in a text or char varying field
Date: 2002-03-04 22:50:34
Message-ID: E16i19A-0001EG-00@granger.mail.mindspring.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

> i'm looking for a way to count the words in text field or char varying
> separated by white spaces within postgresql itself.

That depends on what you mean by "postgresql itself". You cannot do this
with ordinary SQL: you are going to need a procedural language. Perl
would make an excellent choice, but that would be too easy :), so I'll
try and come up with a quick plpgsql example. This is only a test script,
and 'whitespace' in this case means, literally, one or more space
characters. It's a start. :)

CREATE FUNCTION countwords(TEXT) RETURNS INTEGER AS '
DECLARE
mystring ALIAS FOR $1;
words INTEGER := 0;
inspace BOOL := true; -- ## Catch the first word
BEGIN
FOR pos IN 1 .. CHAR_LENGTH(mystring) LOOP

-- There are 2 single quotes, a space, and 2 single quotes below:
IF SUBSTRING(mystring,pos,1) = '' '' THEN
inspace := true;
ELSE
IF inspace THEN
inspace := false;
words := words+1;
END IF;
END IF;
END LOOP;
IF inspace is false THEN -- ## Catch the last word
words := words+1;
END IF;

RETURN words;
END;
' LANGUAGE 'plpgsql';

Let's give it a whirl:

select to_char(cdate, 'YYYYMMDD'), countwords(body)
from turnstep_mail where whofrom like '%tom(at)minnesota(dot)com%'
order by 1 desc limit 5;

to_char | countwords
- ----------+------------
20020304 | 493
20020304 | 35
20020303 | 194
20020303 | 74
20020302 | 44

Greg Sabino Mullane greg(at)turnstep(dot)com
PGP Key: 0x14964AC8 200203041730

-----BEGIN PGP SIGNATURE-----
Comment: http://www.turnstep.com/pgp.html

iD8DBQE8g/m/vJuQZxSWSsgRAqpfAKDSh7WYjF6l3cWfIw+8O/UPeKgjmwCfRlsc
OI69IIGHlMKMIbR0+Mc/q/g=
=0ral
-----END PGP SIGNATURE-----

Browse pgsql-general by date

  From Date Subject
Next Message Patrick L. Nolan 2002-03-04 22:51:53 What are functional indices good for?
Previous Message Jean-Luc Lachance 2002-03-04 22:31:12 Re: Validating Whole Numbers