From: | Sushant Sinha <sushant354(at)gmail(dot)com> |
---|---|
To: | "Massa, Harald Armin" <harald(at)2ndQuadrant(dot)de> |
Cc: | PGSQL Mailing List <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: access to lexems or access to parsed elements |
Date: | 2011-08-25 16:41:16 |
Message-ID: | 1314290476.1846.13.camel@dragflick |
Lists: | pgsql-general |
Would this fit?
select plainto_tsquery('english', 'the quick brown fox jumped over the lazy fox');
plainto_tsquery
-----------------------------------------------------
'quick' & 'brown' & 'fox' & 'jump' & 'lazi' & 'fox'
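If you want the lexemes as separate rows rather than one tsquery, ts_debug() may be easier to work with. Here is a minimal sketch, assuming the 'english' configuration. ts_debug() returns one row per token, and its lexemes column holds the dictionary output: an empty array for a stop word, NULL for a blank. unnest() of either produces no rows, so only the real lexemes remain.

```sql
-- One row per lexeme; stop words ('{}') and blanks (NULL) produce no rows
-- because unnest() of an empty or NULL array returns nothing.
SELECT unnest(lexemes) AS lexeme
FROM ts_debug('english', 'the quick brown fox jumped over the lazy fox');
```

This should return the same six lexemes as the tsquery above, duplicates included: quick, brown, fox, jump, lazi, fox.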
-Sushant.
On Thu, 2011-08-25 at 18:21 +0200, Massa, Harald Armin wrote:
> I want to access the single words in a text. Better yet: the relevant
> words (i.e. without stop words) in a text.
>
>
> to_tsvector (or a cast) gets me the lexemes as a tsvector:
>
>
> select to_tsvector('the quick brown fox jumped over the lazy fox')
> 'brown':3 'fox':4,9 'jump':5 'lazi':8 'quick':2
>
>
> And I would like to access "brown", "fox", "jump", "lazi" and "quick"
> as single values that I insert into another table.
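One possible approach is ts_stat(). It takes the text of a query that returns a single tsvector column and returns one row per distinct lexeme, with columns word, ndoc and nentry. On PostgreSQL 9.6 and later, unnest(tsvector) does this directly. This is a sketch only. The table words is a hypothetical name used for illustration.

```sql
-- Hypothetical target table, for illustration only.
CREATE TEMP TABLE words (word text);

-- ts_stat() executes the query passed as text (it must return exactly
-- one tsvector column) and yields one row per distinct lexeme.
INSERT INTO words (word)
SELECT word
FROM ts_stat($q$
    SELECT to_tsvector('english', 'the quick brown fox jumped over the lazy fox')
$q$);

-- PostgreSQL 9.6 and later only: unnest(tsvector) returns
-- (lexeme, positions, weights) without passing a query as text.
SELECT lexeme, positions
FROM unnest(to_tsvector('english', 'the quick brown fox jumped over the lazy fox'));
```

After the INSERT, words should contain brown, fox, jump, lazi and quick.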
>
>
> But I had no luck with any attempt to convert it to records, arrays or similar.
>
>
> Next step, the lesser-known FTS functions:
>
>
> select ts_parse('default', 'the quick brown fox jumped over the lazy fox')
>
>
> (1,the)
> (12," ")
> (1,quick)
> [...]
> (1,fox)
>
>
> is a set-returning function that gives me 17 records of pseudo-type
> record. The stop words are still in there, but never mind. The problem:
> I found no way to access the second field of that record.
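ts_parse() is declared with OUT parameters (tokid integer, token text). If you call it in the FROM clause instead of the select list, you get named columns. A sketch follows. Token type 1 is the default parser's "asciiword", as listed by ts_token_type('default').

```sql
-- Both fields as ordinary columns.
SELECT tokid, token
FROM ts_parse('default', 'the quick brown fox jumped over the lazy fox');

-- Only the plain ASCII words (token type 1, "asciiword").
SELECT token
FROM ts_parse('default', 'the quick brown fox jumped over the lazy fox')
WHERE tokid = 1;

-- In the select list, one field of the composite result can be
-- picked with (function(...)).field syntax.
SELECT (ts_parse('default', 'the quick brown fox jumped over the lazy fox')).token;
```

The same FROM-clause approach should work for other functions that return records through OUT parameters. A function declared only as RETURNS SETOF record, with no OUT parameters, instead needs a column definition list, e.g. FROM f(...) AS t(a integer, b text).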
>
>
> Of course, there is always:
>
>
> select substr(what::text,
>               position(',' in what::text) + 1,
>               char_length(what::text) - position(',' in what::text) - 1)
> from (
>     select ts_parse('default', 'the quick brown fox jumped over the lazy fox') as what
> ) x
>
>
> But come on: take a two-field record, cast it to a single text value,
> search for the "," that separates the two fields, and then split that
> text back into two fields with substr()?
>
>
> So, is there a better way to access
>
>
> a) the lexemes of a tsvector
> b) the (unnamed) fields of a set-of-record-returning function?
> Harald
>
>
> --
> Harald Armin Massa www.2ndQuadrant.de
> PostgreSQL Training, Services and Support
>
> 2ndQuadrant Deutschland GmbH
> GF: Harald Armin Massa
> Amtsgericht Stuttgart, HRB 736399