syntax for reaching into records, specifically ts_stat results

From: Dan Chak <chak(at)MIT(dot)EDU>
To: pgsql-hackers(at)postgresql(dot)org
Subject: syntax for reaching into records, specifically ts_stat results
Date: 2008-12-09 18:55:10
Message-ID: 72A5BDAC-44DC-4EE2-8677-E1834C800D69@mit.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Postgres Folk,

In working with tsvectors (added in 8.3), I've come to a place where
my syntax-fu has failed me. I've resorted to turning a result set of
records into strings so that I can regexp out the record fields I
need, rather than access them directly, as I'm sure it's possible to
do with the right syntactic formulation. Although my solution works,
I'm sure it's much less efficient than it could be, and hope someone
on the list can help do this the right way.

Basically, I would like to transpose a series of tsvectors (one per
row) into columns. E.g., as tsvects, I have this:

test=# select * from tsvects;
sentence_id | tsvect
-------------+------------------------------
1 | 'fox':3 'brown':2 'quick':1
2 | 'lazi':1 'eleph':3 'green':2

Instead I want this:

sentence_id | word | freq
-------------+-------+------
1 | fox | 1
1 | brown | 1
1 | quick | 1
2 | lazi | 1
2 | eleph | 1
2 | green | 1

I am able to generate this with the following view, but the problem is
that to create it, I must first cast the ts_stat results to a string,
and then regexp out the pertinent pieces:

create or replace view words as
select sentence_id,
substring(stat from '^\\(([^,]+),') as word,
substring(stat from ',([^,]+)\\)$') as freq
from (select sentence_id,
ts_stat('select tsvect from tsvects where sentence_id
= ' ||
tsvects.sentence_id)::text as stat
from tsvects
) as foo;

It seems like there should be a way to access fields in the records
returned from ts_stat directly, but I can't figure out how. Here's
the result of the subquery:

test=# select sentence_id,
ts_stat('select tsvect from tsvects where sentence_id
= ' ||
tsvects.sentence_id)::text as stat
from tsvects;
sentence_id | stat
-------------+-------------
1 | (fox,1,1)
1 | (brown,1,1)
1 | (quick,1,1)
2 | (lazi,1,1)
2 | (eleph,1,1)
2 | (green,1,1)
(6 rows)

If I try to get at the elements (which I believe are named 'word',
'ndoc', 'nentry'), I get a variety of syntax errors:

test=# select sentence_id,
test-# stat['word'],
test-# stat['nentry']
test-# from (select sentence_id,
test(# ts_stat('select tsvect from tsvects where
sentence_id = ' ||
test(# tsvects.sentence_id) as stat
test(# from tsvects
test(# ) as foo;
ERROR: cannot subscript type record because it is not an array

If I say stat.word (instead of subscripting), I get 'missing FROM-
clause entry for table "stat"'. If I say foo.stat.word, I get
'ERROR: schema "foo" does not exist'.

Any ideas on how to get into these records with resorting to text
parsing?

Thanks,
Dan

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2008-12-09 19:31:05 Re: WIP: default values for function parameters
Previous Message Andrew Chernow 2008-12-09 18:50:27 Re: parallel restore vs. windows