Re: TSearch2 / Get all unique lexems

From:	Hannes Dorbath <light(at)theendofthetunnel(dot)de>
To:	Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject:	Re: TSearch2 / Get all unique lexems
Date:	2005-12-08 08:50:28
Message-ID:	4397F3D4.8080004@theendofthetunnel.de
Lists:	pgsql-general

On 07.12.2005 16:13, Oleg Bartunov wrote:
> hmm, you could dump tsvector column and use awk+sort+uniq

Thanks. I hoped for something possible inside a pl/pgsql proc. I'm
trying to integrate pg_trgm with Tsearch2. I'm still on my UTF-8
database. Yes, I know, there is _NO_ UTF-8 support of any kind in
Tsearch2 yet, but I got it working to a degree that is OK for my
application (created my own stemmer variant, ispell dict, affix file,
etc.). The last missing bit is a source of unique lexemes for pg_trgm.
I cannot use the stat() function, because it breaks as soon as it sees
a UTF-8 char.

I thought of using lexize(), casting the text array to rows somehow,
writing them to a temp table, and using SELECT DISTINCT, but I haven't
had any success yet.
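
For reference, the dump-and-deduplicate route suggested in the quoted
reply could look roughly like the sketch below, done client-side in
Python instead of awk+sort+uniq. It is only an illustration under
assumptions: the function name unique_lexemes is hypothetical, and it
assumes each dumped row is the text form of a tsvector in which lexemes
are single-quoted and an embedded quote is doubled.

```python
import re

# Matches one quoted lexeme in tsvector text output, e.g. 'fat':2,4
# Assumption: lexemes are single-quoted and embedded quotes are doubled.
LEXEME_RE = re.compile(r"'((?:[^']|'')*)'")


def unique_lexemes(rows):
    """Return the sorted distinct lexemes found in tsvector text rows.

    `rows` is any iterable of strings, for example the lines of a file
    produced by dumping the tsvector column (hypothetical input).
    """
    seen = set()
    for row in rows:
        for match in LEXEME_RE.finditer(row):
            # Undo the doubled-quote escaping inside the lexeme.
            seen.add(match.group(1).replace("''", "'"))
    return sorted(seen)


if __name__ == "__main__":
    sample = ["'cat':3 'fat':2,4", "'fat':1 'müde':2"]
    for lexeme in unique_lexemes(sample):
        print(lexeme)
```

Because Python strings are Unicode, non-ASCII lexemes pass through
unchanged as long as the dump is read with the right encoding. The
resulting word list could then be loaded into a table for pg_trgm.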

--
Regards,
Hannes Dorbath

In response to

Re: TSearch2 / Get all unique lexems at 2005-12-07 15:13:07 from Oleg Bartunov

Responses

Re: TSearch2 / Get all unique lexems at 2005-12-08 11:00:55 from Teodor Sigaev
Re: TSearch2 / Get all unique lexems at 2005-12-08 11:04:03 from Oleg Bartunov
