Re: Tsearch vector not stored by update/set

From: "Justin L(dot) Kennedy" <jk289(at)mail(dot)gatech(dot)edu>
To: "Andrew J(dot) Kopciuch" <akopciuch(at)bddf(dot)ca>
Cc: "Justin L(dot) Kennedy" <jk289(at)prism(dot)gatech(dot)edu>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: Tsearch vector not stored by update/set
Date: 2005-03-21 22:22:32
Message-ID: Pine.SOL.4.33.0503211626030.16947-100000@acmez.gatech.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Sun, 20 Mar 2005, Andrew J. Kopciuch wrote:

> On Thursday 17 March 2005 17:55, you wrote:
> > The short question is why does this:
> >
> > select to_tsvector('default', coalesce(name, '') ||' '||
> > coalesce(description, '') ||' '|| coalesce(keywords,'')) from link_items;
> >
> > give different results than this:
> >
> > update link_items set linksfti=to_tsvector('default', coalesce(name, '')
> >
> > ||' '|| coalesce(description, '') ||' '|| coalesce(keywords,''));
> >
> > select linksfti from link_items;
> >
>
>
> It shouldn't. I just tested with some of your data on my machine, and the
> results were fine. (PG 7.4.6). I can not see why they would be any
> different for 7.3.2.
>
> Your data looks rather strange. (re "'60':1 '000':2"). Is that really all
> that was inserted? Or have you just left some out for your email?

About 95% of the 1900 insertions ended up with empty strings (not NULLs),
the other 5% looked like that above. Either just numbers, or occasionally
words (defined by consecutive non-whitespace characters separated by
whitespace) that had numbers or symbols in them. Like: "U.S. Senate" was
transformed in such a way that "Senate" was dropped completely and "U.S."
became lowercased "u.s.". Another example was a URL that happened to be
in the description column of one was captured, but the rest of the text
was not. Another had a name of "World T.E.A.M. Sports" and all that was
stored in the vector was "t.e.a.m."

It seems to be selective of only numbers, words with numbers in them,
words with '.' or '/' characters. It completely ignores any other words
or text in any of the 3 fields.

> I could see this being a configuration issue possibly. What do your pg_ts
> tables look like? Have you made modifications there?

This morning, I decided to remove the following trigger from the
link_items table:
CREATE TRIGGER updateprodtbl
BEFORE INSERT OR UPDATE
ON link_items
FOR EACH ROW
EXECUTE PROCEDURE tsearch2('linksfti', 'name', 'description',
'keywords');

Now the UPDATE command I listed above works, so apparently there is
something about this trigger that is blocking the search vector from being
stored. This trigger was copied and pasted (with only changes to the
column names) from
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch-V2-intro.html
in the INDEXING FIELDS IN A TABLE section.

This does fix the immediate problem of getting the search results for the
live website we are running, but now we have the problem of keeping it up
to date. I could run a nightly cronjob to update all the rows, but it
seems inefficient, and I would really like to know why the trigger would
keep it from working on 7.3, but not 8.0.

You requested the pg_ts_* tables:
On the Linux-redhat, pg7.3.2

pg_ts_cfg: (4 rows)
oid ts_name prs_name locale
106407 "default" "default" "C"
988004 "default_english" "default" "en_US"
106408 "default_russian" "default" "ru_RU.KOI8-R"
106409 "simple" "default" NULL

pg_ts_dict: (5 rows)
oid dict_name dict_initoption dict_init dict_lexize
106356 en_stem "" snb_en_init snb_lexize
106361 ispell_template NULL spell_init spell_lexize
106358 ru_stem "/usr/local/pgsql/share/contrib/russion.stop"
snb_ru_init snb_lexize
106353 simple NULL dex_init dex_lexize
106364 synonym NULL syn_init syn_lexize

pg_ts_parser: (1 row)
oid prs_name prs_start prs_nexttoken prs_end
prs_headline prs_lextype
106389 "default" prsd_start prsd_getlexeme prsd_end
prsd_headline prsd_lextype

pg_ts_cfgmap(73 rows)
ts_name tok_alias dict_name
"default" "lword" "{en_stem}"
"default" "nlword" "{simple}"
"default" "word" "{simple}"
"default" "email" "{simple}"
"default" "url" "{simple}"
"default" "host" "{simple}"
"default" "sfloat" "{simple}"
"default" "version" "{simple}"
"default" "part_hword" "{simple}"
"default" "nlpart_hword" "{simple}"
"default" "lpart_hword" "{en_stem}"
"default" "hword" "{simple}"
"default" "lhword" "{en_stem}"
"default" "nlhword" "{simple}"
"default" "uri" "{simple}"
"default" "file" "{simple}"
"default" "float" "{simple}"
"default" "int" "{simple}"
"default" "uint" "{simple}"
"default_russian" "lword" "{en_stem}"
"default_russian" "nlword" "{ru_stem}"
"default_russian" "word" "{ru_stem}"
"default_russian" "email" "{simple}"
"default_russian" "url" "{simple}"
"default_russian" "host" "{simple}"
"default_russian" "sfloat" "{simple}"
"default_russian" "version" "{simple}"
"default_russian" "part_hword" "{simple}"
"default_russian" "nlpart_hword" "{ru_stem}"
"default_russian" "lpart_hword" "{en_stem}"
"default_russian" "hword" "{ru_stem}"
"default_russian" "lhword" "{en_stem}"
"default_russian" "nlhword" "{ru_stem}"
"default_russian" "uri" "{simple}"
"default_russian" "file" "{simple}"
"default_russian" "float" "{simple}"
"default_russian" "int" "{simple}"
"default_russian" "uint" "{simple}"
"simple" "lword" "{simple}"
"simple" "nlword" "{simple}"
"simple" "word" "{simple}"
"simple" "email" "{simple}"
"simple" "url" "{simple}"
"simple" "host" "{simple}"
"simple" "sfloat" "{simple}"
"simple" "version" "{simple}"
"simple" "part_hword" "{simple}"
"simple" "nlpart_hword" "{simple}"
"simple" "lpart_hword" "{simple}"
"simple" "hword" "{simple}"
"simple" "lhword" "{simple}"
"simple" "nlhword" "{simple}"
"simple" "uri" "{simple}"
"simple" "file" "{simple}"
"simple" "float" "{simple}"
"simple" "int" "{simple}"
"simple" "uint" "{simple}"
"default_english" "url" "{simple}"
"default_english" "host" "{simple}"
"default_english" "sfloat" "{simple}"
"default_english" "uri" "{simple}"
"default_english" "int" "{simple}"
"default_english" "float" "{simple}"
"default_english" "email" "{simple}"
"default_english" "word" "{simple}"
"default_english" "hword" "{simple}"
"default_english" "nlword" "{simple}"
"default_english" "nlpart_hword" "{simple}"
"default_english" "part_hword" "{simple}"
"default_english" "nlhword" "{simple}"
"default_english" "file" "{simple}"
"default_english" "uint" "{simple}"
"default_english" "version" "{simple}"

On the 8.0.0-beta5 Windows 2000 machine:
pg_ts_cfg: (4 rows, identical to 7.3.2)
oid ts_name prs_name locale
370162 "default" "default" "C"
370165 "default_english" "default" "en_US"
370163 "default_russian" "default" "ru_RU.KOI8-R"
370164 "simple" "default" NULL

pg_ts_cfgmap (73 rows, identical to 7.3.2)
Not listed again, all identical except for OIDs

pg_ts_dict (5 rows, identical to 7.3.2 except for OIDS)

pg_ts_parser (1 row, identical to 7.3.2 except for OIDs)

I have made a single change to it from its default installation. When I
was working with the rank_cd() function on the 8.0.0 machine, it had
errors due to a non-existant english stop file, so I changed
pg_ts_dict.dict_initoption = '' where dict_name = 'en_stem'. The indexing
system was working fine both before and after the change to the pg_ts_dict
table. I also propagated the change to the 7.3.2 machine even though it
didn't have the error message (the stop file didn't exist on that computer
either, but it never gave an error message about it).

Responses

Browse pgsql-general by date

  From Date Subject
Next Message David Parker 2005-03-21 22:29:35 create or replace trigger?
Previous Message Paul Moore 2005-03-21 22:11:11 Re: New user: Windows, Postgresql, Python