tsearch patch and namespace pollution

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: tsearch patch and namespace pollution
Date: 2007-08-17 01:09:26
Message-ID: 25419.1187312966@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I find the following additions to pg_proc in the current tsearch2 patch:

proc | prorettype
------------------------------------------+------------
pg_ts_parser_is_visible(oid) | boolean
pg_ts_dict_is_visible(oid) | boolean
pg_ts_template_is_visible(oid) | boolean
pg_ts_config_is_visible(oid) | boolean
tsvectorin(cstring) | tsvector
tsvectorout(tsvector) | cstring
tsvectorsend(tsvector) | bytea
tsqueryin(cstring) | tsquery
tsqueryout(tsquery) | cstring
tsquerysend(tsquery) | bytea
gtsvectorin(cstring) | gtsvector
gtsvectorout(gtsvector) | cstring
tsvector_lt(tsvector,tsvector) | boolean
tsvector_le(tsvector,tsvector) | boolean
tsvector_eq(tsvector,tsvector) | boolean
tsvector_ne(tsvector,tsvector) | boolean
tsvector_ge(tsvector,tsvector) | boolean
tsvector_gt(tsvector,tsvector) | boolean
tsvector_cmp(tsvector,tsvector) | integer
length(tsvector) | integer
strip(tsvector) | tsvector
setweight(tsvector,"char") | tsvector
tsvector_concat(tsvector,tsvector) | tsvector
vq_exec(tsvector,tsquery) | boolean
qv_exec(tsquery,tsvector) | boolean
tt_exec(text,text) | boolean
ct_exec(character varying,text) | boolean
tq_exec(text,tsquery) | boolean
cq_exec(character varying,tsquery) | boolean
tsquery_lt(tsquery,tsquery) | boolean
tsquery_le(tsquery,tsquery) | boolean
tsquery_eq(tsquery,tsquery) | boolean
tsquery_ne(tsquery,tsquery) | boolean
tsquery_ge(tsquery,tsquery) | boolean
tsquery_gt(tsquery,tsquery) | boolean
tsquery_cmp(tsquery,tsquery) | integer
tsquery_and(tsquery,tsquery) | tsquery
tsquery_or(tsquery,tsquery) | tsquery
tsquery_not(tsquery) | tsquery
tsq_mcontains(tsquery,tsquery) | boolean
tsq_mcontained(tsquery,tsquery) | boolean
numnode(tsquery) | integer
querytree(tsquery) | text
rewrite(tsquery,tsquery,tsquery) | tsquery
rewrite(tsquery,text) | tsquery
rewrite_accum(tsquery,tsquery[]) | tsquery
rewrite_finish(tsquery) | tsquery
rewrite(tsquery[]) | tsquery
stat(text) | record
stat(text,text) | record
rank(real[],tsvector,tsquery,integer) | real
rank(real[],tsvector,tsquery) | real
rank(tsvector,tsquery,integer) | real
rank(tsvector,tsquery) | real
rank_cd(real[],tsvector,tsquery,integer) | real
rank_cd(real[],tsvector,tsquery) | real
rank_cd(tsvector,tsquery,integer) | real
rank_cd(tsvector,tsquery) | real
token_type(oid) | record
token_type(text) | record
parse(oid,text) | record
parse(text,text) | record
lexize(oid,text) | text[]
lexize(text,text) | text[]
headline(oid,text,tsquery,text) | text
headline(oid,text,tsquery) | text
headline(text,text,tsquery,text) | text
headline(text,text,tsquery) | text
headline(text,tsquery,text) | text
headline(text,tsquery) | text
to_tsvector(oid,text) | tsvector
to_tsvector(text,text) | tsvector
to_tsquery(oid,text) | tsquery
to_tsquery(text,text) | tsquery
plainto_tsquery(oid,text) | tsquery
plainto_tsquery(text,text) | tsquery
to_tsvector(text) | tsvector
to_tsquery(text) | tsquery
plainto_tsquery(text) | tsquery
tsvector_update_trigger() | trigger
get_ts_config_oid(text) | oid
get_current_ts_config() | oid
(82 rows)

(This list omits functions with INTERNAL arguments, as those are of
no particular concern to users.)

While most of these are probably OK, I'm disturbed by the prospect
that we are commandeering names as generic as "parse" or "stat"
with argument types as generic as "text". I think we need to put
a "ts_" prefix on some of these. Specifically, I find these names
totally unacceptable without a ts_ prefix:

stat(text) | record
stat(text,text) | record

token_type(oid) | record
token_type(text) | record

parse(oid,text) | record
parse(text,text) | record

lexize(oid,text) | text[]
lexize(text,text) | text[]

These guys might be all right given that some of their arguments are
tsvector or tsquery, but it's not completely convincing --- think about
the case where an argument is given as an undecorated literal string.
It's also not all that clear that they are related to text searching.
I'm for putting a ts_ prefix on them too:

rank(real[],tsvector,tsquery,integer) | real
rank(real[],tsvector,tsquery) | real
rank(tsvector,tsquery,integer) | real
rank(tsvector,tsquery) | real
rank_cd(real[],tsvector,tsquery,integer) | real
rank_cd(real[],tsvector,tsquery) | real
rank_cd(tsvector,tsquery,integer) | real
rank_cd(tsvector,tsquery) | real

rewrite(tsquery,tsquery,tsquery) | tsquery
rewrite(tsquery,text) | tsquery
rewrite_accum(tsquery,tsquery[]) | tsquery
rewrite_finish(tsquery) | tsquery
rewrite(tsquery[]) | tsquery

headline(oid,text,tsquery,text) | text
headline(oid,text,tsquery) | text
headline(text,text,tsquery,text) | text
headline(text,text,tsquery) | text
headline(text,tsquery,text) | text
headline(text,tsquery) | text

These guys are just plain badly named, as it's completely unobvious that
they have anything to do with tsearch (or what they do at all, actually).
Furthermore the "varchar" variants seem entirely redundant with the
"text" ones:

vq_exec(tsvector,tsquery) | boolean
qv_exec(tsquery,tsvector) | boolean
tt_exec(text,text) | boolean
ct_exec(character varying,text) | boolean
tq_exec(text,tsquery) | boolean
cq_exec(character varying,tsquery) | boolean

Comments, suggestions?

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2007-08-17 01:17:02 tsvector_update_trigger() is utterly insecure
Previous Message Marc G. Fournier 2007-08-16 23:59:57 Re: [HACKERS] Re: cvsweb busted (was Re: pgsql: Repair problems occurring when multiple RI updates have to be)