| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> | 
|---|---|
| To: | pgsql-hackers(at)postgreSQL(dot)org, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru> | 
| Subject: | tsearch patch and namespace pollution | 
| Date: | 2007-08-17 01:09:26 | 
| Message-ID: | 25419.1187312966@sss.pgh.pa.us | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
I find the following additions to pg_proc in the current tsearch2 patch:
                   proc                   | prorettype 
------------------------------------------+------------
 pg_ts_parser_is_visible(oid)             | boolean
 pg_ts_dict_is_visible(oid)               | boolean
 pg_ts_template_is_visible(oid)           | boolean
 pg_ts_config_is_visible(oid)             | boolean
 tsvectorin(cstring)                      | tsvector
 tsvectorout(tsvector)                    | cstring
 tsvectorsend(tsvector)                   | bytea
 tsqueryin(cstring)                       | tsquery
 tsqueryout(tsquery)                      | cstring
 tsquerysend(tsquery)                     | bytea
 gtsvectorin(cstring)                     | gtsvector
 gtsvectorout(gtsvector)                  | cstring
 tsvector_lt(tsvector,tsvector)           | boolean
 tsvector_le(tsvector,tsvector)           | boolean
 tsvector_eq(tsvector,tsvector)           | boolean
 tsvector_ne(tsvector,tsvector)           | boolean
 tsvector_ge(tsvector,tsvector)           | boolean
 tsvector_gt(tsvector,tsvector)           | boolean
 tsvector_cmp(tsvector,tsvector)          | integer
 length(tsvector)                         | integer
 strip(tsvector)                          | tsvector
 setweight(tsvector,"char")               | tsvector
 tsvector_concat(tsvector,tsvector)       | tsvector
 vq_exec(tsvector,tsquery)                | boolean
 qv_exec(tsquery,tsvector)                | boolean
 tt_exec(text,text)                       | boolean
 ct_exec(character varying,text)          | boolean
 tq_exec(text,tsquery)                    | boolean
 cq_exec(character varying,tsquery)       | boolean
 tsquery_lt(tsquery,tsquery)              | boolean
 tsquery_le(tsquery,tsquery)              | boolean
 tsquery_eq(tsquery,tsquery)              | boolean
 tsquery_ne(tsquery,tsquery)              | boolean
 tsquery_ge(tsquery,tsquery)              | boolean
 tsquery_gt(tsquery,tsquery)              | boolean
 tsquery_cmp(tsquery,tsquery)             | integer
 tsquery_and(tsquery,tsquery)             | tsquery
 tsquery_or(tsquery,tsquery)              | tsquery
 tsquery_not(tsquery)                     | tsquery
 tsq_mcontains(tsquery,tsquery)           | boolean
 tsq_mcontained(tsquery,tsquery)          | boolean
 numnode(tsquery)                         | integer
 querytree(tsquery)                       | text
 rewrite(tsquery,tsquery,tsquery)         | tsquery
 rewrite(tsquery,text)                    | tsquery
 rewrite_accum(tsquery,tsquery[])         | tsquery
 rewrite_finish(tsquery)                  | tsquery
 rewrite(tsquery[])                       | tsquery
 stat(text)                               | record
 stat(text,text)                          | record
 rank(real[],tsvector,tsquery,integer)    | real
 rank(real[],tsvector,tsquery)            | real
 rank(tsvector,tsquery,integer)           | real
 rank(tsvector,tsquery)                   | real
 rank_cd(real[],tsvector,tsquery,integer) | real
 rank_cd(real[],tsvector,tsquery)         | real
 rank_cd(tsvector,tsquery,integer)        | real
 rank_cd(tsvector,tsquery)                | real
 token_type(oid)                          | record
 token_type(text)                         | record
 parse(oid,text)                          | record
 parse(text,text)                         | record
 lexize(oid,text)                         | text[]
 lexize(text,text)                        | text[]
 headline(oid,text,tsquery,text)          | text
 headline(oid,text,tsquery)               | text
 headline(text,text,tsquery,text)         | text
 headline(text,text,tsquery)              | text
 headline(text,tsquery,text)              | text
 headline(text,tsquery)                   | text
 to_tsvector(oid,text)                    | tsvector
 to_tsvector(text,text)                   | tsvector
 to_tsquery(oid,text)                     | tsquery
 to_tsquery(text,text)                    | tsquery
 plainto_tsquery(oid,text)                | tsquery
 plainto_tsquery(text,text)               | tsquery
 to_tsvector(text)                        | tsvector
 to_tsquery(text)                         | tsquery
 plainto_tsquery(text)                    | tsquery
 tsvector_update_trigger()                | trigger
 get_ts_config_oid(text)                  | oid
 get_current_ts_config()                  | oid
(82 rows)
(This list omits functions with INTERNAL arguments, as those are of
no particular concern to users.)
While most of these are probably OK, I'm disturbed by the prospect
that we are commandeering names as generic as "parse" or "stat"
with argument types as generic as "text".  I think we need to put
a "ts_" prefix on some of these.  Specifically, I find these names
totally unacceptable without a ts_ prefix:
 stat(text)                               | record
 stat(text,text)                          | record
 token_type(oid)                          | record
 token_type(text)                         | record
 parse(oid,text)                          | record
 parse(text,text)                         | record
 lexize(oid,text)                         | text[]
 lexize(text,text)                        | text[]
These guys might be all right given that some of their arguments are
tsvector or tsquery, but it's not completely convincing --- think about
the case where an argument is given as an undecorated literal string.
It's also not all that clear that they are related to text searching.
I'm for putting a ts_ prefix on them too:
 rank(real[],tsvector,tsquery,integer)    | real
 rank(real[],tsvector,tsquery)            | real
 rank(tsvector,tsquery,integer)           | real
 rank(tsvector,tsquery)                   | real
 rank_cd(real[],tsvector,tsquery,integer) | real
 rank_cd(real[],tsvector,tsquery)         | real
 rank_cd(tsvector,tsquery,integer)        | real
 rank_cd(tsvector,tsquery)                | real
 rewrite(tsquery,tsquery,tsquery)         | tsquery
 rewrite(tsquery,text)                    | tsquery
 rewrite_accum(tsquery,tsquery[])         | tsquery
 rewrite_finish(tsquery)                  | tsquery
 rewrite(tsquery[])                       | tsquery
 headline(oid,text,tsquery,text)          | text
 headline(oid,text,tsquery)               | text
 headline(text,text,tsquery,text)         | text
 headline(text,text,tsquery)              | text
 headline(text,tsquery,text)              | text
 headline(text,tsquery)                   | text
These guys are just plain badly named, as it's completely unobvious that
they have anything to do with tsearch (or what they do at all, actually).
Furthermore the "varchar" variants seem entirely redundant with the
"text" ones:
 vq_exec(tsvector,tsquery)                | boolean
 qv_exec(tsquery,tsvector)                | boolean
 tt_exec(text,text)                       | boolean
 ct_exec(character varying,text)          | boolean
 tq_exec(text,tsquery)                    | boolean
 cq_exec(character varying,tsquery)       | boolean
Comments, suggestions?
regards, tom lane
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Tom Lane | 2007-08-17 01:17:02 | tsvector_update_trigger() is utterly insecure | 
| Previous Message | Marc G. Fournier | 2007-08-16 23:59:57 | Re: [HACKERS] Re: cvsweb busted (was Re: pgsql: Repair problems occurring when multiple RI updates have to be) |