| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> | 
|---|---|
| To: | Euler Taveira de Oliveira <euler(at)timbira(dot)com> | 
| Cc: | Edwin Groothuis <postgresql(at)mavetju(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org> | 
| Subject: | Re: [BUGS] BUG #3975: tsearch2 index should not bomb out of 1Mb limit | 
| Date: | 2008-03-07 06:52:24 | 
| Message-ID: | 7543.1204872744@sss.pgh.pa.us | 
| Lists: | pgsql-bugs pgsql-patches | 

Euler Taveira de Oliveira <euler(at)timbira(dot)com> writes:
> The problem with this approach is how to select the part of the document 
> to index. How will you ensure you're not ignoring the more important 
> words of the document?

That's *always* a risk, anytime you do any sort of processing or
normalization on the text.  The question here is not whether or not
we will make tradeoffs, only which ones to make.

> IMHO Postgres shouldn't decide it; it would be good if a user could set 
> it at runtime and/or in postgresql.conf.

Well, there is exactly zero chance of that happening in 8.3.x, because
the bit allocations for on-disk tsvector representation are already
determined.  It's fairly hard to see a way of doing it in future
releases that would have acceptable costs, either.

But more to the point: no matter what the document length limit is,
why should it be a hard error to exceed it?  The downside of not
indexing words beyond the length limit is that searches won't find
documents in which the search terms occur only very far into the
document.  The downside of throwing an error is that we can't store such
documents at all, which surely guarantees that searches won't find
them.  How can you possibly argue that that option is better?
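
The tradeoff can be made concrete with a toy sketch. This is not PostgreSQL's tsvector code: `MAX_BYTES`, `DocumentTooLong`, `index_strict`, and `index_truncating` are hypothetical names, and the byte budget is a stand-in for the real size limit. It only shows that truncating still makes the early words searchable, while a hard error leaves nothing indexed at all.

```python
# Toy illustration only: not PostgreSQL's tsvector implementation.
# MAX_BYTES and every name below are hypothetical.
MAX_BYTES = 32  # stand-in for the real index size limit


class DocumentTooLong(Exception):
    """Raised by the strict variant when the budget is exceeded."""


def index_strict(text, max_bytes=MAX_BYTES):
    """Reject the whole document if its words exceed the budget."""
    words = text.lower().split()
    if sum(len(w.encode("utf-8")) for w in words) > max_bytes:
        raise DocumentTooLong("document exceeds index size limit")
    return set(words)


def index_truncating(text, max_bytes=MAX_BYTES):
    """Index words in document order until the budget is used up."""
    indexed, used = set(), 0
    for w in text.lower().split():
        size = len(w.encode("utf-8"))
        if used + size > max_bytes:
            break  # stop indexing; the document itself can still be stored
        indexed.add(w)
        used += size
    return indexed
```

With a document that overruns the budget, `index_truncating` returns the words that fit, so a search for an early term still matches, whereas `index_strict` raises and the document cannot be found by any term.
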
regards, tom lane