From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc: Jan Urbański <j(dot)urbanski(at)students(dot)mimuw(dot)edu(dot)pl>, "Postgres - Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gsoc, text search selectivity and dllist enhancments
Date: 2008-07-04 15:53:56
Message-ID: 23365.1215186836@sss.pgh.pa.us
Lists: pgsql-hackers

"Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> writes:
> Tom Lane wrote:
>> The data structure I'd suggest is a simple array of pointers
>> to the underlying hash table entries. Since you have a predetermined
>> maximum number of lexemes to track, you can just palloc the array once
>> --- you don't need the expansibility properties of a list.

> The number of lexemes isn't predetermined. It's 2 * (longest tsvector
> seen so far), and we don't know beforehand how long the longest tsvector is.

Hmm, I had just assumed without looking too closely that it was stats
target times a fudge factor. What is the rationale for doing it as
above? I don't think I like the idea of the limit varying over the
course of the scan --- that means that lexemes in different places
in the input will have significantly different probabilities of
surviving to the final result.
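
For illustration, here is a minimal C sketch of the fixed-limit, array-of-pointers
idea; LexemeHashEntry, the 10x fudge factor, and the helper names are assumptions
made up for the sketch, not code from the patch:

    /*
     * Sketch: track at most a predetermined number of lexemes by keeping
     * a plain array of pointers into the underlying hash table.
     */
    #include "postgres.h"

    typedef struct LexemeHashEntry
    {
        const char *lexeme;         /* hash key: the lexeme text */
        int         frequency;      /* number of tsvectors containing it */
    } LexemeHashEntry;

    static LexemeHashEntry **tracked;   /* pointers into the hash table */
    static int  track_limit;            /* fixed maximum, known up front */
    static int  n_tracked;              /* entries currently tracked */

    static void
    init_tracking(int stats_target)
    {
        /* limit is predetermined: stats target times a fudge factor */
        track_limit = stats_target * 10;
        n_tracked = 0;

        /* palloc the array once --- no expansible list needed */
        tracked = (LexemeHashEntry **)
            palloc(track_limit * sizeof(LexemeHashEntry *));
    }

    static void
    track_entry(LexemeHashEntry *entry)
    {
        if (n_tracked < track_limit)
            tracked[n_tracked++] = entry;
        /* else an eviction policy would apply; omitted in this sketch */
    }

Under the 2 * (longest tsvector seen so far) scheme, track_limit would instead
have to grow during the scan, which is exactly what makes a lexeme's chance of
surviving to the final result depend on where it appears in the input.
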
regards, tom lane