Re: gsoc, text search selectivity and dllist enhancments

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc: Jan Urbański <j(dot)urbanski(at)students(dot)mimuw(dot)edu(dot)pl>, "Postgres - Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gsoc, text search selectivity and dllist enhancments
Date: 2008-07-04 15:53:56
Message-ID: 23365.1215186836@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

"Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> writes:
> Tom Lane wrote:
>> The data structure I'd suggest is a simple array of pointers
>> to the underlying hash table entries. Since you have a predetermined
>> maximum number of lexemes to track, you can just palloc the array once
>> --- you don't need the expansibility properties of a list.

> The number of lexemes isn't predetermined. It's 2 * (longest tsvector
> seen so far), and we don't know beforehand how long the longest tsvector is.

Hmm, I had just assumed without looking too closely that it was stats
target times a fudge factor. What is the rationale for doing it as
above? I don't think I like the idea of the limit varying over the
course of the scan --- that means that lexemes in different places
in the input will have significantly different probabilities of
surviving to the final result.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2008-07-04 16:01:12 Re: [PATCHES] Explain XML patch v2
Previous Message Alvaro Herrera 2008-07-04 15:05:40 Re: Review: DTrace probes (merged version)