From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Jan Urbański <j(dot)urbanski(at)students(dot)mimuw(dot)edu(dot)pl> |
Cc: | Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: gsoc, text search selectivity and dllist enhancments |
Date: | 2008-07-10 22:19:36 |
Message-ID: | 19287.1215728376@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Jan Urbański <j(dot)urbanski(at)students(dot)mimuw(dot)edu(dot)pl> writes:
> Tom Lane wrote:
>> The way I think it ought to work is that the number of lexemes stored in
>> the final pg_statistic entry is statistics_target times a constant
>> (perhaps 100). I don't like having it vary depending on tsvector width
> I think the existing code puts at most statistics_target elements in a
> pg_statistic tuple. In compute_minimal_stats() num_mcv starts with
> stats->attr->attstattarget and is adjusted only downwards.
> My original thought was to keep that property for tsvectors (i.e. store
> at most statistics_target lexemes) and advise people to set it high for
> their tsvector columns (e.g. 100x their default).
Well, (1) the normal measure would be statistics_target *tsvectors*,
and we'd have to translate that to lexemes somehow; my proposal is just
to use a fixed constant instead of tsvector width as in your original
patch. And (2) storing only statistics_target lexemes would be
uselessly small and would guarantee that people *have to* set a custom
target on tsvector columns to get useful results. Obviously broken
defaults are not my bag.
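To make the difference between the two sizing rules concrete, here is a minimal sketch of the fixed-constant proposal. The function name `lexeme_slots` and the constant name are hypothetical, and the multiplier of 100 is only the "perhaps 100" floated above, not a settled value.

```python
# Hypothetical illustration of the sizing rule discussed above; none of
# these names come from the PostgreSQL source tree.

# Assumed multiplier: the thread only suggests "perhaps 100".
LEXEMES_PER_TARGET = 100


def lexeme_slots(statistics_target: int,
                 multiplier: int = LEXEMES_PER_TARGET) -> int:
    """Return how many lexemes would be kept for a tsvector column.

    The result depends only on the statistics target and a fixed
    multiplier, never on the width of the sampled tsvectors.
    """
    if statistics_target < 0:
        raise ValueError("statistics_target must be non-negative")
    if multiplier < 1:
        raise ValueError("multiplier must be at least 1")
    return statistics_target * multiplier


if __name__ == "__main__":
    # Example target of 10: the fixed-constant rule keeps 1000 lexemes,
    # while "at most statistics_target lexemes" would keep only 10.
    print(lexeme_slots(10))                 # 1000
    print(lexeme_slots(10, multiplier=1))   # 10
```

With a multiplier of 1 this degenerates to the "store at most statistics_target lexemes" behaviour, which is the case that would force a custom target on every tsvector column.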
> Also, the existing code decides which elements are worth storing as most
> common ones by discarding those that are not frequent enough (that's
> where num_mcv can get adjusted downwards). I mimicked that for lexemes
> but maybe it just doesn't make sense?
Well, that's not unreasonable either, if you can come up with a
reasonable definition of "not frequent enough"; but that adds another
variable to the discussion.
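For illustration only, one possible shape for such a cutoff is sketched below: keep an entry only if its sample count is some factor above the average count per distinct value. The function name `trim_mcv`, the factor of 1.25, and the floor of 2 are assumptions chosen for the example, not values agreed on in this thread, and what "average" should mean for lexemes is exactly the open variable.

```python
# Hypothetical sketch of a "not frequent enough" cutoff; the factor and
# floor are illustrative assumptions.
from typing import Sequence


def trim_mcv(counts: Sequence[int],
             num_mcv: int,
             sample_rows: int,
             ndistinct: float,
             factor: float = 1.25,
             floor: float = 2.0) -> int:
    """Return how many of the leading entries are worth storing.

    `counts` holds per-value occurrence counts from the sample, sorted
    in descending order. The result never exceeds `num_mcv`, so the
    limit can only be adjusted downwards.
    """
    if ndistinct <= 0:
        raise ValueError("ndistinct must be positive")
    avg_count = sample_rows / ndistinct      # occurrences of a typical value
    min_count = max(avg_count * factor, floor)
    kept = min(num_mcv, len(counts))
    for i in range(kept):
        if counts[i] < min_count:
            return i                         # first entry below the cutoff
    return kept


if __name__ == "__main__":
    # avg_count = 100 / 40 = 2.5, so min_count = 3.125: the entries with
    # counts 3 and 1 are dropped.
    print(trim_mcv([50, 20, 3, 1], num_mcv=10, sample_rows=100, ndistinct=40))  # 2
```

Any such rule needs a definition of the average and a choice of factor, which is the extra variable mentioned above.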
regards, tom lane