Re: Stats target increase vs compute_tsvector_stats()

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jan Urbański <j(dot)urbanski(at)students(dot)mimuw(dot)edu(dot)pl>
Cc: pgsql-hackers(at)postgresql(dot)org, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: Stats target increase vs compute_tsvector_stats()
Date: 2008-12-15 15:01:48
Message-ID: 29737.1229353308@sss.pgh.pa.us
Lists: pgsql-hackers

Jan Urbański <j(dot)urbanski(at)students(dot)mimuw(dot)edu(dot)pl> writes:
> Tom Lane wrote:
>> I came across this bit in ts_typanalyze.c:
>>
>> /* We want statistic_target * 100 lexemes in the MCELEM array */
>> num_mcelem = stats->attr->attstattarget * 100;
>>
>> I wonder whether the multiplier here should be changed?

> The origin of that bit is this post:
> http://archives.postgresql.org/pgsql-hackers/2008-07/msg00556.php
> and the following few downthread ones.

> If we bump the default statistics target 10 times, then changing the
> multiplier to 10 seems the right thing to do.

OK, will do.

> Only thing that needs
> caution is the frequency of pruning we do in the Lossy Counting
> algorithm, that IIRC is correlated with the desired target length of the
> MCELEM array.

Right below that we have

    /*
     * We set bucket width equal to the target number of result lexemes.
     * This is probably about right but perhaps might need to be scaled
     * up or down a bit?
     */
    bucket_width = num_mcelem;

so it should track automatically. AFAICS the argument in the above
thread that this is an appropriate pruning distance holds good
regardless of just how we obtain the target mcelem count.
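
To make the "tracks automatically" point concrete, here is a minimal sketch
of Lossy Counting in C. It is not the code in ts_typanalyze.c: the names
LossyCounter, lc_init, lc_add, lc_prune and lc_count, the fixed MAX_TRACKED
capacity, and the use of integer keys in place of lexemes are all assumptions
made for illustration. The only thing it is meant to show is that pruning
happens once per bucket_width items, so whatever value num_mcelem ends up
with sets the pruning distance too.

```c
#include <assert.h>

#define MAX_TRACKED 1024        /* hypothetical capacity, enough for a demo */

typedef struct
{
    int         key;            /* stand-in for a lexeme */
    int         count;          /* occurrences seen since the entry was added */
    int         delta;          /* max possible undercount for this entry */
} Entry;

typedef struct
{
    Entry       entries[MAX_TRACKED];
    int         n;              /* number of live entries */
    int         bucket_width;   /* prune once per this many items */
    long        seen;           /* items processed so far */
} LossyCounter;

static void
lc_init(LossyCounter *lc, int bucket_width)
{
    assert(bucket_width > 0);
    lc->n = 0;
    lc->bucket_width = bucket_width;
    lc->seen = 0;
}

/* Drop every entry whose count + delta does not exceed the bucket number. */
static void
lc_prune(LossyCounter *lc, int b_current)
{
    int         kept = 0;

    for (int i = 0; i < lc->n; i++)
    {
        if (lc->entries[i].count + lc->entries[i].delta > b_current)
            lc->entries[kept++] = lc->entries[i];
    }
    lc->n = kept;
}

/* Feed one item; prune whenever a bucket boundary is reached. */
static void
lc_add(LossyCounter *lc, int key)
{
    int         b_current;
    int         i;

    lc->seen++;
    /* current bucket number = ceil(seen / bucket_width) */
    b_current = (int) ((lc->seen + lc->bucket_width - 1) / lc->bucket_width);

    for (i = 0; i < lc->n; i++)
    {
        if (lc->entries[i].key == key)
        {
            lc->entries[i].count++;
            break;
        }
    }
    if (i == lc->n)
    {
        /* sketch only: real code would grow the table instead */
        assert(lc->n < MAX_TRACKED);
        lc->entries[lc->n].key = key;
        lc->entries[lc->n].count = 1;
        lc->entries[lc->n].delta = b_current - 1;
        lc->n++;
    }

    if (lc->seen % lc->bucket_width == 0)
        lc_prune(lc, b_current);
}

/* Tracked count for key, or 0 if it was never seen or has been pruned. */
static int
lc_count(const LossyCounter *lc, int key)
{
    for (int i = 0; i < lc->n; i++)
    {
        if (lc->entries[i].key == key)
            return lc->entries[i].count;
    }
    return 0;
}
```

In this sketch a caller would compute num_mcelem from the stats target and
pass it straight to lc_init() as the bucket width, so changing the multiplier
changes the pruning distance with no further edits.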

> BTW: I've been occupied with other things and might have missed some
> discussions, but at some point it has been considered to use Lossy
> Counting to gather statistics from regular columns, not only tsvectors.
> Wouldn't this help the performance hit ANALYZE takes from upping
> default_stats_target?

Perhaps, but it's not likely to get done for 8.4 ...

regards, tom lane
