Re: tsvector pg_stats seems quite a bit off.

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Jan Urbański <wulczer(at)wulczer(dot)org>
Cc:	Jesper Krogh <jesper(at)krogh(dot)cc>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: tsvector pg_stats seems quite a bit off.
Date:	2010-05-29 16:38:31
Message-ID:	20653.1275151111@sss.pgh.pa.us
Lists:	pgsql-hackers

Jan Urbański <wulczer(at)wulczer(dot)org> writes:
> [ e of ] s/2 or s/3 look reasonable.

The examples in the LC paper seem to all use e = s/10. Note the stated
assumption e << s.

> So, should I just write a patch that sets the bucket width and pruning
> count using 0.07 as the assumed frequency of the most common word and
> epsilon equal to s/2 or s/3?

I'd go with s = 0.07 / desired-MCE-count and e = s / 10, at least for
a first cut to experiment with.
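
For concreteness, here is a minimal sketch of that arithmetic in Python. It
is not the patch itself: the function and field names are made up for
illustration, and the bucket width w = ceil(1/e) is taken from the Lossy
Counting definition rather than from anything stated in this message.

```python
import math
from collections import namedtuple

# Hypothetical container for the derived Lossy Counting parameters.
LCParams = namedtuple("LCParams", "s e bucket_width")


def lc_params(mce_count, top_freq=0.07):
    """Derive Lossy Counting parameters from the desired MCE count.

    top_freq is the assumed frequency of the most common word (0.07 here).
    """
    s = top_freq / mce_count          # support threshold
    e = s / 10                        # error bound, chosen so that e << s
    bucket_width = math.ceil(1 / e)   # Lossy Counting bucket width w = ceil(1/e)
    return LCParams(s, e, bucket_width)


params = lc_params(100)
print(params)
```

With 100 desired elements this gives s of about 0.0007, e of about 0.00007,
and a bucket width of 14286 lexemes, so the tracking table would be pruned
roughly once per 14286 lexemes seen.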

regards, tom lane

In response to

Re: tsvector pg_stats seems quite a bit off. at 2010-05-29 16:14:36 from Jan Urbański
