Re: tsvector pg_stats seems quite a bit off.

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Jesper Krogh <jesper(at)krogh(dot)cc>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tsvector pg_stats seems quite a bit off.
Date: 2010-05-25 18:44:29
Message-ID: 1274812862-sup-7954@alvh.no-ip.org
Lists: pgsql-hackers

Excerpts from Jesper Krogh's message of Wed May 19 15:01:18 -0400 2010:

> But the distribution is very "flat" at the end; the last 128 values are
> exactly 1.00189e-05, which means that any term sitting outside the array
> would get an estimate of
> 1.00189e-05 * 350174 / 2 = 1.75 ~ 2 rows

I don't know if this is related, but tsvector stats are computed and
stored per term, not per datum. This is different from all other
datatypes. Maybe there's code somewhere that's assuming per-datum and
coming up with the wrong estimates? Or maybe the tsvector-specific code
contains a bug somewhere; maybe a rounding error?
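
For reference, the arithmetic in the quoted estimate can be reproduced with
a small sketch. It only restates the formula from Jesper's message (lowest
stored frequency times row count, halved); the function name is hypothetical
and this is not taken from the planner source.

```python
def estimate_rows_outside_array(min_freq: float, total_rows: int) -> float:
    """Row estimate for a term absent from the stored array.

    Hypothetical helper: it restates the formula quoted in the thread,
    min_freq * total_rows / 2, and nothing more.
    """
    return min_freq * total_rows / 2


min_freq = 1.00189e-05   # frequency shared by the last 128 stored values
total_rows = 350174      # row count used in the quoted calculation

est = estimate_rows_outside_array(min_freq, total_rows)
print(f"{est:.2f} rows, rounded to {round(est)}")
```

If the stored frequencies are per term while some consumer treats them as
per datum (or the reverse), this is the number that would be skewed.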

--
Álvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
