From: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Noah Misch <noah(at)leadboat(dot)com>, Nathan Boley <npboley(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Collect frequency statistics for arrays |
Date: | 2012-03-05 14:02:24 |
Message-ID: | CAPpHfdtwyn7QFy-1mo+fHHof=Bf8BTdbOO84a_aH9gGLXTmb0w@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Mar 5, 2012 at 1:11 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> BTW, one other thing about the count histogram: seems like we are
> frequently generating uselessly large ones. For instance, do ANALYZE
> in the regression database and then run
>
> select tablename,attname,elem_count_histogram from pg_stats
> where elem_count_histogram is not null;
>
> You get lots of entries that look like this:
>
> pg_proc | proallargtypes |
> {1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,5,5,5,6,6,6,2.80556}
> pg_proc | proargmodes |
> {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,1.61111}
> pg_proc | proargnames |
> {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,6,6,6,7,7,7,7,8,8,8,14,14,15,16,3.8806}
> pg_proc | proconfig |
> {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1}
> pg_class | reloptions |
> {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1}
>
> which seems to me to be a rather useless expenditure of space.
> Couldn't we reduce the histogram size when there aren't many
> different counts?
>
> It seems fairly obvious to me that we could bound the histogram
> size with (max count - min count + 1), but maybe something even
> tighter would work; or maybe I'm missing something and this would
> sacrifice accuracy.
>
True. If (max count - min count + 1) is small, enumerating of frequencies
is both more compact and more precise representation. Simultaneously,
if (max count - min count + 1) is large, we can run out of
statistics_target with such representation. We can use same representation
of count distribution as for scalar column value: MCV and HISTOGRAM, but it
would require additional statkind and statistics slot. Probably, you've
better ideas?
------
With best regards,
Alexander Korotkov.
From | Date | Subject | |
---|---|---|---|
Next Message | Kevin Grittner | 2012-03-05 14:34:37 | Re: autovacuum locks |
Previous Message | Pavel Stehule | 2012-03-05 13:49:01 | Re: review: CHECK FUNCTION statement |