Re: BUG #7853: Incorrect statistics in table with many dead rows.

From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: James R Skaggs <james(dot)r(dot)skaggs(at)seagate(dot)com>
Cc: "jimbob(at)seagate(dot)com" <jimbob(at)seagate(dot)com>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #7853: Incorrect statistics in table with many dead rows.
Date: 2013-02-05 23:12:15
Message-ID: 1360105935.25882.YahooMailNeo@web162904.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

James R Skaggs <james(dot)r(dot)skaggs(at)seagate(dot)com> wrote:

> Agreed, we shouldn't have so many dead rows.  Our autovacuum is
> set on but has default parameters.   We are clustering today.
> This, of course, means downtime and inconvenience to the users.

Right, which is why it's important to figure out why the bloat
happened.  Sometimes it is unavoidable, like when you delete 90% of
the rows in your table or a long-lived "idle in transaction"
connection prevents autovacuum from being able to do its work
normally.  To prevent further downtime it is important to figure
out what happened and make appropriate changes to your monitoring
or vacuuming.

> Here is the troublesome query:
>
>> select
>>    sum(case when t1.cdlc < 0 then 0 else t1.cdlc end) as code_line_cnt,
>>    sum(case when t1.cmlc < 0 then 0 else t1.cmlc end) as comment_line_cnt,
>>    sum(case when t1.bllc < 0 then 0 else t1.bllc end) as blank_line_cnt
>>  from
>>    ( select
>>        max(sf.current_code_line_count) as cdlc,
>>        max(sf.current_comment_line_count) as cmlc,
>>        max(sf.current_blank_line_count) as bllc
>>      from
>>        stream_file sf
>>      group by sf.file_path_id, sf.current_source_md5
>>    ) as t1;

Well, I don't see that the planner has a lot of choice there
besides whether to use a sort or a hash to do the inner
aggregation.  Are you saying that prior to the bloat it used a hash
aggregation, and that was faster?  And that you feel that it should
be using that even with the bloat?  That the dead rows seem to be
getting included in the statistics, driving to the slower plan, and
you feel they should be omitted?

Note that I'm not aruing one way or another on these points at the
moment; I'm just trying to understand your point clearly.

-Kevin

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message James R Skaggs 2013-02-05 23:35:30 Re: BUG #7853: Incorrect statistics in table with many dead rows.
Previous Message James R Skaggs 2013-02-05 22:43:35 Re: BUG #7853: Incorrect statistics in table with many dead rows.