From: | Greg Stark <stark(at)mit(dot)edu> |
---|---|
To: | Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> |
Cc: | Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Proposal: Log inability to lock pages during vacuum |
Date: | 2014-10-20 15:29:58 |
Message-ID: | CAM-w4HNpoj_qfPY+7juVrcFhR=Gbk3tpFcPc_5q8R-tdmbsinQ@mail.gmail.com |
Lists: | pgsql-hackers |
On Mon, Oct 20, 2014 at 2:57 AM, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> wrote:
> Currently, a non-freeze vacuum will punt on any page it can't get a cleanup
> lock on, with no retry. Presumably this should be a rare occurrence, but I
> think it's bad that we just assume that and won't warn the user if something
> bad is going on.
>
> My thought is that if we skip any pages elog(LOG) how many we skipped. If we
> skip more than 1% of the pages we visited (not relpages) then elog(WARNING)
> instead.
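The rule proposed in the quoted text can be sketched as a small decision function. This is only an illustration in Python, not PostgreSQL code: the function name, its parameters, and the string return values are hypothetical, and it assumes "pages visited" counts every page the vacuum attempted, including the skipped ones.

```python
def skipped_page_log_level(pages_skipped, pages_visited, warn_percent=1):
    """Pick a log level for pages skipped due to a missed cleanup lock.

    Hypothetical helper: returns None when nothing was skipped, "WARNING"
    when more than warn_percent of the visited pages were skipped, and
    "LOG" otherwise. Integer arithmetic avoids float rounding at the
    threshold.
    """
    if pages_skipped <= 0:
        return None
    if pages_skipped * 100 > warn_percent * pages_visited:
        return "WARNING"
    return "LOG"
```

Note that a vacuum skipping exactly 1% of its visited pages stays at LOG under this reading, since the proposal says "more than 1%".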
Is there some specific failure you've run into where a page was stuck
in a pinned state and never got vacuumed?

I would like to see a more systematic way of going about this. What
LSN or timestamp is associated with the oldest unvacuumed page? How
many times have we tried to visit it? What do those numbers look like
overall -- i.e. what's the median number of attempts it takes to vacuum
a page, and what does the distribution of unvacuumed-page ages look
like?

With that data it should be possible to determine whether the behaviour
is actually working well and where to draw the line for outliers that
might represent bugs.
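As a rough sketch of the kind of summary meant here, the following Python assumes some bookkeeping that does not exist today: one record per page holding the number of vacuum attempts and an age (for example, seconds since it was last vacuumed). The record layout, the function name, and the choice of a Tukey fence (Q3 + 1.5 * IQR) as the outlier line are all assumptions made for illustration; the right cut-off would have to come from real data.

```python
from statistics import median, quantiles


def summarize_skips(records):
    """Summarize hypothetical per-page vacuum bookkeeping.

    records: list of (page, attempts, age) tuples, at least two entries.
    Returns the median attempt count, the oldest page and its age, and
    the pages whose attempt count lies above a Tukey fence.
    """
    attempts = [a for _, a, _ in records]
    # Quartiles of the attempt counts; quantiles() sorts its input itself.
    q1, _, q3 = quantiles(attempts, n=4)
    fence = q3 + 1.5 * (q3 - q1)  # assumed outlier threshold
    oldest_page, _, oldest_age = max(records, key=lambda r: r[2])
    return {
        "median_attempts": median(attempts),
        "oldest_page": oldest_page,
        "oldest_age": oldest_age,
        "fence": fence,
        "outlier_pages": sorted(p for p, a, _ in records if a > fence),
    }
```

For example, with seven pages vacuumed in one or two attempts and one page that has needed nine, only that last page lands above the fence and would be the one worth investigating.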
--
greg