Re: Issue enabling track_counts to launch autovacuum in 9.4.5

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Derek Elder <dereke(at)mirthcorp(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Issue enabling track_counts to launch autovacuum in 9.4.5
Date: 2016-03-02 23:06:10
Message-ID: CAKFQuwawEKJy5dQ8PdnFG+mpgQmb_edkWbV8qw31UsjjjDgERw@mail.gmail.com
Lists: pgsql-general

On Wed, Mar 2, 2016 at 3:49 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Derek Elder <dereke(at)mirthcorp(dot)com> writes:
> > That was indeed the root cause. The /etc/hosts file on the server had
> > incorrect permissions which caused localhost to not resolve.
>
> It strikes me that this should not have been so hard to solve. The
> stats collector was trying to tell you what was wrong, but evidently
> you could not interpret those messages correctly. I am thinking that
> we need to do some work on the message wording; or maybe there is one
> more message that needs to be emitted so you can follow the causal
> chain?
>
> In particular, perhaps it wasn't immediately obvious that the first
> of these messages was the cause of the second:
>
> > 2016-03-02 14:58:09 EST [14366]: [8-1] LOG: could not resolve "localhost": Name or service not known
> > 2016-03-02 14:58:09 EST [14366]: [9-1] LOG: disabling statistics collector for lack of working socket
>
> in which case maybe we could rephrase the first message along the
> lines of "could not resolve "localhost" to establish statistics
> collector socket: <strerror detail here>". (There are a few other
> messages in the same area that would need to be changed similarly.)
>
> Or maybe the problem was that when we forced track_counts off because of
> no stats collector, we didn't emit any bleat noting that, which if we had
> might have led you to realize that the above messages were the direct
> cause of the next one:
>
> > 2016-03-02 14:58:09 EST [14366]: [10-1] WARNING: autovacuum not started because of misconfiguration
> > 2016-03-02 14:58:09 EST [14366]: [11-1] HINT: Enable the "track_counts" option.
>
> Or both changes, or something else entirely?
>
> I'd be interested to hear how you perceived these log messages and
> what you think might help the next person.
>

The fact that the first two are only LOG level and not WARNING would seem
like the easiest thing to improve. I had the benefit of basically
knowing track_counts was a red herring given the provided context, so I went
and started looking at anything preceding the first warning that could give
me a hint as to the nature of the "misconfiguration".

It probably would help to specify, if known, whether the suspected
misconfiguration is external or internal to PostgreSQL - i.e., do I need
to fix postgresql.conf or is something external (like the hosts file) to
blame? In this case, since we don't control "localhost", it would be
"external misconfiguration".
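
For the "external" case in this thread, the lookup can be checked outside
the server. The following is a minimal sketch, assuming the statistics
collector does an ordinary datagram-socket name lookup of "localhost"; the
function name check_resolves is made up for illustration and is not part of
PostgreSQL. Run it as the same OS user as the server, since the cause here
was file permissions on /etc/hosts.

```python
import socket


def check_resolves(host="localhost"):
    """Return (ok, detail) for a datagram-socket lookup of `host`.

    Hypothetical helper: it only approximates the kind of lookup the
    log message in this thread complains about.
    """
    try:
        infos = socket.getaddrinfo(host, None, type=socket.SOCK_DGRAM)
    except socket.gaierror as exc:
        # Mirror the wording of the server's LOG line for easy comparison.
        return False, f'could not resolve "{host}": {exc.strerror or exc}'
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string.
    addresses = sorted({info[4][0] for info in infos})
    return True, ", ".join(addresses)


if __name__ == "__main__":
    ok, detail = check_resolves()
    print("OK:" if ok else "FAILED:", detail)
```

If this fails for the postgres user but works for root, the problem is
outside PostgreSQL (for example, an unreadable hosts file).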

This also doesn't help:
show autovacuum;
autovacuum
------------
on

Why do we indirectly disable autovacuum by disabling one of its required
parameters instead of just disabling the main property? I don't suppose we
can add a third option (on, off, broken) to this, which would allow
distinguishing between a user-specified condition (off) and a
system-imposed one (broken)?
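
To make the three-state idea concrete, here is a rough sketch of the
reporting logic I have in mind. It is only an illustration of the proposal:
autovacuum_state and stats_socket_ok are hypothetical names, and nothing
like this exists in PostgreSQL today.

```python
def autovacuum_state(autovacuum: bool, track_counts: bool,
                     stats_socket_ok: bool) -> str:
    """Classify the effective autovacuum status as 'on', 'off' or 'broken'.

    Hypothetical model of the proposal, not server behavior:
      autovacuum, track_counts -- the values the user configured
      stats_socket_ok          -- whether the stats collector got a socket
    """
    if not autovacuum:
        # The user turned it off explicitly; report that as-is.
        return "off"
    if not stats_socket_ok:
        # The system forced track_counts off, so autovacuum cannot run
        # even though the user asked for it.
        return "broken"
    if not track_counts:
        # The user disabled a required parameter themselves.
        return "off"
    return "on"
```

With the settings from this thread (autovacuum on, track_counts on, no
working socket) this would report "broken" rather than "on".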

This is getting a bit deep for a rare problem like this - I think that
making the root messages WARNING (or ERROR) instead of LOG (and ideally
linking the two explicitly if possible) would have the desired effect of
pointing the user to the first thing they need to fix. We can then assume
they would ignore all subsequent messages (and hints) until the first one
is handled (i.e., use good troubleshooting practices). The hint and the
change to track_counts then become a non-issue.

David J.
