From: | Greg Smith <greg(at)2ndQuadrant(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: procpid? |
Date: | 2011-06-15 07:34:34 |
Message-ID: | 4DF8608A.70708@2ndQuadrant.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Here's the sort of thing every person who writes a monitoring tool
involving pg_stat_activity goes through:
1) Hurray! I know how to see what the database is doing now! Let me
try counting all the connections so I can finally figure out what to set
[max_connections | work_mem | other] to.
2) Wait, some of these can be "<IDLE>". That's not documented. I'll
have to special case them because they don't really matter for my
computation.
3) Seriously, there's another state for idle in a transaction? Just how
many of these special values are there? [There's actually one more
surprise after this]
The whole thing is enormously frustrating, and it's an advocacy
problem--it contributes to people just starting to become serious about
using PostgreSQL lowering their opinion of its suitability for their
business. If this is what's included for activity monitoring, and it's
this terrible, it suggest people must not have very high requirements
for that.
And what you end up with to make it better is not just another few
keystrokes. Here, as a common example I re-use a lot, is a decoder
inspired by Munin's connection count monitoring graph:
SELECT
waiting,
CASE WHEN current_query='<IDLE>' THEN true ELSE false END AS idle,
CASE WHEN current_query='<IDLE> in transaction' THEN true ELSE
false END AS idletransaction,
CASE WHEN current_query='<insufficient privilege>' THEN false ELSE
true END as visible,
CASE WHEN NOT waiting AND current_query NOT IN ('<IDLE>', '<IDLE>
in transaction', '<insufficient privilege>') THEN true ELSE false END AS
active,
procpid,current_query
FROM pg_stat_activity WHERE procpid != pg_backend_pid();
What percentage of people do you think get this right? Now, what does
that number go to if these states were all obviously exposed booleans?
As I'm concerned, this design is fundamentally flawed as currently
delivered, so the concept of "breaking" it doesn't really make sense.
The fact that you can only figure all this decoding magic out through
extensive trial and error, or reading the source code to [the database |
another monitoring tool], is crazy. It's a much bigger problem than the
fact that the pid column is misnamed, and way up on my list of things
I'm just really tired of doing. Yes, we could just document all these
mystery states to help, but they'd still be terrible.
This is a database; let's expose the data in a way that it's easy to
slice yourself using a database query. And if we're going to fix
that--which unfortunately will be breaking it relative to those already
using the current format--I figure why not bundle the procpid fix into
that while we're at it. It's even possible to argue that breaking that
small thing will draw useful attention to the improvements in other
parts of the view. Having your monitoring query break after a version
upgrade is no fun. But if investigating why reveals new stuff you
didn't notice in the release notes, the changes become more
discoverable, albeit in a somewhat perverse way.
Putting on my stability hat instead of my "make it right" one, maybe
this really makes sense to expose as a view with a whole new name. Make
this new one pg_activity (there's no stats here anyway), keep the old
one around as pg_stat_activity for a few releases until everyone has
converted to the new one.
--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
From | Date | Subject | |
---|---|---|---|
Next Message | Heikki Linnakangas | 2011-06-15 08:03:59 | Re: WIP: Fast GiST index build |
Previous Message | Dean Rasheed | 2011-06-15 07:33:57 | Re: creating CHECK constraints as NOT VALID |