Re: Observability in Postgres

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Dave Page <dpage(at)pgadmin(dot)org>
Cc: Greg Stark <stark(at)mit(dot)edu>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>, David Fetter <david(at)fetter(dot)org>, stark(at)aiven(dot)io
Subject: Re: Observability in Postgres
Date: 2022-02-15 21:43:10
Message-ID: CABUevExMJuUE-GPOa0V8rMS-7gpwP36eESWqas1RV2zitq-4JA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Feb 15, 2022 at 1:30 PM Dave Page <dpage(at)pgadmin(dot)org> wrote:
>
> Hi Greg,
>
> On Mon, 14 Feb 2022 at 20:16, Greg Stark <stark(at)mit(dot)edu> wrote:
>>
>> So I've been dealing a lot with building and maintaining dashboards
>> for (fleets of) Postgres servers. And it's a pain. I have a few
>> strongly held ideas about where the pain points are and what the right
>> ways to tackle them are. Some of which are going to be controversial I
>> think...
>>
>> The state of the art is pg_exporter, which is a separate client
>> that connects to the database and runs SQL queries to extract
>> database metrics. The pain points I've seen are:
>>
>> 1) The exporter isn't reliable when things are failing. If your
>> clients can't connect, the exporter also can't connect, leading to
>> data gaps in your metrics for precisely the time windows where you
>> need data. This can happen due to connection exhaustion, xid
>> wraparound, or even something as simple as someone taking an
>> exclusive lock on something used in the SQL queries.
>>
>> 2) SQL connections are tied to specific databases within a cluster,
>> making it hard to get data for all your databases if you have more
>> than one. The exporter needs to reconnect to each database.
>>
>> 3) The exporter needs to listen on a different port from the
>> postmaster, making it necessary to write software to manage the
>> mapping from server port to exporter port, and that's left to the
>> end-user as it varies from site to site.
>>
>> 4) The queries are customizable (the built-in ones don't
>> exhaustively export Postgres's metrics). As a result there's no
>> standard dashboard that will work on any site out of the box.
>> Moreover, issue (3) also makes it impossible to implement one that
>> works properly.
>>
>> 5) Data needs to be marshaled from shared memory into SQL and then
>> read by the client and re-emitted in the metric format. The double
>> processing requires writing SQL queries very carefully to avoid
>> losing fidelity for things like LSN positions, xids, etc. Moreover,
>> the latency of gathering data from multiple SQL queries results in
>> metrics that are often out of sync with each other, making them
>> hard to interpret.
>>
>> All this said, I think we should have a component in Postgres that
>> reads from the stats data directly and outputs metrics in standard
>> metrics format directly. This would probably take the form of a
>> background worker with a few tricky bits.
>>
>> This would mean there would be a standard official set of metrics
>> available that a standard dashboard could rely on to be present at
>> any site, and it would stay reliable even when the SQL layer isn't
>> functioning due to lack of connections, xid wraparound, or locking
>> issues.
>>
>> The elephant in the room is that issue (3) requires a bit of sleight
>> of hand. Ideally I would want it to be listening on the same ports as
>> the database. That means having the postmaster recognize metric
>> requests and hand them to the metrics background worker instead of a
>> backend. I'm not sure people are going to be ok with that....
>>
>> For now my approach is to implement a background worker that listens
>> on a new port and is basically its own small web server with shared
>> memory access. This ignores issue (3) and my hope is that when we have
>> some experience with this approach we'll see how reliable it is and
>> how comfortable we are with the kind of hacking in postmaster it would
>> take to fix it. Fwiw I do think this is an important issue and not one
>> that we can ignore indefinitely.
>>
>> There is another elephant in the room (it's a big room) which is that
>> this all makes sense for stats data. It doesn't make much sense for
>> data that currently lives in pg_class, pg_index, etc. In other words
>> I'm mostly solving (2) by ignoring it and concentrating on stats data.
>>
>> I haven't settled on a good solution for that data. I vaguely lean
>> towards saying that the volatile metrics in those tables should really
>> live in stats or at least be mirrored there. That makes a clean
>> definition of what Postgres thinks a metric is and what it thinks
>> catalog data is. But I'm not sure that will really work in practice.
>> In particular I think it's likely we'll need to get catalog data from
>> every database anyways, for example to label things like tables with
>> better labels than oids.
>>
>> This work is being funded by Aiven which is really interested in
>> improving observability and integration between Postgres and other
>> open source cloud software.
>
>
> I agree with pretty much everything above, bar a couple of points:
>
> - Does it really matter if metrics are exposed on a separate port from the postmaster? I actually think doing that is a good thing as it allows use of alternative listen addresses and firewalling rules; you could then confine the monitoring traffic to a management VLAN for example.

+1. I think it would be much better to keep it on a separate port.

It doesn't even have to go to the point of VLANs or whatever. You
just want your firewall rules to be able to tell what kind of traffic
they're dealing with.
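
For example, assuming the worker listened on its own port -- say
9187, the port postgres_exporter conventionally uses -- confining
scrapes to a management subnet is a one-liner at the firewall
(addresses here are made up for illustration):

    # allow metrics scrapes only from the management subnet
    iptables -A INPUT -p tcp --dport 9187 -s 10.0.42.0/24 -j ACCEPT
    iptables -A INPUT -p tcp --dport 9187 -j DROP

There's no equivalent if the metrics share port 5432 with regular
client connections.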

Another part missing from the proposal is how to deal with
authentication. That'll be an even harder problem if the endpoint
sits on the same port but speaks a different protocol. How would it
work with pg_hba etc.?
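
To make that concrete: every pg_hba.conf line today is keyed on a
connection type (local/host/hostssl/...), a database and a role, none
of which map naturally onto an HTTP metrics scrape. Purely as a
hypothetical sketch, not existing syntax, it might end up looking
something like:

    # TYPE     DATABASE  USER     ADDRESS        METHOD
    metrics    all       monitor  10.0.42.0/24   scram-sha-256

which gives an idea of how much new machinery "make it work with
pg_hba" actually implies.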

> - I strongly dislike the idea of building this around the prometheus exporter format. Whilst that is certainly a useful format if you're using prom (as many do), it does have limitations and quirks that would make it painful for other systems to use; for example, the need to encode non-numeric data into labels rather than the metrics themselves (e.g. server version strings or LSNs). I would much prefer to see a common format such as JSON used by default, and perhaps offer a hook to allow alternate formatters to replace that. The prometheus format is also pretty inefficient, as you have to repeat all the key data (labels) for each individual metric.

There's good and bad with it. The big "good" is that it's an open
standard (OpenMetrics). I think supporting that would be a very good
idea. But it would also be good to have a different, "richer", format
available. Whether it'd be worth it to go the full "postgresql way"
and make it pluggable is questionable, but I would suggest at least
having both OpenMetrics and a native/richer one, and not just the
latter. Being able to just point your existing monitoring system at a
postgres instance (with auth configured) and have things just show up
is in itself a big win. (Then add pluggability or hooks beyond that,
but have both of those natively.)
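
To show the difference on a concrete metric: in the
OpenMetrics/Prometheus text format a replication LSN has to be either
converted to a plain byte offset or smuggled into a label, roughly
like this (values made up):

    # HELP pg_stat_replication_flush_lsn_bytes Last WAL position flushed by the standby, as a byte offset
    # TYPE pg_stat_replication_flush_lsn_bytes gauge
    pg_stat_replication_flush_lsn_bytes{application_name="standby1"} 50331976

whereas a richer/native format (JSON just as an example) could carry
the value verbatim:

    {"pg_stat_replication": [{"application_name": "standby1",
                              "flush_lsn": "0/3000148"}]}

Both have their uses, which is why I'd want the OpenMetrics one
available out of the box and not only the richer one.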

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
