Re: shared-memory based stats collector - v70

From: Andres Freund <andres(at)anarazel(dot)de>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: shared-memory based stats collector - v70
Date: 2022-08-10 22:23:27
Message-ID: 20220810222327.5tjgq7hmiipeeuo2@awork3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2022-08-10 14:18:25 -0400, Greg Stark wrote:
> > I don't think that's a large enough issue to worry about unless you're
> > polling at a very high rate, which'd be a bad idea in itself. If a backend
> > can't get the lock for some stats change it'll defer flushing the stats a bit,
> > so it'll not cause a lot of other problems.
>
> Hm. I wonder if we're on the same page about what constitutes a "high rate".
>
> I've seen people try to push Prometheus or other similar systems to 5s
> poll intervals. That would be challenging for Postgres due to the
> volume of statistics. The default is 30s, and people often struggle to
> make even that work for large fleets. But if you had a small fleet,
> perhaps an IoT-style system with a "one large table" type of schema,
> you might well want stats every 5s or even every 1s.

That's probably fine, although I think you might run into trouble not from the
stats subsystem side but from the "amount of data" side. On a system with a
lot of objects that can be a fair amount. If you really want to do very
low-latency stats reporting, I suspect you'd have to build an incremental
system.
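
To make the deferred-flush behaviour quoted above concrete, here's a minimal
standalone sketch of the idea (plain pthreads, not the actual pgstat code; all
names are made up for illustration): the backend bumps process-local counters
cheaply and only folds them into shared state when it can take the lock
without waiting, otherwise it simply tries again on the next flush.

    /* sketch of "accumulate locally, flush to shared state on try-lock" */
    #include <pthread.h>
    #include <stdbool.h>
    #include <stdio.h>

    typedef struct SharedTableStats
    {
        pthread_mutex_t lock;
        long        tuples_inserted;
    } SharedTableStats;

    typedef struct PendingTableStats
    {
        long        tuples_inserted;    /* not yet visible to readers */
    } PendingTableStats;

    static SharedTableStats shared = {PTHREAD_MUTEX_INITIALIZER, 0};
    static PendingTableStats pending;

    /* called on every stats-generating operation: cheap, lock-free */
    static void
    count_insert(void)
    {
        pending.tuples_inserted++;
    }

    /*
     * Called periodically (e.g. around transaction end).  If the lock is
     * busy we keep the pending counters and retry later, so a reader
     * holding the lock never blocks the writing backend.
     */
    static bool
    flush_pending(void)
    {
        if (pthread_mutex_trylock(&shared.lock) != 0)
            return false;        /* contended: defer, nothing is lost */
        shared.tuples_inserted += pending.tuples_inserted;
        pending.tuples_inserted = 0;
        pthread_mutex_unlock(&shared.lock);
        return true;
    }

    int
    main(void)
    {
        count_insert();
        count_insert();
        if (!flush_pending())
            printf("lock busy, will retry on next flush\n");
        printf("shared count: %ld\n", shared.tuples_inserted);
        return 0;
    }

A reader polling at 1s or 5s intervals only ever delays an individual flush,
not the counting itself, which is roughly why the polling rate per se isn't
the concern here.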

> > I'm *dead* set against including catalog names in shared memory stats. That'll
> > add a good amount of memory usage and complexity, without any sort of
> > commensurate gain.
>
> Well, it's pushing the complexity there from elsewhere. If the labels
> aren't in the stats structures then the exporter needs to connect to
> each database, gather all the names into some local cache, and then it
> needs to worry about keeping it up to date. And if there are any
> database problems such as disk errors or catalog objects being locked,
> then your monitoring breaks, though perhaps the damage can be limited
> to missing some object names or having out-of-date names.

Shrug. If the stats system state desynchronizes from an alter table rename
you'll also have a problem in monitoring.

And even if you can benefit from having all that information, it'd still be
overhead borne by everybody for the benefit of a very small share of users.
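
As a rough illustration of the memory side of that (this is not the real
PgStat_StatTabEntry layout, just a made-up pair of structs), carrying a
NAMEDATALEN-sized relation name in every per-table entry costs 64 bytes per
object, paid by every installation whether or not its monitoring wants names:

    #include <stdio.h>

    #define NAMEDATALEN 64      /* PostgreSQL's default identifier length */

    typedef struct CountersOnlyEntry
    {
        long        numscans;
        long        tuples_inserted;
        long        tuples_updated;
        long        tuples_deleted;
    } CountersOnlyEntry;

    typedef struct EntryWithName
    {
        char        relname[NAMEDATALEN];   /* the proposed addition */
        long        numscans;
        long        tuples_inserted;
        long        tuples_updated;
        long        tuples_deleted;
    } EntryWithName;

    int
    main(void)
    {
        printf("counters only: %zu bytes, with name: %zu bytes\n",
               sizeof(CountersOnlyEntry), sizeof(EntryWithName));
        /* with 100k tables that's roughly 6.4 MB of extra shared memory */
        return 0;
    }

And that's before getting into keeping the names consistent with the
catalogs, which is where the complexity comes in.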

> > > I also think it would be nice to have a change counter for every stat
> > > object, or perhaps a change time. Prometheus wouldn't be able to make
> > > use of it, but other monitoring software might be able to receive only
> > > metrics that have changed since the last update, which would really
> > > help on databases with large numbers of mostly static objects.
> >
> > I think you're proposing adding overhead that doesn't even have a real user.
>
> I guess I'm just brainstorming here. I don't need it currently, no. It
> doesn't seem like significant overhead compared to the locking and
> copying, though?

Yes, timestamps aren't cheap to determine (nor free to store, but that's a
lesser issue).
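
To put a number on "not cheap": reading the clock on every counter bump is
much more expensive than the bump itself. A throwaway comparison (nothing to
do with pgstat internals, just clock_gettime vs. a plain increment):

    #include <stdio.h>
    #include <time.h>

    #define ITERATIONS 10000000L

    static double
    seconds_between(struct timespec a, struct timespec b)
    {
        return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
    }

    int
    main(void)
    {
        struct timespec start, end, ts;
        volatile long counter = 0;

        clock_gettime(CLOCK_MONOTONIC, &start);
        for (long i = 0; i < ITERATIONS; i++)
            counter++;
        clock_gettime(CLOCK_MONOTONIC, &end);
        printf("plain increments:    %.3f s\n", seconds_between(start, end));

        clock_gettime(CLOCK_MONOTONIC, &start);
        for (long i = 0; i < ITERATIONS; i++)
            clock_gettime(CLOCK_REALTIME, &ts);
        clock_gettime(CLOCK_MONOTONIC, &end);
        printf("clock_gettime calls: %.3f s\n", seconds_between(start, end));

        return 0;
    }

And the timestamp would also have to be stored per entry, which is the "nor
free to store" part.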

Greetings,

Andres Freund
