From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Frits Hoogland <frits(dot)hoogland(at)gmail(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Any risk or overhead considerations for frequently executing queries against catalog tables?
Date: 2025-01-26 06:43:24
Message-ID: CAFj8pRCqk45-DM_KGfHR8RowjW5WByjAGbixupjnnJa9n1cT_A@mail.gmail.com
Lists: pgsql-performance
On Sat, 25 Jan 2025 at 21:01, Frits Hoogland <frits(dot)hoogland(at)gmail(dot)com>
wrote:
> I am looking at whether sampling key database catalog information per
> second would have any drawback whatsoever.
> I think you're saying that you think that isn't the case, except maybe for
> pg_database, and I figure that is because of the frozen and multixact
> fields per database.
>
> If the database client application is too unpredictable to know what SQL
> it will produce, then having runtime data available at that granularity,
> so that it can reasonably be reconstructed what is going on, is very
> convenient and allows tremendous insight. It would also allow using the
> wait events to spot any weird behavior, such as short-lived peaks
> (pg_stat_statements can do that on a busy database, for example).
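>
> A minimal sketch of what such a wait-event sample could look like
> (illustrative only, not part of any existing tooling):
>
>     SELECT wait_event_type, wait_event, count(*)
>     FROM pg_stat_activity
>     WHERE state = 'active'
>     GROUP BY wait_event_type, wait_event
>     ORDER BY count(*) DESC;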
> And if there is no known drawback, and such a low interval can be
> organized: why not? I am not saying you are doing it wrong; this is about
> trying to figure out the borders of what would be technically possible
> without unreasonably affecting the database, as a thought experiment.
>
> Of course the gathered data needs to be organized so that you don't drown
> in it, and it shouldn't lead to the monitoring data swamping the system,
> either in memory or on disk, but that is a given.
>
> Why would per second be too much for locks? Is there overhead to selecting
> from pg_locks, or pg_blocking_pids()?
>
When you have queries that take about 10 ms, a 50 ms lock wait is
significant overhead. Surely queries against pg_locks are queries like any
other - there is the usual overhead of planner, executor, and locking.
Fortunately, it is almost entirely CPU work.
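
A minimal sketch (assuming PostgreSQL 9.6 or later, where pg_blocking_pids()
exists) of the kind of per-second lock probe being discussed here:

    -- blocked sessions and the pids currently blocking them
    SELECT a.pid,
           a.wait_event_type,
           a.wait_event,
           pg_blocking_pids(a.pid) AS blocked_by,
           a.query
    FROM pg_stat_activity a
    WHERE cardinality(pg_blocking_pids(a.pid)) > 0;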
>
> Again, please realise I am happy with and appreciative of the time you
> take; I am toying with the idea described above.
>
Frits Hoogland
>
>
>
>
> On 25 Jan 2025, at 19:18, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>
> Hi
>
> On Sat, 25 Jan 2025 at 18:00, Frits Hoogland <frits(dot)hoogland(at)gmail(dot)com>
> wrote:
>
>> Thank you Pavel, that is really useful. I can imagine other people
>> thinking about getting fine-grained data from Postgres might wonder the
>> same as I do about this.
>> And really, from a computer's perspective, I would say that once a second
>> isn't really a high frequency?
>>
>
> I usually work with minute sampling, and usually it is good enough
> (statistics are cumulative, so you can lose the timestamp, but you never
> lose data).
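>
> A minimal sketch of that delta idea (the sample table name is made up for
> illustration): snapshot a cumulative counter periodically and recover the
> per-interval values afterwards.
>
>     -- once: create the history table from a first snapshot
>     CREATE TABLE stat_db_samples AS
>       SELECT now() AS sample_time, datname, xact_commit
>       FROM pg_stat_database;
>
>     -- repeated each sampling interval
>     INSERT INTO stat_db_samples
>       SELECT now(), datname, xact_commit FROM pg_stat_database;
>
>     -- commits per interval, recovered from the cumulative counter
>     SELECT datname, sample_time,
>            xact_commit - lag(xact_commit)
>              OVER (PARTITION BY datname ORDER BY sample_time) AS commits
>     FROM stat_db_samples;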
>
> Only when I try to investigate some special case do I use per-second
> sampling. When you investigate lock issues, even seconds are too coarse.
>
> Regards
>
> Pavel
>
>
>> If I time these queries, they take around 20ms in total (over a local
>> connection), so there is a relatively long time during which none of the
>> objects, including pg_database, are actively queried.
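>>
>> For reference, a minimal way to reproduce such a measurement from psql:
>>
>>     \timing on
>>     SELECT * FROM pg_database;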
>>
>> I git-grepped the source code; it seems that there is a RowExclusiveLock
>> for pg_database manipulation in case of addition, removal, and change of a
>> database in dbcommands.c, but I do think your reasoning is based on the
>> columns datfrozenxid and datminmxid?
>>
>> There is a lock for updating the frozen xid and mxid for a database
>> (vacuum.c: LockDatabaseFrozenIds, ExclusiveLock), but it seems a select
>> should play nice with that?
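>>
>> A minimal sketch of how such lock traffic on pg_database could be watched
>> (illustrative; the frozen-ids lock taken by vacuum appears under its own
>> locktype in recent versions rather than as a relation lock):
>>
>>     SELECT l.mode, l.granted, a.pid, left(a.query, 60) AS query
>>     FROM pg_locks l
>>     JOIN pg_stat_activity a ON a.pid = l.pid
>>     WHERE l.relation = 'pg_database'::regclass;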
>>
>> By the way, it's interesting to see that both datfrozenxid and datminmxid
>> are updated in place, with no read consistency provided.
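>>
>> Those columns are easy to inspect directly, for what it's worth:
>>
>>     SELECT datname, datfrozenxid, age(datfrozenxid) AS xid_age, datminmxid
>>     FROM pg_database;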
>>
>> Frits Hoogland
>>
>>
>>
>>
>> On 25 Jan 2025, at 14:32, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>>
>> Hi
>>
>> On Sat, 25 Jan 2025 at 12:23, Frits Hoogland <
>> frits(dot)hoogland(at)gmail(dot)com> wrote:
>>
>>> For monitoring database behaviour and trying to build a history of
>>> activity: if I were to create an application that creates a single connection
>>> and executes something like:
>>> select * from pg_stat_activity;
>>> select * from pg_stat_database;
>>> select * from pg_stat_bgwriter;
>>> select * from pg_stat_wal;
>>> select * from pg_settings;
>>> select * from pg_database;
>>> where each query is prepared and executed every 1 second: would there
>>> be any realistic danger or overhead that should be considered?
>>> My thinking is that the data for these catalogs is all in shared memory,
>>> so that, when executed serially, these queries do not cause any
>>> significant resources to be taken?
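>>>
>>> A minimal sketch of that "prepared, run every second" loop from psql (the
>>> statement name is arbitrary; \watch re-runs the most recent query):
>>>
>>>     PREPARE activity_sample AS SELECT * FROM pg_stat_activity;
>>>     EXECUTE activity_sample;
>>>     \watch 1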
>>>
>>
>> Querying all of these tables except pg_database every 1 sec will probably
>> have zero impact on performance.
>>
>> I am not sure about pg_database - it is a very important table, and your
>> query can block operations that need an exclusive lock on this table. So,
>> theoretically, there can be some impact on performance.
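>>
>> A minimal sketch of one possible mitigation: cap lock waits in the
>> monitoring session with lock_timeout, so the sampler errors out rather
>> than queueing behind an exclusive-lock waiter:
>>
>>     SET lock_timeout = '50ms';
>>     SELECT * FROM pg_database;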
>>
>> Regards
>>
>> Pavel
>>
>>
>>>
>>> Thanks,
>>>
>>> Frits Hoogland
>>>
>>>
>>>
>>>
>>>
>>
>