From: Andres Freund <andres(at)anarazel(dot)de>
To: Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: sinvaladt.c: remove msgnumLock, use atomic operations on maxMsgNum
Date: 2025-03-24 13:08:05
Message-ID: 4yfgfeu3f6f7fayl5h3kggtd5bkvm4gj3a3ryjs2qhlo6g74bt@g3cu2u4pgaiw
Lists: pgsql-hackers
Hi,
On 2025-03-24 13:41:17 +0300, Yura Sokolov wrote:
> 21.03.2025 19:33, Andres Freund wrote:
> > I'd also like to know a bit more about the motivation here - I can easily
> > believe that you hit contention around the shared inval queue, but I find it
> > somewhat hard to believe that a spinlock that's acquired *once* per batch
> > ("quantum"), covering a single read/write, is going to be the bottleneck,
> > rather than the much longer held LWLock, that protects iterating over all
> > procs.
> >
> > Have you verified that this actually addresses the performance issue?
>
> Problems with this spinlock were observed by at least two independent
> technical support teams. First, a friendly vendor company shared the idea of
> removing it. We don't know their exact situation, but I suppose it was quite
> similar to the situation our tech support investigated at a client's some
> months later:
>
> (Quote from the tech support report:)
> > Almost 20% of CPU time is spent in backtraces like:
> > 4b0d2d s_lock (/opt/pgpro/ent-15/bin/postgres)
> > 49c847 SIGetDataEntries
> > 49bf94 ReceiveSharedInvalidMessages
> > 4a14ba LockRelationOid
> > 1671f4 relation_open
> > 1de1cd table_open
> > 5e82aa RelationGetStatExtList
> > 402a01 get_relation_statistics (inlined)
> > 402a01 get_relation_info
> > 407a9e build_simple_rel
> > 3daa1d add_base_rels_to_query
> > 3daa1d add_base_rels_to_query
> > 3dd92b query_planner
>
>
> The client has many NUMA nodes in a single machine, and the software
> actively generates invalidation messages (probably due to active use of
> temporary tables).
>
> Since the backtrace is quite clear and ends at s_lock, the patch should help.
I don't believe we have the whole story here. It just doesn't seem plausible
that, with the current code, the spinlock in SIGetDataEntries() would be the
bottleneck, rather than contention on the LWLock. The spinlock just covers a
*single read*. Unless pgpro has modified the relevant code?
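
For reference, the pattern in question looks roughly like this (abbreviated
from sinvaladt.c, with the surrounding queue-reading logic elided):

    /* current code: the spinlock guards exactly one read */
    SpinLockAcquire(&segP->msgnumLock);
    max = segP->maxMsgNum;
    SpinLockRelease(&segP->msgnumLock);

    /* the patch's idea, as I understand it: make maxMsgNum a
     * pg_atomic_uint32 and replace the locked read with an atomic
     * load (memory-ordering details glossed over here) */
    max = (int) pg_atomic_read_u32(&segP->maxMsgNum);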
One possible explanation for why the spinlock shows up so badly is false
sharing: note that SISeg->msgnumLock and the start of SISeg->buffer[] are on
the same cache line.
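
To make that concrete, here's an abbreviated sketch of the layout, plus a
hypothetical padding tweak that would rule false sharing out (this is not
something the patch does; PG_CACHE_LINE_SIZE is from pg_config_manual.h):

    /* sketch only, most of SISeg elided */
    typedef struct SISeg
    {
        int         minMsgNum;      /* oldest message still needed */
        int         maxMsgNum;      /* next message number to be assigned */
        int         nextThreshold;  /* # of messages to call SICleanupQueue */

        slock_t     msgnumLock;     /* spinlock protecting maxMsgNum */

        /* hypothetical: keep buffer[] off the cache line that backends
         * acquiring msgnumLock keep bouncing between NUMA nodes */
        char        pad[PG_CACHE_LINE_SIZE];

        SharedInvalidationMessage buffer[MAXNUMMESSAGES];
        /* ... per-backend state elided ... */
    } SISeg;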
How was the "Almost 20% of CPU time is spent in backtraces like" figure
determined?
Greetings,
Andres Freund