From: | Michail Nikolaev <michail(dot)nikolaev(at)gmail(dot)com> |
---|---|
To: | Kirill Reshke <reshkekirill(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Slow standby snapshot |
Date: | 2021-06-13 17:12:13 |
Message-ID: | CANtu0oh_ytfAgRYOSfQP49eFZv7qRFH+zdDB9=Bz0e7DQj5VUA@mail.gmail.com |
Lists: | pgsql-hackers |
Hello.
> I recently ran into a problem in one of our production postgresql cluster.
> I had noticed lock contention on procarray lock on standby, which causes WAL
> replay lag growth.
Yes, I saw the same issue on my production cluster.
> 1) set max_connections to big number, like 100000
I ran the tests with a more realistic value: 5000. That is a valid
value for Amazon RDS, for example (the default there is
LEAST({DBInstanceClassMemory/9531392}, 5000)).
The test looks like this:
pgbench -i -s 10 -U postgres -d postgres
pgbench -b select-only -p 6543 -j 1 -c 50 -n -P 1 -T 18000 -U postgres postgres
pgbench -b simple-update -j 1 -c 50 -n -P 1 -T 18000 -U postgres postgres
long transaction on primary - begin;select txid_current();
perf top -p <pid of some standby>
On postgres 14 (master), the non-patched version looks like this:
5.13% postgres [.] KnownAssignedXidsGetAndSetXmin
4.61% postgres [.] pg_checksum_block
2.54% postgres [.] AllocSetAlloc
2.44% postgres [.] base_yyparse
It is too much to spend 5-6% of CPU running through an array :) I think
it should be fixed for both the 13 and 14 versions.
The patched version looks like this (I was unable to find
KnownAssignedXidsGetAndSetXmin in the profile at all):
3.08% postgres [.] pg_checksum_block
2.89% postgres [.] AllocSetAlloc
2.66% postgres [.] base_yyparse
2.00% postgres [.] MemoryContextAllocZeroAligned
On postgres 13 the non-patched version looks even worse (it definitely
needs to be fixed, in my opinion):
26.44% postgres [.] KnownAssignedXidsGetAndSetXmin
2.17% postgres [.] base_yyparse
2.01% postgres [.] AllocSetAlloc
1.55% postgres [.] MemoryContextAllocZeroAligned
But your patch does not apply to REL_13_STABLE. Could you please
provide two versions?
Also, there are warnings when building with the patch:
procarray.c:4595:9: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
4595 | int prv = -1;
| ^~~
procarray.c: In function ‘KnownAssignedXidsGetOldestXmin’:
procarray.c:5056:5: warning: variable ‘tail’ set but not used [-Wunused-but-set-variable]
5056 | tail;
| ^~~~
procarray.c:5067:38: warning: ‘i’ is used uninitialized in this function [-Wuninitialized]
5067 | i = KnownAssignedXidsValidDLL[i].nxt;
Some of them are clear errors, so please recheck the code.
Also, maybe it is better to make the patch less invasive by using a
simpler approach. For example, use the first bit to mark an xid as
valid and the last 7 bits (128 values) as an optimistic offset to the
next valid xid (a jump of 127 steps in the worst scenario).
What do you think?
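A minimal sketch of one way to read this idea, assuming one byte per
array slot with the top bit as the valid flag and the low 7 bits as a
jump hint. All names here (kax_add, kax_remove, kax_scan and the KAX_*
macros) are hypothetical and are not taken from the patch or from
procarray.c. The hint is optimistic: it only promises that the slots
being skipped are invalid, so the slot it lands on must still be
checked. The sketch also writes hints during the scan, which real code
could only do under a suitable lock.

```c
#include <stdbool.h>
#include <stdint.h>

#define KAX_VALID_BIT   0x80u   /* assumption: top bit marks a valid xid */
#define KAX_OFFSET_MASK 0x7Fu   /* low 7 bits: jump hint to next candidate */
#define KAX_MAX_JUMP    127

static inline bool
kax_is_valid(uint8_t slot)
{
    return (slot & KAX_VALID_BIT) != 0;
}

/* Hint k means slots i+1 .. i+k-1 are known invalid; 0 is treated as 1. */
static inline int
kax_jump(uint8_t slot)
{
    int off = (int) (slot & KAX_OFFSET_MASK);

    return off > 0 ? off : 1;
}

/* Append a valid entry at *head with the trivial hint "next slot". */
static void
kax_add(uint8_t *slots, int *head)
{
    slots[*head] = (uint8_t) (KAX_VALID_BIT | 1u);
    (*head)++;
}

/* Invalidate slot i; its hint is kept, since skipped slots stay invalid. */
static void
kax_remove(uint8_t *slots, int i)
{
    slots[i] &= (uint8_t) KAX_OFFSET_MASK;
}

/*
 * Collect indexes of valid slots in [tail, head) into out[], which must
 * have room for head - tail entries. Returns the number collected.
 * While scanning, fold the hints of invalid landing slots into the
 * current slot's hint, so later scans skip whole invalid runs at once.
 */
static int
kax_scan(uint8_t *slots, int tail, int head, int *out)
{
    int n = 0;
    int i = tail;

    while (i < head)
    {
        int next = i + kax_jump(slots[i]);

        while (next < head && !kax_is_valid(slots[next]))
        {
            int step = kax_jump(slots[next]);

            if (next - i + step > KAX_MAX_JUMP)
                break;          /* hint would not fit in 7 bits */
            next += step;
        }

        /* Store the (possibly longer) hint, preserving the valid bit. */
        slots[i] = (uint8_t) ((slots[i] & KAX_VALID_BIT) |
                              (unsigned) (next - i));

        if (kax_is_valid(slots[i]))
            out[n++] = i;
        i = next;
    }
    return n;
}
```

With this layout a long run of invalid slots costs roughly one step
per 127 slots once the hints have been folded, and the existing
one-byte-per-slot array does not need an extra linked structure.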
Also, it is a good idea to register the patch in the commitfest app
(https://commitfest.postgresql.org/)
Thanks,
Michail.