From: Michail Nikolaev <michail(dot)nikolaev(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY
Date: 2024-06-11 11:00:00
Message-ID: CANtu0ojXmqjmEzp-=aJSxjsdE76iAsRgHBoK0QtYHimb_mEfsg@mail.gmail.com
Lists: pgsql-hackers
Hello, hackers.
While testing my work on [1], I was struggling with a strange issue involving ON CONFLICT UPDATE and REINDEX CONCURRENTLY.
After some time, I realized that the same issue persists on the master branch as well :)
I have prepared two TAP tests to reproduce the issues [2], also attached.
The first one does the following:
CREATE UNLOGGED TABLE tbl(i int primary key, updated_at timestamp);
CREATE INDEX idx ON tbl(i, updated_at); -- not required to reproduce the issue, but makes it happen faster
Then it concurrently runs the following scripts with pgbench:
1) INSERT INTO tbl VALUES(13,now()) on conflict(i) do update set updated_at = now();
2) INSERT INTO tbl VALUES(42,now()) on conflict(i) do update set updated_at = now();
3) INSERT INTO tbl VALUES(69,now()) on conflict(i) do update set updated_at = now();
Also, while pgbench is running, the following command is run in a loop:
REINDEX INDEX CONCURRENTLY tbl_pkey;
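For reference, a rough manual equivalent of the test loop might look like the sketch below (the script file names, client count, duration and connection details are my own assumptions, not values taken from the TAP test):

# Hypothetical manual reproduction: upsert_13.sql, upsert_42.sql and
# upsert_69.sql each contain one of the three INSERT ... ON CONFLICT
# statements above.
pgbench -n -c 20 -j 4 -T 120 \
  -f upsert_13.sql -f upsert_42.sql -f upsert_69.sql postgres &

# While pgbench runs, reindex the primary key over and over again.
while kill -0 $! 2>/dev/null; do
  psql -X -c 'REINDEX INDEX CONCURRENTLY tbl_pkey;' postgres
done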
For some time, everything looks more or less fine (except for livelocks, but that is the subject of the second test).
But after some time, about a minute or so (around the ~3000th REINDEX), it just fails like this:
make -C src/test/modules/test_misc/ check PROVE_TESTS='t/006_*'
# waiting for an about 3000, now is 2174, seconds passed : 84
# waiting for an about 3000, now is 2175, seconds passed : 84
# waiting for an about 3000, now is 2176, seconds passed : 84
# waiting for an about 3000, now is 2177, seconds passed : 84
# waiting for an about 3000, now is 2178, seconds passed : 84
# waiting for an about 3000, now is 2179, seconds passed : 84
# waiting for an about 3000, now is 2180, seconds passed : 84
# waiting for an about 3000, now is 2181, seconds passed : 84
# waiting for an about 3000, now is 2182, seconds passed : 84
# waiting for an about 3000, now is 2183, seconds passed : 84
# waiting for an about 3000, now is 2184, seconds passed : 84
# Failed test 'concurrent INSERTs, UPDATES and RC status (got 2 vs expected 0)'
# at t/006_concurrently_unique_fail.pl line 69.
# Failed test 'concurrent INSERTs, UPDATES and RC stderr /(?^:^$)/'
# at t/006_concurrently_unique_fail.pl line 69.
# 'pgbench: error: pgbench: error: client 4 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(13) already exists.
# client 15 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 9 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 11 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 8 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 3 script 2 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(69) already exists.
# pgbench: error: client 2 script 2 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(69) already exists.
# pgbench: error: client 12 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccold"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 10 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccold"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 18 script 2 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(69) already exists.
# pgbench: error: pgbench:client 14 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey"
# DETAIL: Key (i)=(13) already exists.
# error: client 1 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 0 script 2 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey"
# DETAIL: Key (i)=(69) already exists.
# pgbench: error: client 13 script 1 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(42) already exists.
# pgbench: error: client 16 script 1 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(42) already exists.
# pgbench: error: client 5 script 1 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(42) already exists.
# pgbench: error: Run was aborted; the above results are incomplete.
# '
Probably something is wrong with arbiter index selection across different backends. I am afraid it could be a symptom of a more serious issue.
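As a side note, the transient _ccnew/_ccold indexes that show up in the errors can be watched while REINDEX CONCURRENTLY is running with a query along these lines (my own diagnostic sketch, not part of the test):

-- Lists every index on tbl with its validity flags; during
-- REINDEX CONCURRENTLY, tbl_pkey_ccnew (and, around the swap,
-- tbl_pkey_ccold) appear here with changing indisvalid/indisready.
SELECT c.relname, i.indisvalid, i.indisready, i.indislive
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
WHERE i.indrelid = 'tbl'::regclass
ORDER BY c.relname;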
-------------------------------------
The second test shows an interesting livelock state in a similar situation.
CREATE UNLOGGED TABLE tbl(i int primary key, n int);
CREATE INDEX idx ON tbl(i, n);
INSERT INTO tbl VALUES(13,1);
pgbench concurrently runs a single command:
INSERT INTO tbl VALUES(13,1) on conflict(i) do update set n = tbl.n + EXCLUDED.n;
And reindexing also runs in a loop:
REINDEX INDEX CONCURRENTLY tbl_pkey;
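As with the first test, a rough manual equivalent might be the sketch below (the script file name and duration are my assumptions; the 30 clients matches the connection count mentioned further down):

# Hypothetical manual reproduction of the livelock scenario;
# upsert_counter.sql contains the single INSERT ... ON CONFLICT above,
# and the REINDEX loop from the first test runs in parallel.
pgbench -n -c 30 -j 4 -T 120 -f upsert_counter.sql postgres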
Shortly after the start, a somewhat strange issue appears:
make -C src/test/modules/test_misc/ check PROVE_TESTS='t/007_*'
# going to start reindex, num tuples in table is 1
# reindex 0 done in 0.00704598426818848 seconds, num inserted during reindex tuples is 0 speed is 0 per second
# going to start reindex, num tuples in table is 7
# reindex 1 done in 0.453176021575928 seconds, num inserted during reindex tuples is 632 speed is 1394.60158947115 per second
# going to start reindex, num tuples in table is 647
# current n is 808, 808 per one second
# current n is 808, 0 per one second
# current n is 808, 0 per one second
# current n is 808, 0 per one second
# current n is 808, 0 per one second
# current n is 811, 3 per one second
# current n is 917, 106 per one second
# current n is 1024, 107 per one second
# reindex 2 done in 8.4104950428009 seconds, num inserted during reindex tuples is 467 speed is 55.5258635340064 per second
# going to start reindex, num tuples in table is 1136
# current n is 1257, 233 per one second
# current n is 1257, 0 per one second
# current n is 1257, 0 per one second
# current n is 1257, 0 per one second
# current n is 1257, 0 per one second
# current n is 1490, 233 per one second
# reindex 3 done in 5.21368479728699 seconds, num inserted during reindex tuples is 411 speed is 78.8310026363446 per second
# going to start reindex, num tuples in table is 1566
At some moments, all CPUs are hot, but 30 connections are unable to complete any upsert. I think it may somehow be caused by the presence of two arbiter indexes (the old one and the new reindexed one).
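For anyone digging into this, a snapshot like the one below (again my own diagnostic sketch, not part of the test) is useful for seeing what the stalled backends are doing while the counter stops moving:

-- Show what each upsert backend is waiting on during the stall.
SELECT pid, state, wait_event_type, wait_event, left(query, 60) AS query
FROM pg_stat_activity
WHERE query ILIKE 'INSERT INTO tbl%'
ORDER BY pid;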
Best regards,
Mikhail.
[1]:
https://www.postgresql.org/message-id/flat/CANtu0ogBOtd9ravu1CUbuZWgq6qvn1rny38PGKDPk9zzQPH8_A%40mail.gmail.com#d4be02ff70f3002522f9fadbd165d631
[2]:
https://github.com/michail-nikolaev/postgres/commit/9446f944b415306d9e5d5ab98f69938d8f5ee87f
Attachment: v1-0001-test-for-issue-with-upsert-fail.patch (application/x-patch, 12.1 KB)