Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From: Michail Nikolaev <michail(dot)nikolaev(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY
Date: 2024-06-11 11:00:00
Message-ID: CANtu0ojXmqjmEzp-=aJSxjsdE76iAsRgHBoK0QtYHimb_mEfsg@mail.gmail.com
Lists: pgsql-hackers

Hello, hackers.

While testing my work on (1), I was struggling with a strange issue involving
ON CONFLICT UPDATE and REINDEX CONCURRENTLY.

After some time, I realized that the same issue persists on the master
branch as well :)

I have prepared two TAP tests to reproduce the issues (2); they are also
attached.

The first one does the following:

CREATE UNLOGGED TABLE tbl(i int primary key, updated_at timestamp);
CREATE INDEX idx ON tbl(i, updated_at); -- not required to reproduce the issue, but makes it happen faster

Then it runs the following scripts concurrently with pgbench:

1) INSERT INTO tbl VALUES(13,now()) on conflict(i) do update set updated_at = now();
2) INSERT INTO tbl VALUES(42,now()) on conflict(i) do update set updated_at = now();
3) INSERT INTO tbl VALUES(69,now()) on conflict(i) do update set updated_at = now();

Also, while pgbench is running, the following command is run in a loop:

REINDEX INDEX CONCURRENTLY tbl_pkey;
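
For reference, the same workload can be approximated outside the TAP harness
roughly like this (the file names and pgbench options below are just an
example, not what the TAP test uses):

echo "INSERT INTO tbl VALUES(13,now()) on conflict(i) do update set updated_at = now();" > upsert_13.sql
# ... the same for keys 42 and 69 into upsert_42.sql / upsert_69.sql ...
pgbench -n -c 20 -j 4 -T 120 -f upsert_13.sql -f upsert_42.sql -f upsert_69.sql postgres &
while true; do psql -d postgres -c "REINDEX INDEX CONCURRENTLY tbl_pkey;"; done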

For some time, everything looks more or less fine (except livelocks, but that
is the subject of the next test).
But after a while, about a minute or so (around the ~3000th REINDEX), it just
fails like this:

make -C src/test/modules/test_misc/ check PROVE_TESTS='t/006_*'

# waiting for an about 3000, now is 2174, seconds passed : 84
# waiting for an about 3000, now is 2175, seconds passed : 84
# waiting for an about 3000, now is 2176, seconds passed : 84
# waiting for an about 3000, now is 2177, seconds passed : 84
# waiting for an about 3000, now is 2178, seconds passed : 84
# waiting for an about 3000, now is 2179, seconds passed : 84
# waiting for an about 3000, now is 2180, seconds passed : 84
# waiting for an about 3000, now is 2181, seconds passed : 84
# waiting for an about 3000, now is 2182, seconds passed : 84
# waiting for an about 3000, now is 2183, seconds passed : 84
# waiting for an about 3000, now is 2184, seconds passed : 84

# Failed test 'concurrent INSERTs, UPDATES and RC status (got 2 vs expected 0)'
# at t/006_concurrently_unique_fail.pl line 69.

# Failed test 'concurrent INSERTs, UPDATES and RC stderr /(?^:^$)/'
# at t/006_concurrently_unique_fail.pl line 69.
# 'pgbench: error: pgbench: error: client 4 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(13) already exists.
# client 15 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 9 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 11 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 8 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 3 script 2 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(69) already exists.
# pgbench: error: client 2 script 2 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(69) already exists.
# pgbench: error: client 12 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccold"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 10 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccold"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 18 script 2 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(69) already exists.
# pgbench: error: pgbench:client 14 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey"
# DETAIL: Key (i)=(13) already exists.
# error: client 1 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 0 script 2 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey"
# DETAIL: Key (i)=(69) already exists.
# pgbench: error: client 13 script 1 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(42) already exists.
# pgbench: error: client 16 script 1 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(42) already exists.
# pgbench: error: client 5 script 1 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(42) already exists.
# pgbench: error: Run was aborted; the above results are incomplete.
# '

Probably something is wrong with arbiter index selection in different
backends. I am afraid it could be a symptom of a more serious issue.
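
For context, the "tbl_pkey_ccnew" / "tbl_pkey_ccold" constraints above are the
transient indexes that REINDEX CONCURRENTLY builds and later swaps away; while
a REINDEX is in flight they exist alongside the original index. A purely
illustrative way to watch them (not part of the test) is:

SELECT c.relname, i.indisvalid, i.indisready, i.indislive
FROM pg_index i JOIN pg_class c ON c.oid = i.indexrelid
WHERE i.indrelid = 'tbl'::regclass;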

-------------------------------------

The second test shows an interesting livelock state in a similar situation.

CREATE UNLOGGED TABLE tbl(i int primary key, n int);
CREATE INDEX idx ON tbl(i, n);
INSERT INTO tbl VALUES(13,1);

pgbench concurrently runs a single command:

INSERT INTO tbl VALUES(13,1) on conflict(i) do update set n = tbl.n + EXCLUDED.n;

And reindexing also runs in a loop:

REINDEX INDEX CONCURRENTLY tbl_pkey;
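
Roughly the same thing outside the TAP harness (again, file name and options
are just an example, not what the test uses):

echo "INSERT INTO tbl VALUES(13,1) on conflict(i) do update set n = tbl.n + EXCLUDED.n;" > upsert.sql
pgbench -n -c 30 -j 4 -T 120 -f upsert.sql postgres &
while true; do psql -d postgres -c "REINDEX INDEX CONCURRENTLY tbl_pkey;"; done
# in another session, watch progress: psql -d postgres -c "SELECT n FROM tbl WHERE i = 13;"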

After the start, a somewhat strange issue appears:

make -C src/test/modules/test_misc/ check PROVE_TESTS='t/007_*'

# going to start reindex, num tuples in table is 1
# reindex 0 done in 0.00704598426818848 seconds, num inserted during reindex tuples is 0 speed is 0 per second
# going to start reindex, num tuples in table is 7
# reindex 1 done in 0.453176021575928 seconds, num inserted during reindex tuples is 632 speed is 1394.60158947115 per second
# going to start reindex, num tuples in table is 647
# current n is 808, 808 per one second
# current n is 808, 0 per one second
# current n is 808, 0 per one second
# current n is 808, 0 per one second
# current n is 808, 0 per one second
# current n is 811, 3 per one second
# current n is 917, 106 per one second
# current n is 1024, 107 per one second
# reindex 2 done in 8.4104950428009 seconds, num inserted during reindex tuples is 467 speed is 55.5258635340064 per second
# going to start reindex, num tuples in table is 1136
# current n is 1257, 233 per one second
# current n is 1257, 0 per one second
# current n is 1257, 0 per one second
# current n is 1257, 0 per one second
# current n is 1257, 0 per one second
# current n is 1490, 233 per one second
# reindex 3 done in 5.21368479728699 seconds, num inserted during reindex tuples is 411 speed is 78.8310026363446 per second
# going to start reindex, num tuples in table is 1566

At some moments, all CPUs are hot, but 30 connections are unable to complete
any upsert. I think it may be somehow caused by there being two arbiter
indexes (the old one and the new reindexed one).

Best regards,
Mikhail.

[1]:
https://www.postgresql.org/message-id/flat/CANtu0ogBOtd9ravu1CUbuZWgq6qvn1rny38PGKDPk9zzQPH8_A%40mail.gmail.com#d4be02ff70f3002522f9fadbd165d631
[2]:
https://github.com/michail-nikolaev/postgres/commit/9446f944b415306d9e5d5ab98f69938d8f5ee87f

Attachment Content-Type Size
v1-0001-test-for-issue-with-upsert-fail.patch application/x-patch 12.1 KB
