Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From: Michail Nikolaev <michail(dot)nikolaev(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>, Noah Misch <noah(at)leadboat(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY
Date: 2024-08-24 15:52:00
Message-ID: CANtu0ogv+6wqRzPK241jik4U95s1pW3MCZ3rX5ZqbFdUysz7Qw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello, everyone!

This patch set addresses the issues discussed in this thread.

The main idea behind this fix is that it is safe to consider indisready
indexes alongside indisvalid indexes as arbiter indexes. However, it's
crucial that at least one fully valid index is present.

Why is it necessary to consider indisready during the planning phase?

The reason is that these indexes are required for correct processing during
the execution phase.
If "ready" indexes are skipped as arbiters by one transaction, they may
already have become "valid" for another concurrent transaction during its
planning phase.
As a result, both transactions could concurrently process the UPSERT
command with different sets of arbiters (while using the same set of
indexes for tuple insertion later).
This can lead to unexpected "duplicate key value violates unique
constraint" errors and deadlocks.

Is it safe to use a "ready" but not yet "valid" index as an arbiter?
Yes, as long as at least one "valid" index is also used as an arbiter.
The valid index ensures the correctness of the UPSERT logic, while the
"ready" index contains an equal or lesser number of tuples, making it safe
for speculative insertion.
In any case, the insert to that index will be processed during
ExecInsertIndexTuples one way or another (with applyNoDupErr or without).

Fix is divided into a few patches, each following this logic:

1) The first patch provides specs (and injection points) for the various
scenarios related to the issue.
2) The second patch introduces a straightforward change—adding indisready
indexes to arbiters alongside indisvalid. However, at least one indisvalid
is still required. This resolves simple cases involving REINDEX
CONCURRENTLY and CREATE INDEX CONCURRENTLY.
3) The third patch deals with named constraints. Instead of relying solely
on the index with the specified name, we attempt to find other indexes that
are equivalent in terms of being used as an arbiter.
4) This patch fixes a scenario involving partitioned tables. Special checks
are required for partitioned indexes, which may be processed by REINDEX
CONCURRENTLY.

Additionally, a patch with three extra TAP specifications for stress
testing is attached. This patch is not intended for commitment, so I
renamed the extension to prevent accidental application in some CI/DI jobs.

>
Also, it is possible to look at the patches on GitHub:
https://github.com/postgres/postgres/compare/master...michail-nikolaev:postgres:reindex_concurrently_with_upsert

Best regards,
Mikhail.

Attachment Content-Type Size
v3-0002-Modify-the-infer_arbiter_indexes-function-to-cons.patch text/x-patch 2.9 KB
v3-0001-Specs-top-reproduce-the-issues-with-CREATE-INDEX-.patch text/x-patch 48.0 KB
v3-0003-Modify-the-infer_arbiter_indexes-function-to-also.patch text/x-patch 8.1 KB
v3-0004-Modify-the-ExecInitPartitionInfo-function-to-cons.patch text/x-patch 6.1 KB
stress_test.__patch__ application/octet-stream 18.1 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2024-08-24 17:17:47 Re: Interrupts vs signals
Previous Message Peter Eisentraut 2024-08-24 14:27:23 list of acknowledgments for PG17