Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

From: Michail Nikolaev <michail(dot)nikolaev(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY
Date: 2024-06-25 11:47:00
Message-ID: CANtu0ogJ4Xmz9kfmniE8SPuomOcN7Jwv54xF7Z_pQ3iHw2xC3w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello, Michael!

> As far as I can see, it depends on what kind of query semantics and
> the amount of transparency you are looking for here in your
> application. An error in the query itself can also be defined as
> useful so as your application is aware of what happens as an effect of
> the concurrent index build (reindex or CIC/DIC), and it is not really
> clear to me why silently falling back to a re-selection of the arbiter
> indexes would be always better.

From my point of view, INSERT ON CONFLICT UPDATE should never fail with
"ERROR: duplicate key value violates unique constraint" because the main
idea of upsert is to avoid such situations.
So, it is expected by majority and, probably, is even documented.

On the other side, REINDEX CONCURRENTLY should not cause any queries to
fail accidentally without any clear reason.

Also, as you can see from the topic starter letter, we could see errors
like this:

* ERROR: duplicate key value violates unique constraint "tbl_pkey"
* ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
* ERROR: duplicate key value violates unique constraint "tbl_pkey_ccold"

So, the first error message does not provide any clue for the developer to
understand what happened.

> - The planner ignores indexes with !indisvalid.
> - The executor ignores indexes with !indislive.

Yes, and it feels like we need one more flag here to distinguish
!indisvalid indexes which are going to become valid and which are going to
become !indislive.

For example, let name it as indiscorrect (it means it contains all the
data). In such case, we may use the following logic:

1) !indisvalid && !indiscorrect - index in validation phase probably, do
not use it as arbiter because it does not contain all the data yet
2) !indisvalid && indiscorrect - index will be dropped most likely. Do not
plan new queries with it, but it still may be used by other queries
(including upserts). So, we still need to include it to the arbiters.

And, during the reindex concurrently:

1) begin; mark new index as indisvalid and indiscorrect; mark old one as
!indisvalid but still indiscorrect. invalidate relcache; commit;

Currently, some queries are still using the old one as arbiter, some
queries use both.

2) WaitForLockersMultiple

Now all queries use both indexes as arbiter.

3) begin; mark old index as !indiscorrect, additionally to !indisvalid;
invalidate cache; commit;

Now, some queries use only the new index, both some still use both.

4) WaitForLockersMultiple;

Now, all queries use only the new index - we are safe to mark the old
one it as !indislive.

> It should, but I'm wondering if that's necessary for two reasons.
In that case, it becomes:

Assert(indexRelation->rd_index->indiscorrect);
Assert(indexRelation->rd_index->indislive);

and it is always the valid check.

Best regards,
Mikhail.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2024-06-25 12:03:15 Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin
Previous Message Jelte Fennema-Nio 2024-06-25 11:45:06 Re: RFC: Additional Directory for Extensions