From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>, pgsql-committers(at)lists(dot)postgresql(dot)org
Subject: Re: pgsql: Avoid race in RelationBuildDesc() affecting CREATE INDEX CONCURR
Date: 2022-02-08 21:13:01
Message-ID: df7b4c0b-7d92-f03f-75c4-9e08b269a716@enterprisedb.com
Lists: pgsql-committers pgsql-hackers
On 10/24/21 03:40, Noah Misch wrote:
> Avoid race in RelationBuildDesc() affecting CREATE INDEX CONCURRENTLY.
>
> CIC and REINDEX CONCURRENTLY assume backends see their catalog changes
> no later than each backend's next transaction start. That failed to
> hold when a backend absorbed a relevant invalidation in the middle of
> running RelationBuildDesc() on the CIC index. Queries that use the
> resulting index can silently fail to find rows. Fix this for future
> index builds by making RelationBuildDesc() loop until it finishes
> without accepting a relevant invalidation. It may be necessary to
> reindex to recover from past occurrences; REINDEX CONCURRENTLY suffices.
> Back-patch to 9.6 (all supported versions).
>
> Noah Misch and Andrey Borodin, reviewed (in earlier versions) by Andres
> Freund.
>
> Discussion: https://postgr.es/m/20210730022548.GA1940096@gust.leadboat.com
>
Unfortunately, this seems to have broken CLOBBER_CACHE_ALWAYS builds.
Since this commit, initdb never completes, because the very first
RelationBuildDesc() call retries over and over.
We have a CLOBBER_CACHE_ALWAYS buildfarm machine "avocet", and that
currently looks like this (top):
  PID USER     PR NI   VIRT   RES   SHR S  %CPU %MEM     TIME+ COMMAND
 2626 buildfa+ 20  0 202888 21416 20084 R 98.34 0.531 151507:16 /home/buildfarm/avocet/buildroot/REL9_6_STABLE/pgsql.build/tmp_install/home/buildfarm/avocet/buildroot/REL9_6_STABLE/inst/bin/postgres --boot -x1 -F
Yep, that's 151507 minutes of CPU time, i.e. about 105 days in initdb :-/
I haven't looked at this very closely yet, but it seems the whole
problem is we do this at the very beginning:
in_progress_list[in_progress_offset].invalidated = false;
/*
* find the tuple in pg_class corresponding to the given relation id
*/
pg_class_tuple = ScanPgRelation(targetRelId, true, false);
which seems entirely self-defeating, because ScanPgRelation acquires a
lock (on pg_class), which accepts invalidations, which invalidates
system caches (under CLOBBER_CACHE_ALWAYS), which promptly sets
in_progress_list[in_progress_offset].invalidated = true;
guaranteeing an infinite loop.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company