From: Jim Nasby <jim(dot)nasby(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>
Cc: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "andrew(at)dunslane(dot)net" <andrew(at)dunslane(dot)net>
Subject: Re: Random pg_upgrade test failure on drongo
Date: 2024-01-08 16:06:40
Message-ID: 9897f89d-3d77-40fe-b05f-ac7b492e8160@gmail.com
Lists: pgsql-hackers
On 1/4/24 10:19 PM, Amit Kapila wrote:
> On Thu, Jan 4, 2024 at 5:30 PM Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
>>
>> 03.01.2024 14:42, Amit Kapila wrote:
>>>
>>
>>>> And the internal process is ... background writer (BgBufferSync()).
>>>>
>>>> So, I tried just adding bgwriter_lru_maxpages = 0 to postgresql.conf and
>>>> got 20 x 10 tests passing.
>>>>
>>>> Thus, if we just want to get rid of the test failure, maybe it's enough to
>>>> add this to the test's config...
>>>>
>>> What about checkpoints? Can't it do the same while writing the buffers?
>>
>> As we're dealing here with pg_upgrade/pg_restore, it's probably not very
>> easy to get the desired effect, but I think it's not impossible in principle.
>> More details below.
>> What happens during the pg_upgrade execution is essentially:
>> 1) CREATE DATABASE "postgres" WITH TEMPLATE = template0 OID = 5 ...;
>> -- this command flushes file buffers as well
>> 2) ALTER DATABASE postgres OWNER TO ...
>> 3) COMMENT ON DATABASE "postgres" IS ...
>> 4) -- For binary upgrade, preserve pg_largeobject and index relfilenodes
>> SELECT pg_catalog.binary_upgrade_set_next_index_relfilenode('2683'::pg_catalog.oid);
>> SELECT pg_catalog.binary_upgrade_set_next_heap_relfilenode('2613'::pg_catalog.oid);
>> TRUNCATE pg_catalog.pg_largeobject;
>> -- ^^^ here we can get the error "could not create file "base/5/2683": File exists"
>> ...
>>
>> We get the effect discussed when the background writer process decides to
>> flush a file buffer for pg_largeobject during stage 1.
>> (Thus, if a checkpoint somehow happened to occur during CREATE DATABASE,
>> the result must be the same.)
>> And another important factor is shared_buffers = 1MB (set during the test).
>> With the default setting of 128MB I couldn't see the failure.
>>
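
A quick note for anyone following along: shared_buffers = 1MB means the
old cluster runs with a tiny buffer pool, so the bgwriter has far more
occasion to evict pg_largeobject buffers mid-CREATE DATABASE. To mirror
that outside the TAP test (a sketch, not what the test literally does),
one could set it on the old cluster like so:

    -- shared_buffers is postmaster-context; restart the node afterwards
    ALTER SYSTEM SET shared_buffers = '1MB';
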
>> It can be reproduced easily (on old Windows versions) just by running
>> pg_upgrade in a loop (I've got failures on iterations 22, 37, and 17 with
>> the default cluster).
>> If an old cluster contains a dozen databases, the failure probability
>> increases significantly (with 10 additional databases I've got failures
>> on iterations 4, 1, and 6).
>>
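
For anyone trying to reproduce this, Alexander's padding step could be
done from psql against the old cluster before looping pg_upgrade; the
database names here are made up, and \gexec executes each generated
statement:

    SELECT format('CREATE DATABASE padding_db%s', i)
      FROM generate_series(1, 10) AS i
    \gexec
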
>
> I don't have an old Windows environment to test with, but I agree with
> your analysis and theory. The question is what we should do about these
> new random BF failures. I think we should set bgwriter_lru_maxpages to 0
> and checkpoint_timeout to 1hr for these new tests. Doing an invasive fix
> as part of this doesn't sound reasonable, because this is an existing
> problem, and there seems to be another patch by Thomas that probably
> deals with its root cause [1], as you pointed out.
>
> [1] - https://commitfest.postgresql.org/40/3951/
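
For concreteness, my reading is that the proposed band-aid boils down to
two settings on the nodes the new tests create (equivalently, two lines
in the tests' postgresql.conf). Both GUCs are sighup-context, so a reload
suffices:

    ALTER SYSTEM SET bgwriter_lru_maxpages = 0;  -- disable bgwriter LRU writes
    ALTER SYSTEM SET checkpoint_timeout = '1h';  -- push timed checkpoints out
    SELECT pg_reload_conf();
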
Isn't this just sweeping the problem (non-POSIX behavior on SMB and
ReFS) under the carpet? I realize that synthetic test workloads like
pg_upgrade in a loop aren't themselves real-world scenarios, but what
about other cases? Even if we're certain it's not possible for these
issues to wedge a server, it's still not a good experience for users to
get random, unexplained I/O-related errors...
--
Jim Nasby, Data Architect, Austin TX