From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Jim Nasby <jim(dot)nasby(at)gmail(dot)com>
Cc: Alexander Lakhin <exclusion(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "andrew(at)dunslane(dot)net" <andrew(at)dunslane(dot)net>
Subject: Re: Random pg_upgrade test failure on drongo
Date: 2024-01-09 03:05:05
Message-ID: CAA4eK1Koa+w149W1FAWhOQXHXAFLbCekGr2KS+9bd3kbcxj-Ng@mail.gmail.com
Lists: pgsql-hackers
On Mon, Jan 8, 2024 at 9:36 PM Jim Nasby <jim(dot)nasby(at)gmail(dot)com> wrote:
>
> On 1/4/24 10:19 PM, Amit Kapila wrote:
> > On Thu, Jan 4, 2024 at 5:30 PM Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
> >>
> >> 03.01.2024 14:42, Amit Kapila wrote:
> >>>
> >>
> >>>> And the internal process is ... background writer (BgBufferSync()).
> >>>>
> >>>> So, I tried just adding bgwriter_lru_maxpages = 0 to postgresql.conf and
> >>>> got 20 x 10 tests passing.
> >>>>
> >>>> Thus, if we want just to get rid of the test failure, maybe it's enough to
> >>>> add this to the test's config...
> >>>>
> >>> What about checkpoints? Can't a checkpoint do the same while writing buffers?
> >>
> >> Since we are dealing here with pg_upgrade/pg_restore, it may not be very easy
> >> to get the desired effect, but I think it's not impossible in principle.
> >> More details below.
> >> What happens during the pg_upgrade execution is essentially:
> >> 1) CREATE DATABASE "postgres" WITH TEMPLATE = template0 OID = 5 ...;
> >> -- this command flushes file buffers as well
> >> 2) ALTER DATABASE postgres OWNER TO ...
> >> 3) COMMENT ON DATABASE "postgres" IS ...
> >> 4) -- For binary upgrade, preserve pg_largeobject and index relfilenodes
> >> SELECT pg_catalog.binary_upgrade_set_next_index_relfilenode('2683'::pg_catalog.oid);
> >> SELECT pg_catalog.binary_upgrade_set_next_heap_relfilenode('2613'::pg_catalog.oid);
> >> TRUNCATE pg_catalog.pg_largeobject;
> >> -- ^^^ here we can get the error "could not create file "base/5/2683": File exists"
> >> ...
> >>
> >> We get the effect discussed when the background writer process decides to
> >> flush a file buffer for pg_largeobject during step 1.
> >> (Thus, if a checkpoint somehow happened to occur during CREATE DATABASE,
> >> the result would be the same.)
> >> Another important factor is shared_buffers = 1MB (set during the test);
> >> with the default setting of 128MB I could not reproduce the failure.
> >>
> >> It can be reproduced easily (on old Windows versions) just by running
> >> pg_upgrade in a loop; with the default cluster, I got failures on
> >> iterations 22, 37, and 17.
> >> If an old cluster contains a dozen databases, the failure probability
> >> increases significantly (with 10 additional databases I got failures on
> >> iterations 4, 1, and 6).
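
For anyone who wants to try reproducing this, the loop can be as simple
as the following (a rough sketch, assuming an autoconf build tree; the
iteration count is illustrative):

for i in $(seq 1 50); do
    make -C src/bin/pg_upgrade check || break
done
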
> >>
> >
> > I don't have an old Windows environment to test with, but I agree with
> > your analysis and theory. The question is what we should do about these
> > new random BF failures. I think we should set bgwriter_lru_maxpages to 0
> > and checkpoint_timeout to 1hr for these new tests. Doing an invasive
> > fix as part of this doesn't sound reasonable because this is an
> > existing problem, and, as you pointed out, there is another patch by
> > Thomas that probably addresses its root cause [1].
> >
> > [1] - https://commitfest.postgresql.org/40/3951/
>
> Isn't this just sweeping the problem (non-POSIX behavior on SMB and
> ReFS) under the carpet? I realize that synthetic test workloads like
> pg_upgrade in a loop aren't themselves real-world scenarios, but what
> about other cases? Even if we're certain it's not possible for these
> issues to wedge a server, it's still not a good experience for users to
> get random, unexplained IO-related errors...
>
The point is that this is an existing, known Windows behavior, and one
that appears only in certain versions. The fix doesn't seem to be
straightforward, so it seems advisable to avoid these random BF failures
by using an appropriate configuration for the tests.
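
Concretely, something like the following in the test cluster's
postgresql.conf is what I have in mind (a sketch only; how the test
injects these settings may differ):

# Stop the bgwriter's LRU scan so it doesn't flush dirty buffers
# (e.g. for pg_largeobject) behind pg_upgrade's back.
bgwriter_lru_maxpages = 0
# Push timed checkpoints well past the test's runtime.
checkpoint_timeout = 1h
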
--
With Regards,
Amit Kapila.