From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
Cc: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "andrew(at)dunslane(dot)net" <andrew(at)dunslane(dot)net> |
Subject: | Re: Random pg_upgrade test failure on drongo |
Date: | 2024-01-10 09:31:30 |
Message-ID: | CAA4eK1Lq75HXRxucGrKzWNk8540kdk9dj0B4-6DMcHAZ+CE5+Q@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Jan 9, 2024 at 4:30 PM Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
>
> 09.01.2024 13:08, Amit Kapila wrote:
> >
> >> As to checkpoint_timeout, personally I would not increase it, because it
> >> seems unbelievable to me that pg_restore (with the cluster containing only
> >> two empty databases) can run for longer than 5 minutes. I'd rather
> >> investigate such situation separately, in case we encounter it, but maybe
> >> it's only me.
> >>
> > I feel it is okay to set a higher value of checkpoint_timeout due to
> > the same reason though the probability is less. I feel here it is
> > important to explain in the comments why we are using these settings
> > in the new test. I have thought of something like: "During the
> > upgrade, bgwriter or checkpointer could hold the file handle for some
> > removed file. Now, during restore when we try to create the file with
> > the same name, it errors out. This behavior is specific to only some
> > specific Windows versions and the probability of seeing this behavior
> > is higher in this test because we use wal_level as logical via
> > allows_streaming => 'logical' which in turn sets shared_buffers as
> > 1MB."
> >
> > Thoughts?
>
> I would describe that behavior as "During upgrade, when pg_restore performs
> CREATE DATABASE, bgwriter or checkpointer may flush buffers and hold a file
> handle for pg_largeobject, so a later TRUNCATE pg_largeobject command will
> fail if the OS (such as older Windows versions) doesn't remove an unlinked
> file completely while it is still open. ..."
>
I am slightly hesitant to name any particular system table in the
comments, as this can happen for any other system table as well, so I
have slightly adjusted the comments in the attached patch. However, I
think it is okay to mention the particular system table name in the
commit message. Let me know what you think.
--
With Regards,
Amit Kapila.
Attachment | Content-Type | Size |
---|---|---|
v2-0001-Fix-an-intermetant-BF-failure-in-003_logical_slot.patch | application/octet-stream | 2.3 KB |