Re: recoveryCheck/008_fsm_truncation is failing on dodo in v14- (due to slow fsync?)

From: Robins Tharakan <tharakan(at)gmail(dot)com>
To: Alexander Lakhin <exclusion(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: recoveryCheck/008_fsm_truncation is failing on dodo in v14- (due to slow fsync?)
Date: 2024-06-28 10:20:08
Message-ID: CAEP4nAxUR5x=ANP1yzPEMBh+VdZ7=LGi5SeC6jp4eu=JmNgcug@mail.gmail.com
Lists: pgsql-hackers

On Sun, 23 Jun 2024 at 22:30, Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:

> Unfortunately, the buildfarm log doesn't contain regress_log_002_limits,
> but I managed to reproduce the failure on my device, when its storage is
> as slow as:
> $ dd if=/dev/zero of=./test count=1024 oflag=dsync bs=128k
> 1024+0 records in
> 1024+0 records out
> 134217728 bytes (134 MB, 128 MiB) copied, 33.9446 s, 4.0 MB/s
>
>
Over the past week or so, I tried to space out all other tasks on the
machine, so as to ensure that the 1-min CPU load is mostly <2 (and thus not
many things are hammering the disk), and with that I have seen 0 failures
these past few days. This isn't conclusive by any means, but it does seem
that reducing IO contention has helped remove the errors, in line with what
Alexander suspects / repros here.
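
For anyone who wants to compare their own storage against the dd figure
quoted above, here is a rough Python sketch of the same kind of measurement:
write zero-filled blocks with each write synced to disk, then report decimal
MB/s as dd does. The function names and the fsync fallback are my own
assumptions for illustration and are not taken from the buildfarm or test
code; the default block size and count just mirror the dd invocation.

```python
import os
import tempfile
import time


def throughput_mb_per_s(num_bytes, seconds):
    """Return decimal megabytes per second, the unit dd reports."""
    return num_bytes / seconds / 1_000_000


def measure_dsync_write(directory, block_size=128 * 1024, count=1024):
    """Write `count` zero-filled blocks, syncing each one, and return MB/s.

    Hypothetical helper: it approximates `dd oflag=dsync`, not dd itself.
    """
    block = bytes(block_size)
    # O_DSYNC is not available on every platform; if it is missing,
    # fall back to an explicit fsync after each write.
    dsync_flag = getattr(os, "O_DSYNC", 0)
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    fd = os.open(path, os.O_WRONLY | dsync_flag)
    try:
        start = time.monotonic()
        for _ in range(count):
            os.write(fd, block)
            if not dsync_flag:
                os.fsync(fd)
        # Guard against a zero interval on very coarse clocks.
        elapsed = max(time.monotonic() - start, 1e-9)
    finally:
        os.close(fd)
        os.unlink(path)
    return throughput_mb_per_s(block_size * count, elapsed)
```

Calling measure_dsync_write(".") in the directory of interest uses the same
128k x 1024 layout as the dd run. The result will vary with whatever else is
using the disk at the time, which is the point of the comparison.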

Just a note that I've reverted some of those recent changes now, so if the
theory holds true, I wouldn't be surprised if some of these errors reappear
on dodo.

-
robins
