Re: Instability with incremental backup tests (pg_combinebackup, 003_timeline.pl)

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Instability with incremental backup tests (pg_combinebackup, 003_timeline.pl)
Date: 2024-08-21 13:18:49
Message-ID: b6083df1-623d-4f25-bbb9-9f3fdf292c00@vondra.me
Lists: pgsql-hackers

On 8/21/24 14:58, Robert Haas wrote:
> ...
>
> All we're doing here is taking an incremental backup of a 1-table
> database that had 1 row at the time of the full backup and has had 1
> more row inserted since then. On my system, the last time I ran this
> regression test, this step completed in 410ms. It shouldn't be
> expensive. So I'm inclined to chalk this up to the machine not having
> enough resources. The only thing that I don't really understand is why
> this particular test would fail vs. anything else. We have a bunch of
> tests that take backups. A possibly important difference here is that
> this one is an incremental backup, so it would need to read WAL
> summary files from the beginning of the full backup to the beginning
> of the current backup and combine them into one super-summary that it
> could then use to decide what to include in the incremental backup.
> However, since this is an artificial example with just 1 insert
> between the full and the incremental, it's hard to imagine that being
> expensive, unless there's some low-probability bug that makes it go
> into an infinite loop or chew up a million CPU cycles or something.
> That's not impossible, but given the discussion between you and Tomas,
> I'm kinda hoping it was just a hardware issue.
>
> Barring objections or other similar trouble reports, I think we should
> just close out this open item.
>

+1 to just close it

The animal is running FreeBSD on an rpi4, and used to run from a
flash disk. It seems FreeBSD has some trouble with that, which likely
contributed to the failures (though it's a bit odd that it affected
only this test).

Moving to better storage (a SATA SSD over USB) improved the situation
quite a bit. It's a bit too early to say for sure, of course, but I
don't think the test itself is broken.

regards

--
Tomas Vondra
