Re: BUG #17846: pg_dump doesn't properly dump with paused WAL replay

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: francisco(dot)reinolds(at)channable(dot)com
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17846: pg_dump doesn't properly dump with paused WAL replay
Date: 2023-03-16 15:10:57
Message-ID: 3777456.1678979457@sss.pgh.pa.us
Lists: pgsql-bugs

PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> For backups, we use pg_dump to perform a full database dump. Before we start
> a backup, we pause the WAL replay on the secondary, unpausing it after it is
> concluded. This was done since we previously encountered problems with
> pg_dump failing when an AccessExclusiveLock was held on a table that pg_dump
> was going to dump.

> For some time we faced no problems with this setup, but starting some months
> ago, we started witnessing sporadic failures when we attempted to restore
> the dumps of one of our databases, to verify the dump's integrity. These
> restore failures would occur due to a key not being present in a table:

I really have no idea what's going on there, but can you show the exact
pg_dump command(s) being issued? I'm particularly curious whether you
are using parallel dump. The same for the failing pg_restore.

Also, are all the moving parts (primary server, secondary server,
pg_dump, pg_restore) exactly the same PG version?

> We have managed, with some help from the Postgres IRC channel (special
> thanks to user nickb), to work around the problem. The solution was to begin
> a transaction, and extract a snapshot that'd be passed as a pg_dump
> argument, and only then pause WAL replay. From our understanding, pg_dump
> should already implicitly pick a suitable point to start the dump but it
> apparently is not the case, hence the bug report.

It's the other way around: the replay mechanism should not damage
any data that's visible to an open snapshot. So I agree this smells
like a bug, but we don't have enough info here to reproduce it.
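
For readers following along, the ordering the reporter describes (open a
transaction, export its snapshot, only then pause replay, and hand the
snapshot to pg_dump) might look roughly like the sketch below. It is
illustrative only, not the reporter's actual script: the helper name
pg_dump_argv, the REPEATABLE READ isolation level, and the database and
file names are assumptions. pg_export_snapshot(), pg_wal_replay_pause(),
pg_wal_replay_resume() and pg_dump's --snapshot option are standard
PostgreSQL (the replay-control function names are those used since
version 10).

```python
# Sketch of the "snapshot first, then pause replay" ordering.
# Assumption: the SQL below runs in ONE session on the standby, and that
# session stays open (transaction uncommitted) while pg_dump runs, since an
# exported snapshot is only importable until the exporting transaction ends.

SQL_BEFORE_DUMP = (
    "BEGIN ISOLATION LEVEL REPEATABLE READ;",  # assumed isolation level
    "SELECT pg_export_snapshot();",            # returns the snapshot id
    "SELECT pg_wal_replay_pause();",           # pause only after exporting
)

SQL_AFTER_DUMP = (
    "SELECT pg_wal_replay_resume();",
    "COMMIT;",
)


def pg_dump_argv(snapshot_id, dbname, outfile):
    """Build a pg_dump command line that reuses the exported snapshot.

    pg_dump_argv is a hypothetical helper; snapshot_id is the value
    returned by pg_export_snapshot() in the session above.
    """
    return [
        "pg_dump",
        "--snapshot=" + snapshot_id,
        "--file=" + outfile,
        dbname,
    ]
```

The resulting list can be passed to subprocess.run() between the two SQL
batches; the point of the ordering is only that the snapshot exists before
replay is paused.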

regards, tom lane
