From: | Andres Freund <andres(at)2ndquadrant(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: basebackups during ALTER DATABASE ... SET TABLESPACE ... not safe? |
Date: | 2015-01-26 21:03:03 |
Message-ID: | 20150126210303.GD5568@awork2.anarazel.de |
Lists: | pgsql-hackers |
Hi,
On 2015-01-22 19:56:07 +0100, Andres Freund wrote:
> Hi,
>
> On 2015-01-20 16:28:19 +0100, Andres Freund wrote:
> > I'm analyzing a problem in which a customer had a pg_basebackup (from
> > standby) created 9.2 cluster that failed with "WAL contains references to
> > invalid pages". The failed record was a "xlog redo visible"
> > i.e. XLOG_HEAP2_VISIBLE.
> >
> > First I thought there might be another bug along the line of
> > 17fa4c321cc. Looking at the code and the WAL that didn't seem to be the
> > case (man, I miss pg_xlogdump). Other, slightly older, standbys, didn't
> > seem to have any problems.
> >
> > Logs show that an ALTER DATABASE ... SET TABLESPACE ... was running when
> > the basebackup was started and finished *before* pg_basebackup finished.
> >
> > movedb() basically works in these steps:
> > 1) lock out users of the database
> > 2) RequestCheckpoint(IMMEDIATE|WAIT)
> > 3) DropDatabaseBuffers()
> > 4) copydir()
> > 5) XLogInsert(XLOG_DBASE_CREATE)
> > 6) RequestCheckpoint(CHECKPOINT_IMMEDIATE)
> > 7) rmtree(src_dbpath)
> > 8) XLogInsert(XLOG_DBASE_DROP)
> > 9) unlock database
> >
> > If a basebackup starts while 4) is in progress and continues until 7)
> > happens I think a pretty wide race opens: The basebackup can end up with
> > a partial copy of the database in the old tablespace because the
> > rmtree(old_path) concurrently was in progress. Normally such races are
> > fixed during replay. But in this case, the replay of the
> > XLOG_DBASE_CREATE will just try to do a rmtree(new); copydir(old, new);
> > which fixes nothing.
> >
> > Besides making AD .. ST use sane WAL logging, which doesn't seem
> > backpatchable, I don't see what could be done against this except
> > somehow making basebackups fail if an AD .. ST is in progress. Which
> > doesn't look entirely trivial either.
>
> I basically have two ideas to fix this.
>
> 1) Make do_pg_start_backup() acquire a SHARE lock on
> pg_database. That'll prevent it from starting while a movedb() is
> still in progress. Then additionally add pg_backup_in_progress()
> function to xlog.c that checks (XLogCtl->Insert.exclusiveBackup ||
> XLogCtl->Insert.nonExclusiveBackups != 0). Use that in createdb() and
> movedb() to error out if a backup is in progress.
Attached is a patch trying to do this. It doesn't look too bad, and it led me
to discover missing recovery conflicts during an AD .. ST.
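For readers without the attachment, the check described in idea 1) can be
sketched roughly as below. This is a standalone mock, not the patch and not
PostgreSQL source: only the two field names (exclusiveBackup,
nonExclusiveBackups) come from the expression quoted above. The
MockInsertState struct and the mock_* function names are hypothetical, and the
real code would need to read XLogCtl->Insert under appropriate locking and
raise an error rather than return a flag.

```c
/*
 * Standalone mock, NOT PostgreSQL source. The two field names mirror the
 * expression quoted in the mail; every other name here is hypothetical.
 */
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the backup state kept in XLogCtl->Insert. */
typedef struct MockInsertState
{
    bool exclusiveBackup;      /* an exclusive base backup is running */
    int  nonExclusiveBackups;  /* count of running non-exclusive backups */
} MockInsertState;

/*
 * True if any base backup is currently running. The real function would
 * have to hold whatever lock protects these fields while reading them.
 */
static bool
mock_backup_in_progress(const MockInsertState *insert)
{
    assert(insert->nonExclusiveBackups >= 0);
    return insert->exclusiveBackup || insert->nonExclusiveBackups != 0;
}

/*
 * Hypothetical guard for createdb()/movedb(): returns false in the case
 * where the real code would error out because a backup is in progress.
 */
static bool
mock_movedb_allowed(const MockInsertState *insert)
{
    return !mock_backup_in_progress(insert);
}
```

Note that this only covers the "a backup is already running" half; the other
half (a backup starting while movedb() runs) is what the SHARE lock on
pg_database is for.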
But: It doesn't actually work on standbys, because lock.c prevents any
stronger lock than RowExclusive from being acquired. And we need a
lock that can conflict with WAL replay of DBASE_CREATE, to handle base
backups that are executed on the primary. Those obviously can't detect
whether any standby is currently doing a base backup...
I currently don't have a good idea how to mangle lock.c to allow
this. I've played with doing it like in the second patch, but that
doesn't actually work because of some asserts around ProcSleep - leading
to locks on database objects not working in the startup process (despite
already being used).
The easiest thing would be to just use a lwlock instead of a heavyweight
lock - but those aren't cancelable...
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services