Re: Recovery inconsistencies, standby much larger than primary

From: Greg Stark <stark(at)mit(dot)edu>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Recovery inconsistencies, standby much larger than primary
Date: 2014-01-27 01:11:38
Message-ID: CAM-w4HPQW1WAPY_TfUNEEkDwshbTCeov7nhG7Z6iU7rTN6FTXg@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jan 26, 2014 at 9:45 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> Hi,
>
> On 2014-01-24 19:23:28 -0500, Greg Stark wrote:
>> Since the point release we've run into a number of databases that when
>> we restore from a base backup end up being larger than the primary
>> database was. Sometimes by a large factor. The data below is from
>> 9.1.11 (both primary and standby) but we've seen the same thing on
>> 9.2.6.
>
> What's the procedure for creating those standbys? Were they of similar
> size after being cloned?

These are restored from a base backup using WAL-E and then started in
standby mode. The WAL is retrieved using restore_command (which is
WAL-E); after it has retrieved a lot of archived WAL the database
switches to streaming.
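
For reference, the relevant part of the recovery.conf is along these
lines (the envdir path and conninfo below are illustrative placeholders,
not our exact values):

    standby_mode = 'on'
    restore_command = 'envdir /etc/wal-e.d/env wal-e wal-fetch "%f" "%p"'
    primary_conninfo = 'host=primary.example.com user=replicator'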

We confirmed from size monitoring that the standby database grew
substantially before the time it reported reaching consistent state,
so I only downloaded the WAL from that range for analysis.
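
Roughly, I fetched just the segments covering that window and ran
xlogdump over them; the segment names here are made up and the exact
xlogdump invocation may differ by build:

    $ mkdir wal-extract
    $ for seg in 000000020000110000000040 000000020000110000000041 ; do
    >   envdir /etc/wal-e.d/env wal-e wal-fetch "$seg" "wal-extract/$seg"
    > done
    $ xlogdump wal-extract/* > xlogdump.out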

>> primary$ for i in 1261982 1364767 1366221 473158 ; do echo -n "$i " ;
>> du -shc $i* | tail -1 ; done
>> 1261982 29G total
>> 1364767 23G total
>> 1366221 12G total
>> 473158 76G total
>>
>> standby$ for i in 1261982 1364767 1366221 473158 ; do echo -n "$i " ;
>> du -shc $i* | tail -1 ; done
>> 1261982 55G total
>> 1364767 28G total
>> 1366221 17G total
>> 473158 139G total
>> ...
>> The first three are btrees and the fourth is a heap, btw.
>
> Are those all of the same underlying heap relation?

Are you asking whether the relfilenode was reused for a different
relation? I doubt it.

Or are you asking if the first three indexes are for the same heap
(presumably the fourth one)? I don't think so but I can check.
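
Something like this on the primary should settle it (assuming the
relfilenodes haven't been reassigned since the numbers above were
taken):

    primary$ psql -c "
      SELECT i.indrelid::regclass AS heap, c.oid::regclass AS index
        FROM pg_class c
        JOIN pg_index i ON i.indexrelid = c.oid
       WHERE c.relfilenode IN (1261982, 1364767, 1366221);"
    primary$ psql -c "
      SELECT oid::regclass AS rel, relkind
        FROM pg_class
       WHERE relfilenode = 473158;"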

>> We're also seeing log entries about "WAL contains references to invalid
>> pages" but these errors seem only vaguely correlated. Sometimes we get
>> the errors but the tables don't grow noticeably and sometimes we don't
>> get the errors and the tables are much larger.
>
> Uhm. I am a bit confused. You see those in the standby's log? At !debug
> log levels? That'd imply that the standby is dead and needed to be
> recloned, no? How do you continue after that?

It's possible I'm confusing symptoms of an unrelated problem. But the
symptom we saw was that it got this error, recovery crashed, then
recovery started again and it replayed fine. I agree that doesn't jibe
with the code I see in 9.3; I haven't checked how long the code has
been this strict, though.
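
I'll go back through the standby logs and check exactly where those
messages fell relative to the consistency point, something like this
(the log locations here are illustrative):

    standby$ grep -n "consistent recovery state reached" pg_log/*.log
    standby$ grep -n "invalid pages" pg_log/*.log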

>> Much of the added space is uninitialized pages as you might expect but
>> I don't understand is how the database can start up without running
>> into the "reference to invalid pages" panic consistently. We check
>> both that there are no references after consistency is reached *and*
>> that any references before consistency are resolved by a truncate or
>> unlink before consistency.
>
> Well, it's pretty easy to get into a situation with lots of new
> pages. Lots of concurrent inserts that all fail before logging WAL. The
> next insert will extend the relation and only initialise that last
> page.
>
> It'd be interesting to look for TRUNCATE records using xlogdump. Could
> you show those for starters?

There are no records matching grep -i truncate in any of those
extracts for those relfilenodes. I'm grepping the whole xlogdump now
but it'll take a while. So far no truncates anywhere.
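
For concreteness, this is the sort of thing I'm running over the dump
(xlogdump.out being the concatenated output from the sketch above):

    $ grep -ci truncate xlogdump.out
    $ grep -i truncate xlogdump.out | grep -E '1261982|1364767|1366221|473158'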

>> I'm assuming this is somehow related to the multixact or transaction
>> wraparound problems, but I don't really understand how they could be
>> hitting when both the primary and standby are post-upgrade to the most
>> recent point release, which has the fixes
>
> That doesn't sound likely. For one, the symptoms don't fit; for another,
> those problems are mostly 9.3+.

These problems all started to appear after the latest point release
btw. That could just be a coincidence of course.

--
greg
