| From: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> | 
|---|---|
| To: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> | 
| Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> | 
| Subject: | Re: Tracking latest timeline in standby mode | 
| Date: | 2010-11-01 11:32:59 | 
| Message-ID: | 4CCEA56B.5030307@enterprisedb.com | 
| Lists: | pgsql-hackers | 
On 01.11.2010 12:32, Fujii Masao wrote:
>> A related issue is that we should have a check for the issue I also
>> mentioned in the comments:
>>
>>>         /*
>>>          * If the current timeline is not part of the history of the
>>>          * new timeline, we cannot proceed to it.
>>>          *
>>>          * XXX This isn't foolproof: The new timeline might have forked from
>>>          * the current one, but before the current recovery location. In that
>>>          * case we will still switch to the new timeline and proceed replaying
>>>          * from it even though the history doesn't match what we already
>>>          * replayed. That's not good. We will likely notice at the next online
>>>          * checkpoint, as the TLI won't match what we expected, but it's
>>>          * not guaranteed. The admin needs to make sure that doesn't happen.
>>>          */
>>
>> but that's a pre-existing and orthogonal issue; it can happen with the current
>> code too if you restart the standby, so let's handle that as a separate patch.
>
> I'm thinking of writing the timeline switch LSN to the timeline history file,
> and comparing that LSN with the location of the last applied WAL record when
> that file is rescanned. If the timeline switch LSN is ahead, we cannot do the switch.
Yeah, that's one approach. Another is to validate the TLI in the xlog
page header: it should always match the current timeline we're on. That
would feel more robust to me.
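
The two checks discussed above can be sketched side by side. This is only an illustration under stated assumptions, not PostgreSQL code: `Lsn` is a simplified 64-bit stand-in for a WAL location, `PageHeader` is a hypothetical reduced page header, and both function names are made up for this example. Following the code comment quoted earlier, the unsafe case is taken to be a new timeline that forked before the location already replayed.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TimeLineID;
typedef uint64_t Lsn;           /* assumption: simplified WAL location */

/* Hypothetical, reduced WAL page header: only the field of interest. */
typedef struct PageHeader
{
    TimeLineID tli;             /* timeline the page was written on */
} PageHeader;

/*
 * Approach 1 (switch LSN stored in the history file): switching is safe
 * only if we have not replayed past the point where the new timeline forked.
 */
bool
switch_is_safe(Lsn fork_lsn, Lsn last_replayed_lsn)
{
    return last_replayed_lsn <= fork_lsn;
}

/*
 * Approach 2 (page header validation): every page we replay must carry
 * the timeline we believe we are currently on.
 */
bool
page_tli_is_valid(const PageHeader *hdr, TimeLineID current_tli)
{
    return hdr->tli == current_tli;
}
```

The first check runs once, when the history file is rescanned; the second runs on every page read, which is why it may catch a mismatch that slipped past the first.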
We're a bit fuzzy about what TLI is written in the page header when the
timeline-changing checkpoint record is written, though. If the
checkpoint record fits in the previous page, the page will carry the old
TLI, but if the checkpoint record begins a new WAL page, the new page is
initialized with the new TLI. I think we should rearrange that so that
the page header will always carry the old TLI.
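
To make the inconsistency concrete, here is a toy model of which TLI ends up in the header of the page holding the timeline-changing checkpoint record. It is a sketch under simplifying assumptions: the record is treated as either fitting entirely in the remaining space of the current page or starting a fresh page, and the function names and parameters are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TimeLineID;

/*
 * Behavior as described above: the header TLI depends on whether the
 * checkpoint record happens to fit in the page that is already open.
 */
TimeLineID
header_tli_current(size_t free_bytes, size_t record_len,
                   TimeLineID old_tli, TimeLineID new_tli)
{
    bool fits_in_open_page = record_len <= free_bytes;

    /* An already-open page keeps the old TLI; a fresh page gets the new one. */
    return fits_in_open_page ? old_tli : new_tli;
}

/*
 * Proposed rearrangement: the page carrying the checkpoint record always
 * has the old TLI, regardless of where the record lands.
 */
TimeLineID
header_tli_proposed(size_t free_bytes, size_t record_len,
                    TimeLineID old_tli, TimeLineID new_tli)
{
    (void) free_bytes;
    (void) record_len;
    (void) new_tli;
    return old_tli;
}
```

With the proposed rule the result no longer depends on page layout, so a header check against the current timeline has one well-defined expected value.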
>
> +			/* Switch target */
> +			recoveryTargetTLI = newtarget;
> +			expectedTLIs = newExpectedTLIs;
>
> Before "expectedTLIs = newExpectedTLIs", we should call
> list_free_deep(expectedTLIs)?
It's an integer list so list_free(expectedTLIs) is enough, and I doubt 
that leakage will ever be a problem in practice, but in principle you're 
right.
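
The distinction can be illustrated with a toy list. This is not PostgreSQL's `List` implementation, just a minimal model with hypothetical names: when the cells store integers inline, there are no separately allocated elements, so freeing the cells alone releases everything and a "deep" free has nothing extra to do.

```c
#include <stdlib.h>

/* Toy integer list cell: the value lives inside the cell itself. */
typedef struct IntCell
{
    int             value;
    struct IntCell *next;
} IntCell;

/* Push a value onto the front of the list and return the new head. */
IntCell *
int_list_prepend(IntCell *head, int value)
{
    IntCell *cell = malloc(sizeof(IntCell));

    if (cell == NULL)
        abort();
    cell->value = value;
    cell->next = head;
    return cell;
}

/*
 * Shallow free: release every cell. For an integer list this is complete,
 * since no cell points at separately allocated data. Returns the number
 * of cells freed.
 */
int
int_list_free(IntCell *head)
{
    int freed = 0;

    while (head != NULL)
    {
        IntCell *next = head->next;

        free(head);
        head = next;
        freed++;
    }
    return freed;
}
```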
-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com