Re: "Resurrected" data files - problem?

From: "Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: "Simon Riggs *EXTERN*" <simon(at)2ndquadrant(dot)com>
Cc: "Tom Lane *EXTERN*" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Peter Childs" <peterachilds(at)gmail(dot)com>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: "Resurrected" data files - problem?
Date: 2007-11-09 08:24:05
Message-ID: D960CB61B694CF459DCFB4B0128514C2880060@exadv11.host.magwien.gv.at
Lists: pgsql-general

>>>>> So if we perform our database backups with incremental
>>>>> backups as described above, we could end up with additional
>>>>> files after the restore, because PostgreSQL files can get
>>>>> deleted (e.g. during DROP TABLE or TRUNCATE TABLE).
>>>>>
>>>>> Could such "resurrected" files (data files, files in
>>>>> pg_xlog, pg_clog or elsewhere) cause a problem for the database
>>>>> (other than the obvious one that there may be unnecessary files
>>>>> about that consume disk space)?
>>>>
>>>> This will not work at all.
>>>
>>> To be more specific: the resurrected files aren't the problem;
>>> offhand I see no reason they'd create any issue beyond wasted
>>> disk space. The problem is version skew between files that were
>>> backed up at slightly different times, leading to inconsistency.
>>
>> I should have mentioned that before the (incremental) backup
>> there would be a pg_start_backup() and a pg_stop_backup()
>> afterwards, and we would use PITR.
>>
>> So there could only be three kinds of files:
>> - Files that did not change since the full backup, restored
>> from there. They should therefore look exactly as if the
>> online backup were performed in the normal way.
>> - Files that have changed or are new, restored from the
>> incremental backup. These will also be ok, because
>> they were backed up between pg_start_backup() and
>> pg_stop_backup().
>> - Files that have been deleted between full and incremental
>> backup and have been resurrected.
>>
>> This third group is the only one which might be problematic,
>> as far as I can see, because PostgreSQL will not expect them to
>> be there.
>>
>> The version skew between files backed up at slightly different
>> times should be taken care of by PITR, shouldn't it?
>
> The backup is not instantaneous, so there is no single time
> at which the hot backup takes place. So deciding whether
> a file has changed based upon a comparison of two file timestamps
> cannot work. You would need to record the file's timestamp both
> before pg_start_backup() and after pg_stop_backup(), for both the
> full and the incremental backup.
> If all four timestamps are equivalent, then you are safe.
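The four-timestamp rule can be expressed as a small check. This is a minimal sketch, assuming the backup script samples each file's mtime just before pg_start_backup() and just after pg_stop_backup() on both days. The names `FileTimes` and `safe_to_skip` are hypothetical and are not part of any PostgreSQL tool.

```python
from dataclasses import dataclass


@dataclass
class FileTimes:
    """mtimes of one file sampled around a (hypothetical) backup run."""
    before_start: float  # mtime taken just before pg_start_backup()
    after_stop: float    # mtime taken just after pg_stop_backup()


def safe_to_skip(full: FileTimes, incr: FileTimes) -> bool:
    """Return True only if all four samples are identical, i.e. the file
    did not change during either backup or in between them, so copying it
    again in the incremental backup can safely be skipped."""
    samples = {full.before_start, full.after_stop,
               incr.before_start, incr.after_stop}
    return len(samples) == 1


# A file that was written while the full backup was running must be copied again.
print(safe_to_skip(FileTimes(100.0, 100.0), FileTimes(100.0, 100.0)))  # True
print(safe_to_skip(FileTimes(100.0, 105.0), FileTimes(105.0, 105.0)))  # False
```

A real script would obtain the samples with `os.stat(path).st_mtime`. Keep in mind that mtime granularity and clock behaviour vary between filesystems.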

I am afraid that there is still a misunderstanding. The
procedure would be as follows:

The backup:

- pg_start_backup()
- full backup of the PostgreSQL files
- pg_stop_backup()

Next day:

- pg_start_backup()
- backup of the files that have changed since the last backup
- pg_stop_backup()

The recovery:

- restore files from the full backup
- restore files from the incremental backup
- create recovery.conf and start the server
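Restoring the full backup and then overlaying the incremental one produces the three kinds of files from the earlier message. The sketch below models each backup as a set of relative file names. It assumes the script also keeps a list of files that existed when the incremental backup ran. The function name and the example paths are hypothetical.

```python
from __future__ import annotations


def classify_restore(full: set[str], incremental: set[str],
                     live_at_incremental: set[str]) -> dict[str, set[str]]:
    """Split the files present after "restore full, then overlay incremental"
    into the three groups described in the thread."""
    restored = full | incremental  # overlaying only adds or overwrites files
    return {
        # Restored from the full backup and still present the next day.
        "unchanged": (full - incremental) & live_at_incremental,
        # Copied by the incremental backup (changed or new files).
        "changed_or_new": incremental,
        # Deleted between the two backups but brought back by the restore.
        "resurrected": restored - live_at_incremental,
    }


full = {"base/1/100", "base/1/200", "pg_clog/0000"}
live = {"base/1/100", "base/1/300", "pg_clog/0000"}
incr = {"base/1/300", "pg_clog/0000"}
groups = classify_restore(full, incr, live)
print(sorted(groups["resurrected"]))  # ['base/1/200']
```

The sketch also shows why the incremental backup alone cannot identify resurrected files. Detecting them requires a record of which files existed at incremental time, and that record is not a by-product of copying only the changed files.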

With a normal online backup, the backup does not take place at a
single instant either. Why is that not a problem there, but a
problem here?

> The relfilenode ids are potentially reused after a period of time, so
> that could cause errors if not catered for on the incremental restore.

That was my original fear.

What happens if PostgreSQL tries to reuse a relfilenode and the file
is already present? Will the database crash or report an error?
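One generic way a storage layer can detect such a collision is to create files with POSIX exclusive-create semantics (`O_CREAT | O_EXCL`). With this flag combination, an existing file makes the call fail instead of being silently reused. The sketch only illustrates that mechanism. It makes no claim about how PostgreSQL actually allocates or creates relfilenodes. The helper `create_exclusive` and the file name `16390` are hypothetical.

```python
import os
import tempfile


def create_exclusive(path: str) -> bool:
    """Create `path` only if it does not exist yet.

    Returns True if the file was created and False if a file (for example
    one resurrected by a restore) was already there."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except FileExistsError:
        return False
    os.close(fd)
    return True


with tempfile.TemporaryDirectory() as d:
    rel = os.path.join(d, "16390")  # stands in for a relation file
    print(create_exclusive(rel))    # True: file did not exist
    print(create_exclusive(rel))    # False: leftover file detected
```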

Yours,
Laurenz Albe
