Re: [PATCH] Lazy xid assingment V2

From: August Zajonc <augustz(at)augustz(dot)com>
To: "Florian G(dot) Pflug" <fgp(at)phlo(dot)org>
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, Postgresql-Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Lazy xid assingment V2
Date: 2007-09-01 22:17:50
Message-ID: 46D9E50E.8030106@augustz.com
Lists: pgsql-hackers

Florian G. Pflug wrote:
> August Zajonc wrote:
>> I'm confused about this.
>>
>> As long as we assert the rule that the file name can't change on the
>> move, then after commit the file can be in only one of two places.
>> The name of the file is known (ie, pg_class). The directories are
>> known. What needs to be carried forward past a checkpoint? We don't
>> even look at WAL, so checkpoints are irrelevant, it seems.
>> If there is a crash just after commit and before the move, no harm.
>> You just move on startup. If the move fails, no harm, you can emit
>> warning and open in /pending (or simply error, even easier).
> If you're going to open the file from /pending, what's the point of moving
> it in the first place?
It allows time for someone to sort out the disk situation without
impacting things. It also preserves the concept that after COMMIT a file
exists on disk that is accessible, which I think is what people expect.
> The idea would have to be that you move on commit (Or on COMMIT-record
> replay, in case of a crash), and then, after recovering the whole wal,
> you could remove leftover files in /pending.
I think so. What I was thinking was:

Limit this to moves from spclocation/file.new to spclocation/file. So
given a pg_class filename, the datafile can only have two possible names:
relfilenode or relfilenode.new.

You commit, then move.

If a crash occurs before commit, you leak a .new table.

If the move fails after commit, you emit a warning. The commit is still
valid, because fopen can fall back to .new. The data is still there.
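
A rough sketch of the post-commit move described above, in Python rather than backend C. The function name move_after_commit, the directory argument, and the logger name are hypothetical, and it assumes the rename itself is atomic. A failed move only produces a warning, since the file stays readable as .new.

```python
import logging
import os

log = logging.getLogger("post_commit_move")  # hypothetical logger name


def move_after_commit(directory, relfilenode):
    """Rename relfilenode.new to relfilenode after the commit has happened.

    Returns True if the file now sits under its final name, False if the
    move failed and the file was left as relfilenode.new.
    """
    final_path = os.path.join(directory, relfilenode)
    pending_path = final_path + ".new"
    try:
        # os.replace performs a rename; atomicity is an assumption here.
        os.replace(pending_path, final_path)
        return True
    except OSError as exc:
        # The commit stays valid: readers can still fall back to .new.
        log.warning("could not move %s to %s: %s", pending_path, final_path, exc)
        return False
```

The caller never rolls back on a False return; it only defers the move to a later retry.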

On crash recovery, move .new files that show up in pg_class to their
proper name, if the disk has been fixed and the move can succeed. For
.new files that don't exist in pg_class after log replay, delete them.
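
The recovery pass could look roughly like the following Python sketch. Everything named here is an assumption for illustration: recover_new_files is a hypothetical helper, and live_relfilenodes stands in for "the relfilenodes found in pg_class after log replay", passed in as a plain set of names.

```python
import os


def recover_new_files(directory, live_relfilenodes):
    """Sweep leftover *.new files in one directory after log replay.

    live_relfilenodes: set of file names referenced by pg_class
    (a stand-in for the real catalog lookup).
    Returns (moved, deleted, deferred) lists of base names.
    """
    moved, deleted, deferred = [], [], []
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".new"):
            continue
        base = name[: -len(".new")]
        src = os.path.join(directory, name)
        if base in live_relfilenodes:
            try:
                # Committed table: finish the move that did not happen.
                os.replace(src, os.path.join(directory, base))
                moved.append(base)
            except OSError:
                # Disk still not fixed: leave it, fallback open still works.
                deferred.append(base)
        else:
            # Not in pg_class after replay: leaked by an aborted transaction.
            os.remove(src)
            deleted.append(base)
    return moved, deleted, deferred
```

Files in the deferred list stay under their .new name, so nothing is lost if the move keeps failing.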

Fallback open:

fopen relfilenode
if ENOENT, fopen relfilenode.new
    if ENOENT again, error
    else emit warning
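
The same lookup order as a minimal Python sketch. The name open_relation, the directory argument, and the logger name are hypothetical; this only illustrates the two-name fallback, not actual backend code.

```python
import logging
import os

log = logging.getLogger("fallback_open")  # hypothetical logger name


def open_relation(directory, relfilenode, mode="rb"):
    """Open relfilenode, falling back to relfilenode.new if it is missing."""
    final_path = os.path.join(directory, relfilenode)
    pending_path = final_path + ".new"
    try:
        return open(final_path, mode)
    except FileNotFoundError:
        pass  # ENOENT on the final name: try the .new name next
    # A second ENOENT propagates to the caller as FileNotFoundError.
    f = open(pending_path, mode)
    log.warning("%s not found, opened %s instead", final_path, pending_path)
    return f
```

Since the name cannot change on the move, these two paths are the only places the file can be.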

>
> So, what are you going to do if the move fails? You cannot roll back, and
> you cannot update the CLOG (because then others would see your new table,
> but no datafile). The only option is to PANIC. This will lead to a server
> restart, WAL recovery, and probably another PANIC once the COMMIT-record
> is replayed (Since the move probably still won't be possible).
That was the general idea of the fallback: avoid this issue. You never
roll back. If the datafile is not where expected, it can only be in one
other place.
> It might be even worse - I'm not sure that a rename is an atomic operation
> on most filesystems. If it's not, then you might end up with two files if
> power fails *just* as you rename, or, worse, with no file at all. Even a
> slight possibility of the second case seems unacceptable - it means losing
> a committed transaction.
Yes, atomic renames are an assumption.
>
> I agree that we should eventually find a way to guarantee either no file
> leakage, or at least an upper bound on the amount of wasted space. But
> doing so at the cost of PANICing if the move fails seems like a bad
> tradeoff...
>
Agreed...
> greetings, Florian Pflug
