Re: COPY TO returning empty result with parallel ALTER TABLE

From: Sven Wegener <sven(dot)wegener(at)stealer(dot)net>
To: Bernd Helmle <mailings(at)oopsware(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-general(at)postgresql(dot)org, pgsql-bugs(at)postgresql(dot)org
Subject: Re: COPY TO returning empty result with parallel ALTER TABLE
Date: 2014-11-04 19:33:32
Message-ID: alpine.LNX.2.11.1411041946500.26161@titan.int.lan.stealer.net
Lists: pgsql-bugs pgsql-general pgsql-hackers

On Tue, 4 Nov 2014, Bernd Helmle wrote:

> --On 3. November 2014 18:15:04 +0100 Sven Wegener <sven(dot)wegener(at)stealer(dot)net>
> wrote:
>
> > I've checked git master and 9.x and all show the same behaviour. I came up
> > with the patch below, which is against current git master. The patch
> > modifies the COPY TO code to create a new snapshot, after acquiring the
> > necessary locks on the source tables, so that it sees any modification
> > committed by other backends.
>
> Well, I have the feeling that there's nothing wrong with it. The ALTER TABLE
> command has rewritten all tuples with its own XID, thus the current snapshot
> does not "see" these tuples anymore. I suppose that in SERIALIZABLE or
> REPEATABLE READ transaction isolation your proposal still doesn't return the
> tuples you'd like to see.

No, but with REPEATABLE READ and SERIALIZABLE the plain SELECT case is
currently broken too. The issue I'm fixing is that COPY (SELECT ...) TO
and SELECT behave differently.

So, how should an ALTER TABLE behave transaction-wise? PostgreSQL
claims transactional DDL support. As mentioned above a parallel SELECT
with SERIALIZABLE returns an empty result, but it also sees the schema
change, at least the change shows up in the result tuple description. The
change doesn't show up in pg_catalog, so that bit is handled correctly.
The schema change transaction can be rolled back the way it is now, so
it's kind of transactional, but other transactions seeing the schema
change in query results is broken.

The empty result might be fixed by just keeping the old XID of rewritten
tuples, as the exclusive lock ALTER TABLE holds should be enough to make
sure nobody is actively accessing the rewritten table. But I'm pondering
if there is a solution for the visible schema change that doesn't involve
keeping the old data files around or to just raise a serialization error.
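
The difference between stamping rewritten tuples with the ALTER TABLE XID
and keeping their old XID can be sketched with a toy visibility model. This
is not PostgreSQL's actual visibility code: the Snapshot class, the XID
values and the (xmin, row) heap layout are all assumptions made for
illustration, and deletion, commit status and hint bits are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    xmax: int               # first XID not yet assigned when the snapshot was taken
    in_progress: frozenset  # XIDs that were still running at that moment

    def sees(self, xmin: int) -> bool:
        # Simplification: any XID below xmax that is not in progress
        # is treated as committed.
        return xmin < self.xmax and xmin not in self.in_progress


def visible_rows(snapshot, heap):
    """Return the rows of a heap of (xmin, row) pairs that the snapshot sees."""
    return [row for xmin, row in heap if snapshot.sees(xmin)]


# Two rows inserted by XID 100, which committed long ago.
heap = [(100, "a"), (100, "b")]

# The reader took its snapshot while ALTER TABLE (XID 200) was still running,
# then waited for the table lock until ALTER TABLE committed.
snap = Snapshot(xmax=201, in_progress=frozenset({200}))

# Current behaviour: the rewrite stamps every tuple with ALTER TABLE's XID.
rewritten_new_xid = [(200, row) for _, row in heap]

# Idea from the text: the rewrite keeps each tuple's original XID.
rewritten_old_xid = [(xmin, row) for xmin, row in heap]

print(visible_rows(snap, rewritten_new_xid))  # []
print(visible_rows(snap, rewritten_old_xid))  # ['a', 'b']
```

In this model the old snapshot still treats XID 200 as in progress, so the
first heap looks empty while the second returns both rows.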

Coming back to my previously mentioned solution of setting the XID of
rewritten tuples to the FrozenXID: that's broken, as in the REPEATABLE
READ/SERIALIZABLE case there might be tuples that should not be seen by
older transactions, and moving the tuples to the FrozenXID would make them
visible.
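
A small sketch of why freezing is unsafe here, again using a deliberately
simplified model rather than PostgreSQL's real visibility rules. The helper
names, the XID values and the (xmin, row) heap layout are hypothetical;
FROZEN_XID mirrors PostgreSQL's FrozenTransactionId, which is treated as
older than every snapshot.

```python
FROZEN_XID = 2  # PostgreSQL's FrozenTransactionId


def sees(snapshot_xmax, in_progress, xmin):
    """Toy visibility check: frozen tuples are visible to every snapshot."""
    if xmin == FROZEN_XID:
        return True
    # Simplification: XIDs below xmax and not in progress count as committed.
    return xmin < snapshot_xmax and xmin not in in_progress


def visible_rows(snapshot_xmax, in_progress, heap):
    """Return the rows of a heap of (xmin, row) pairs that the snapshot sees."""
    return [row for xmin, row in heap if sees(snapshot_xmax, in_progress, xmin)]


# "old" was inserted by XID 100, "new" by XID 180.
heap = [(100, "old"), (180, "new")]

# A REPEATABLE READ transaction took its snapshot when the next XID was 150,
# so it must never see anything inserted by XID 180.
snapshot_xmax, in_progress = 150, frozenset()

before = visible_rows(snapshot_xmax, in_progress, heap)

# The rewrite freezes every tuple, discarding the original XIDs.
frozen_heap = [(FROZEN_XID, row) for _, row in heap]
after = visible_rows(snapshot_xmax, in_progress, frozen_heap)

print(before)  # ['old']
print(after)   # ['old', 'new']
```

After the rewrite the older snapshot also returns "new", a row it should
not be able to see.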

Sven
