COPY TO returning empty result with parallel ALTER TABLE

From: Sven Wegener <sven(dot)wegener(at)stealer(dot)net>
To: pgsql-hackers(at)postgresql(dot)org, pgsql-general(at)postgresql(dot)org, pgsql-bugs(at)postgresql(dot)org
Subject: COPY TO returning empty result with parallel ALTER TABLE
Date: 2014-11-03 17:15:04
Message-ID: alpine.LNX.2.11.1411031751320.29849@titan.int.lan.stealer.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-general pgsql-hackers

Hi all,

we experienced what seems to be a bug in the COPY TO implementation. When a
table is being rewritten by an ALTER TABLE statement, a parallel COPY TO
results in an empty result.

Consider the following table data:

CREATE TABLE test (id INTEGER NOT NULL, PRIMARY KEY (id));
INSERT INTO test (id) SELECT generate_series(1, 10000000);

One session does:

ALTER TABLE test ADD COLUMN dummy BOOLEAN NOT NULL DEFAULT FALSE;

This acquires an exclusive lock to the table.

Another session now performs parallel:

COPY test TO STDOUT;

This blocks on the exclusive lock held by the ALTER statement. When the ALTER
staement is finished, the COPY statement returns with an empty result. Same
goes for COPY (SELECT ...) TO, whereas the same SELECT executed without COPY
blocks and returns the correct result as expected.

This is my analysis of this issue:

The backend opens the rewritten data files, but it ignores the complete
content, which indicates the data is being ignored because of XIDs. For direct
SELECT statements the top-level query parsing acquires locks on involved tables
and creates a new snapshot, but for COPY statements the parsing and locking is
done later in COPY code. After locking the tables in COPY code, the data
is read with an old snapshot, effectively ignoring all data from the rewritten
table.

I've check git master and 9.x and all show the same behaviour. I came up with
the patch below, which is against curent git master. The patch modifies the
COPY TO code to create a new snapshot, after acquiring the necessary locks on
the source tables, so that it sees any modification commited by other backends.

Despite thinking this is the correct solution, another solution or
optimization would be to have ALTER TABLE, which holds the highest lock level,
set the XID of rewritten tuples to the FrozenXID, as no other backend should
access the table before the ALTER TABLE is committed.

Sven

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 6b83576..fe2d157 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -1344,6 +1344,13 @@ BeginCopy(bool is_from,
(errcode(ERRCODE_UNDEFINED_COLUMN),
errmsg("table \"%s\" does not have OIDs",
RelationGetRelationName(cstate->rel))));
+
+ /*
+ * Use a new snapshot to ensure this query sees
+ * results of any previously executed queries.
+ */
+ if (!is_from)
+ PushActiveSnapshot(GetTransactionSnapshot());
}
else
{
@@ -1394,11 +1401,10 @@ BeginCopy(bool is_from,
plan = planner(query, 0, NULL);

/*
- * Use a snapshot with an updated command ID to ensure this query sees
+ * Use a new snapshot to ensure this query sees
* results of any previously executed queries.
*/
- PushCopiedSnapshot(GetActiveSnapshot());
- UpdateActiveSnapshotCommandId();
+ PushActiveSnapshot(GetTransactionSnapshot());

/* Create dest receiver for COPY OUT */
dest = CreateDestReceiver(DestCopyOut);
@@ -1741,9 +1747,11 @@ EndCopyTo(CopyState cstate)
ExecutorFinish(cstate->queryDesc);
ExecutorEnd(cstate->queryDesc);
FreeQueryDesc(cstate->queryDesc);
- PopActiveSnapshot();
}

+ /* Discard snapshot */
+ PopActiveSnapshot();
+
/* Clean up storage */
EndCopy(cstate);
}

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Saul Costa 2014-11-03 21:23:58 PostgreSQL 9.3 Segfault in "hstore_to_json" function
Previous Message Tom Lane 2014-11-03 15:47:38 Re: BUG #11829: pgcrypto doc bug

Browse pgsql-general by date

  From Date Subject
Next Message Moshe Jacobson 2014-11-03 17:39:14 STABLE vs. IMMUTABLE w.r.t. indexes
Previous Message pbj 2014-11-03 16:33:58 Performance of UPDATE SET = FROM vs UPDATE SET = (SELECT ...)

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2014-11-03 17:31:37 Re: pgaudit - an auditing extension for PostgreSQL
Previous Message Stephen Frost 2014-11-03 17:08:05 Re: Additional role attributes && superuser review