Quick Links

Duplicate rows during pg_dump

From:	Chaz Yoon <chaz(at)shopspring(dot)com>
To:	pgsql-general(at)postgresql(dot)org
Subject:	Duplicate rows during pg_dump
Date:	2015-10-24 19:35:43
Message-ID:	CAHPXOQx0UUeHOEKcf24vVF5ESvwbu=zB+uRgwg-AUVrSeRM2xw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

I am seeing a duplicate, stale copy of the same row when performing a
pg_dump or copying a specific table, but not when directly selecting from
it. I'm running PostgreSQL 9.3.9 on Amazon RDS, with 9.3.10 client tools.

It's happening on a users table, which has a primary key and enforces a
unique email address:

Table "public.users"
Column | Type | Modifiers

---------------+-----------------------------+---------------------------------------------------
id | integer | not null default
nextval('users_id_seq'::regclass)
email | character varying(255) | not null default
''::character varying
last_activity | timestamp without time zone |
Indexes:
"users_pkey" PRIMARY KEY, btree (id)
"users_unique_email" UNIQUE, btree (email)

I first noticed the problem when doing copying the table to another
database. Roughly this:

% pg_dump --column-inserts -Fp -h remotedb remote_db > users.txt
% psql test_db < users.txt
[...]
ERROR: could not create unique index "users_pkey"
DETAIL: Key (id)=(123) is duplicated.
[...]
ERROR: could not create unique index "users_unique_email"
DETAIL: Key (email)=(this_user(at)xyz(dot)com) is duplicated.

It appears there's some sort of duplicate record for a single user in the
database. Checking the pg_dump output, I saw that a single user's record
was being exported twice:

% grep -i this_user(at)xyz(dot)com users.txt
INSERT INTO users (id, email, last_activity) VALUES (123, '
this_user(at)xyz(dot)com', '2015-10-21 10:32:15.997887');
INSERT INTO users (id, email, last_activity) VALUES (123, '
this_user(at)xyz(dot)com', '2015-10-02 11:32:58.615743');

The rows were not exactly the same. Connecting to the source database
directly, I tried this:

remote_db=> select count(1) from users where id = 123;
count
-------
1
(1 row)

remote_db=> select count(1) from users where email = 'this_user(at)xyz(dot)com';
count
-------
1
(1 row)

To eliminate any risk of it being a weird locking issue, I restored a
snapshot of the database into a new RDS instance but I got the same
results. I then tried the following:

remote_db=> create table users_copy_with_indexes (like users including
defaults including constraints including indexes including storage
including comments);
CREATE TABLE
remote_db=> insert into users_copy_with_indexes select * from users;
ERROR: duplicate key value violates unique constraint
"users_copy_with_indexes_pkey"
DETAIL: Key (id)=(123) already exists.

However, when I created a copy without the indexes, I can see the duplicate
rows:

remote_db=> create table users_copy_without_indexes (like users);
CREATE TABLE
remote_db=> insert into users_copy_without_indexes select * from users;
INSERT 0 523342
remote_db=> select count(1) from users_copy_without_indexes where id =
123;
count
-------
2
(1 row)
remote_db=> select count(1) from users_copy_without_indexes where email =
'this_user(at)xyz(dot)com';
count
-------
2
(1 row)

Any suggestions for what to look for next? Is it table corruption?

Chaz

Responses

Re: Duplicate rows during pg_dump at 2015-10-24 19:45:12 from Adrian Klaver

Browse pgsql-general by date

	From	Date	Subject
Next Message	Lele Gaifax	2015-10-24 19:37:57	Re: Using function returning multiple values in a select
Previous Message	Rafal Pietrak	2015-10-24 19:03:11	Re: partial JOIN (was: ID column naming convention)