From: | Chaz Yoon <chaz(at)shopspring(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Duplicate rows during pg_dump |
Date: | 2015-10-24 19:35:43 |
Message-ID: | CAHPXOQx0UUeHOEKcf24vVF5ESvwbu=zB+uRgwg-AUVrSeRM2xw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
I am seeing a duplicate, stale copy of the same row when performing a
pg_dump or copying a specific table, but not when directly selecting from
it. I'm running PostgreSQL 9.3.9 on Amazon RDS, with 9.3.10 client tools.
It's happening on a users table, which has a primary key and enforces a
unique email address:
Table "public.users"
Column | Type | Modifiers
---------------+-----------------------------+---------------------------------------------------
id | integer | not null default
nextval('users_id_seq'::regclass)
email | character varying(255) | not null default
''::character varying
last_activity | timestamp without time zone |
Indexes:
"users_pkey" PRIMARY KEY, btree (id)
"users_unique_email" UNIQUE, btree (email)
I first noticed the problem when doing copying the table to another
database. Roughly this:
% pg_dump --column-inserts -Fp -h remotedb remote_db > users.txt
% psql test_db < users.txt
[...]
ERROR: could not create unique index "users_pkey"
DETAIL: Key (id)=(123) is duplicated.
[...]
ERROR: could not create unique index "users_unique_email"
DETAIL: Key (email)=(this_user(at)xyz(dot)com) is duplicated.
It appears there's some sort of duplicate record for a single user in the
database. Checking the pg_dump output, I saw that a single user's record
was being exported twice:
% grep -i this_user(at)xyz(dot)com users.txt
INSERT INTO users (id, email, last_activity) VALUES (123, '
this_user(at)xyz(dot)com', '2015-10-21 10:32:15.997887');
INSERT INTO users (id, email, last_activity) VALUES (123, '
this_user(at)xyz(dot)com', '2015-10-02 11:32:58.615743');
The rows were not exactly the same. Connecting to the source database
directly, I tried this:
remote_db=> select count(1) from users where id = 123;
count
-------
1
(1 row)
remote_db=> select count(1) from users where email = 'this_user(at)xyz(dot)com';
count
-------
1
(1 row)
To eliminate any risk of it being a weird locking issue, I restored a
snapshot of the database into a new RDS instance but I got the same
results. I then tried the following:
remote_db=> create table users_copy_with_indexes (like users including
defaults including constraints including indexes including storage
including comments);
CREATE TABLE
remote_db=> insert into users_copy_with_indexes select * from users;
ERROR: duplicate key value violates unique constraint
"users_copy_with_indexes_pkey"
DETAIL: Key (id)=(123) already exists.
However, when I created a copy without the indexes, I can see the duplicate
rows:
remote_db=> create table users_copy_without_indexes (like users);
CREATE TABLE
remote_db=> insert into users_copy_without_indexes select * from users;
INSERT 0 523342
remote_db=> select count(1) from users_copy_without_indexes where id =
123;
count
-------
2
(1 row)
remote_db=> select count(1) from users_copy_without_indexes where email =
'this_user(at)xyz(dot)com';
count
-------
2
(1 row)
Any suggestions for what to look for next? Is it table corruption?
Chaz
From | Date | Subject | |
---|---|---|---|
Next Message | Lele Gaifax | 2015-10-24 19:37:57 | Re: Using function returning multiple values in a select |
Previous Message | Rafal Pietrak | 2015-10-24 19:03:11 | Re: partial JOIN (was: ID column naming convention) |