Re: Duplicate rows during pg_dump

From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: Chaz Yoon <chaz(at)shopspring(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Duplicate rows during pg_dump
Date: 2015-10-24 19:45:12
Message-ID: 562BDFC8.8040908@aklaver.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 10/24/2015 12:35 PM, Chaz Yoon wrote:
> I am seeing a duplicate, stale copy of the same row when performing a
> pg_dump or copying a specific table, but not when directly selecting
> from it. I'm running PostgreSQL 9.3.9 on Amazon RDS, with 9.3.10 client
> tools.
>
> It's happening on a users table, which has a primary key and enforces a
> unique email address:
>
> Table "public.users"
> Column | Type | Modifiers
>
> ---------------+-----------------------------+---------------------------------------------------
> id | integer | not null default
> nextval('users_id_seq'::regclass)
> email | character varying(255) | not null default
> ''::character varying
> last_activity | timestamp without time zone |
> Indexes:
> "users_pkey" PRIMARY KEY, btree (id)
> "users_unique_email" UNIQUE, btree (email)
>
> I first noticed the problem when doing copying the table to another
> database. Roughly this:
>
> % pg_dump --column-inserts -Fp -h remotedb remote_db > users.txt
> % psql test_db < users.txt
> [...]
> ERROR: could not create unique index "users_pkey"
> DETAIL: Key (id)=(123) is duplicated.
> [...]
> ERROR: could not create unique index "users_unique_email"
> DETAIL: Key (email)=(this_user(at)xyz(dot)com <mailto:this_user(at)xyz(dot)com>)
> is duplicated.
>
> It appears there's some sort of duplicate record for a single user in
> the database. Checking the pg_dump output, I saw that a single user's
> record was being exported twice:
>
> % grep -i this_user(at)xyz(dot)com <mailto:this_user(at)xyz(dot)com> users.txt
> INSERT INTO users (id, email, last_activity) VALUES (123,
> 'this_user(at)xyz(dot)com <mailto:this_user(at)xyz(dot)com>', '2015-10-21
> 10:32:15.997887');
> INSERT INTO users (id, email, last_activity) VALUES (123,
> 'this_user(at)xyz(dot)com <mailto:this_user(at)xyz(dot)com>', '2015-10-02
> 11:32:58.615743');
>
> The rows were not exactly the same. Connecting to the source database
> directly, I tried this:
>
> remote_db=> select count(1) from users where id = 123;
> count
> -------
> 1
> (1 row)
>
> remote_db=> select count(1) from users where email =
> 'this_user(at)xyz(dot)com <mailto:this_user(at)xyz(dot)com>';
> count
> -------
> 1
> (1 row)
>
> To eliminate any risk of it being a weird locking issue, I restored a
> snapshot of the database into a new RDS instance but I got the same
> results. I then tried the following:
>
> remote_db=> create table users_copy_with_indexes (like users
> including defaults including constraints including indexes including
> storage including comments);
> CREATE TABLE
> remote_db=> insert into users_copy_with_indexes select * from users;
> ERROR: duplicate key value violates unique constraint
> "users_copy_with_indexes_pkey"
> DETAIL: Key (id)=(123) already exists.
>
> However, when I created a copy without the indexes, I can see the
> duplicate rows:
>
> remote_db=> create table users_copy_without_indexes (like users);
> CREATE TABLE
> remote_db=> insert into users_copy_without_indexes select * from users;
> INSERT 0 523342
> remote_db=> select count(1) from users_copy_without_indexes where id
> = 123;
> count
> -------
> 2
> (1 row)
> remote_db=> select count(1) from users_copy_without_indexes where
> email = 'this_user(at)xyz(dot)com <mailto:this_user(at)xyz(dot)com>';
> count
> -------
> 2
> (1 row)
>
> Any suggestions for what to look for next? Is it table corruption?

I would say the smoking gun is the copy w/o indexes shows both records
and the one with indexes only one. I would DROP/CREATE index on the
original table, with the usual caveat that this does place a load on the
table. Using Concurrently might help, but I would read the information here:

http://www.postgresql.org/docs/9.3/interactive/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY

Building Indexes Concurrently
>
> Chaz
>

--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Marc Mamin 2015-10-24 20:15:15 Re: Duplicate rows during pg_dump
Previous Message Rafal Pietrak 2015-10-24 19:42:37 Re: partial JOIN (was: ID column naming convention)