From: | Daniel Caldeweyher <dcalde(at)gmail(dot)com> |
---|---|
To: | Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at> |
Cc: | "Jonas Tehler *EXTERN*" <jonas(at)tehler(dot)se>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Duplicate data despite unique constraint |
Date: | 2016-09-11 04:36:41 |
Message-ID: | CADVnD3DPpXCp7Aryk7_LwoiDgKFsmSms1bqdQuqGyC6T68othw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
I had a similar issue once and was able to recover from it. If this affects
only some rows and you are able to identify them, this is fixable:
--force table scan to skip using corrupt index
set enable_seqscan=1
set enable_indexscan=0
set enable_bitmapscan=0
select email,count(*)
from users
group by email
having count(*) > 1;
Then, if the rows are simply just duplicates and have no other changes, add
a new serial column (or to prevent blocking, add a bigint column and update
with sequential values), then using the emails from above, delete the ones
with the higher/lower sequence number. Ensure you are still skipping
indexes.
Once the table is clean, drop the sequence column again and re-index.
Hope this helps,
Daniel
On Fri, Sep 2, 2016 at 11:06 PM, Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
wrote:
> Jonas Tehler wrote:
> > We’re running Postgresql 9.4.5 on Amazon RDS. One of our tables looks
> something like this:
> >
> >
> > CREATE TABLE users
> > (
> > ...
> > email character varying(128) NOT NULL,
> > ...
> > CONSTRAINT users_email_key UNIQUE (email)
> > )
> >
> > Despite this we have rows with very similar email values. I discovered
> the problem when I tried to add
> > a column and got the following error:
> >
> > ActiveRecord::RecordNotUnique: PG::UniqueViolation: ERROR: could not
> create unique index
> > "users_email_key"
> > DETAIL: Key (email)=(xxx(at)yyy(dot)com) is duplicated.
> > : ALTER TABLE "users" ADD COLUMN "client_branch" character varying(255)
> DEFAULT ‘beta'
> >
> >
> > Now look at this:
> >
> > => select email from users where email = 'xxx(at)yyy(dot)com';
> > email
> > ---------------------------
> > xxx(at)yyy(dot)com
> > (1 row)
> >
> >
> > => select email from users where email LIKE 'xxx(at)yyy(dot)com';
> > email
> > ---------------------------
> > xxx(at)yyy(dot)com
> > xxx(at)yyy(dot)com
> > (2 rows)
> >
> >
> > I have tried to compare the binary data in various ways, email::bytes,
> md5(email),
> > encode(email::bytea, 'hex’), char_length(email) and it all looks the
> same for both rows.
> >
> > Any suggestions how I can discover the difference between the values and
> how they could have been
> > added without triggering the constraint? I know that the values were
> added after the constraint was
> > added.
> >
> > The data was added from a Ruby on Rails app that also has unique
> constraints on the email field and
> > validation on the email format.
>
> That looks very much like data corruption.
>
> I guess there is an index on "users" that is used for one query but not
> the other.
> Can you verify with EXPLAIN?
>
> Assuming that it is a 'text_ops' or 'varchar_ops' index, I'd say it gets
> used for the first
> query, but not for the second. That would mean that there is an extra
> entry in the table that
> is not in the index.
>
> Did you have any crashes, standby promotion, restore with PITR or other
> unusual occurrences recently?
>
> Make sure you have a physical backup; there may be other things corrupted.
>
> This is a possible path to proceed:
>
> Once you have made sure that you have a physical backup, try to add the
> "ctid" column to both queries.
>
> Then delete the extra row from the second query with "DELETE FROM email
> WHERE ctid = ...".
>
> Then, to make sure there is no other corruption lurking, make a logical
> backup
> with pg_dumpall, create a new database cluster, create a new one with
> "initdb" and
> restore the data.
>
> Yours,
> Laurenz Albe
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>
From | Date | Subject | |
---|---|---|---|
Next Message | dandl | 2016-09-11 05:20:30 | Re: What limits Postgres performance when the whole database lives in cache? |
Previous Message | Jim Nasby | 2016-09-11 02:34:36 | Re: Duplicate data despite unique constraint |