Re: Duplicate data despite unique constraint

From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: Jonas Tehler <jonas(at)tehler(dot)se>, pgsql-general(at)postgresql(dot)org
Subject: Re: Duplicate data despite unique constraint
Date: 2016-09-02 13:02:53
Message-ID: 3481cafa-cf8c-4401-722f-38eed301f9bd@aklaver.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 09/02/2016 04:32 AM, Jonas Tehler wrote:
>
> Hi,
>
> We’re running Postgresql 9.4.5 on Amazon RDS. One of our tables
> looks something like this:
>
> CREATE TABLE users
> (
> ...
> email character varying(128) NOT NULL,
> ...
> CONSTRAINT users_email_key UNIQUE (email)
> )
>
> Despite this we have rows with very similar email values. I discovered
> the problem when I tried to add a column and got the following error:
>
> ActiveRecord::RecordNotUnique: PG::UniqueViolation: ERROR: could not
> create unique index "users_email_key"
> DETAIL: Key (email)=(xxx(at)yyy(dot)com <mailto:xxx(at)yyy(dot)com>) is duplicated.
> : ALTER TABLE "users" ADD COLUMN "client_branch" character varying(255)
> DEFAULT ‘beta'
>
> Now look at this:
>
> => select email from users where email = 'xxx(at)yyy(dot)com <mailto:xxx(at)yyy(dot)com>';
> email
> ---------------------------
> xxx(at)yyy(dot)com <mailto:xxx(at)yyy(dot)com>
> (1 row)
>
> => select email from users where email LIKE 'xxx(at)yyy(dot)com
> <mailto:xxx(at)yyy(dot)com>';
> email
> ---------------------------
> xxx(at)yyy(dot)com <mailto:xxx(at)yyy(dot)com>
> xxx(at)yyy(dot)com <mailto:xxx(at)yyy(dot)com>
> (2 rows)
>
> I have tried to compare the binary data in various ways, email::bytes,
> md5(email), encode(email::bytea, 'hex’), char_length(email) and it all
> looks the same for both rows.
>
> Any suggestions how I can discover the difference between the values and
> how they could have been added without triggering the constraint? I know
> that the values were added after the constraint was added.

select ctid, email from users where email LIKE 'xxx(at)yyy(dot)com';

https://www.postgresql.org/docs/9.5/static/ddl-system-columns.html
"ctid

The physical location of the row version within its table. Note
that although the ctid can be used to locate the row version very
quickly, a row's ctid will change if it is updated or moved by VACUUM
FULL. Therefore ctid is useless as a long-term row identifier. The OID,
or even better a user-defined serial number, should be used to identify
logical rows.
"

Best guess is the INDEX on the column is corrupted and needs to be
reindexed:

https://www.postgresql.org/docs/9.5/static/sql-reindex.html

>
> The data was added from a Ruby on Rails app that also has unique
> constraints on the email field and validation on the email format.
>
> / Jonas
>
>

--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Albe Laurenz 2016-09-02 13:06:13 Re: Duplicate data despite unique constraint
Previous Message Jonas Tehler 2016-09-02 11:32:56 Duplicate data despite unique constraint