| From: | Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com> | 
|---|---|
| To: | Jonas Tehler <jonas(at)tehler(dot)se>, pgsql-general(at)postgresql(dot)org | 
| Subject: | Re: Duplicate data despite unique constraint | 
| Date: | 2016-09-02 13:02:53 | 
| Message-ID: | 3481cafa-cf8c-4401-722f-38eed301f9bd@aklaver.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-general | 
On 09/02/2016 04:32 AM, Jonas Tehler wrote:
>
> Hi,
>
> We’re running Postgresql 9.4.5 on Amazon RDS. One of our tables
> looks something like this:
>
> CREATE TABLE users
> (
>   ...
>   email character varying(128) NOT NULL,
>   ...
>   CONSTRAINT users_email_key UNIQUE (email)
> )
>
> Despite this we have rows with very similar email values. I discovered
> the problem when I tried to add a column and got the following error:
>
> ActiveRecord::RecordNotUnique: PG::UniqueViolation: ERROR:  could not
> create unique index "users_email_key"
> DETAIL:  Key (email)=(xxx(at)yyy(dot)com <mailto:xxx(at)yyy(dot)com>) is duplicated.
> : ALTER TABLE "users" ADD COLUMN "client_branch" character varying(255)
> DEFAULT ‘beta'
>
> Now look at this:
>
> => select email from users where email = 'xxx(at)yyy(dot)com <mailto:xxx(at)yyy(dot)com>';
>            email
> ---------------------------
>  xxx(at)yyy(dot)com <mailto:xxx(at)yyy(dot)com>
> (1 row)
>
> => select email from users where email LIKE 'xxx(at)yyy(dot)com
> <mailto:xxx(at)yyy(dot)com>';
>            email
> ---------------------------
>  xxx(at)yyy(dot)com <mailto:xxx(at)yyy(dot)com>
>  xxx(at)yyy(dot)com <mailto:xxx(at)yyy(dot)com>
> (2 rows)
>
> I have tried to compare the binary data in various ways, email::bytes,
> md5(email), encode(email::bytea, 'hex’), char_length(email) and it all
> looks the same for both rows.
>
> Any suggestions how I can discover the difference between the values and
> how they could have been added without triggering the constraint? I know
> that the values were added after the constraint was added.
select ctid, email from users where email LIKE 'xxx(at)yyy(dot)com';
https://www.postgresql.org/docs/9.5/static/ddl-system-columns.html
"ctid
     The physical location of the row version within its table. Note 
that although the ctid can be used to locate the row version very 
quickly, a row's ctid will change if it is updated or moved by VACUUM 
FULL. Therefore ctid is useless as a long-term row identifier. The OID, 
or even better a user-defined serial number, should be used to identify 
logical rows.
"
Best guess is the INDEX on the column is corrupted and needs to be 
reindexed:
https://www.postgresql.org/docs/9.5/static/sql-reindex.html
>
> The data was added from a Ruby on Rails app that also has unique
> constraints on the email field and validation on the email format.
>
> / Jonas
>
>
-- 
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Albe Laurenz | 2016-09-02 13:06:13 | Re: Duplicate data despite unique constraint | 
| Previous Message | Jonas Tehler | 2016-09-02 11:32:56 | Duplicate data despite unique constraint |