FreeBSD 10 => 11: Dump and reload your PostgreSQL database unless you like it broken

From: "Maeldron T(dot)" <maeldron(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: FreeBSD 10 => 11: Dump and reload your PostgreSQL database unless you like it broken
Date: 2016-12-03 01:52:38
Message-ID: CAKatfSnA6bZPav0W2mOZ0HhriDL5fK0ycYPkT7XQo=TGcF6oOw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello,

I’ll be as short as I can as I have broken my arm and I’m not supposed to
type. This isn’t a joke.

However, I learned the following the hard way and I want to save you from
the same.

Upgrading FreeBSD from 10 to 11 might break your database. It probably
won’t be corrupted but it will be useless until you dump-import it, and you
might need to do manual fixing.

My app has more than 1000 automated tests, and the upgrade itself was
explicitly tested. The affected columns are tested many times. It was
tested on two different machines (including the 10 => 11) before done in
production. But the issue happens only at random on large scale. I could
not reproduce it with inserting a few rows. I could reproduce it with real
data.

I didn’t debug much as I did not sleep for two days until I fixed it (live
systems, with left hand).

I removed noise from queries, the real tables have dozens of not null
columns. The edited queries may have syntax errors, but they were copied
from real world output. So were the errors. I use 'xxx' and '123' to hide
private info but the strings are real in general. This matters as the issue
might be locale related.

I started seeing these in the logs:

PG::UniqueViolation: ERROR: duplicate key value violates unique constraint
"index_users_unique_on_lower_email"
DETAIL: Key (lower(email::text))=(andy(dot)mxxx(at)xxx(dot)com
<andy(dot)moore230(at)gmail(dot)com>) already exists.
: UPDATE "users" SET "online_at" = '2016-12-01 00:23:33.245594',
"visible_online_at" = '2016-12-01 00:23:33.245633' WHERE "users"."id" = $1

PG::UniqueViolation: ERROR: duplicate key value violates unique constraint
"index_users_unique_on_lower_username"
DETAIL: Key (lower(username::text))=(joyce1234) already exists.
: UPDATE "users" SET "online_at" = '2016-11-30 22:11:37.167589',
"visible_online_at" = '2016-11-30 22:11:37.167647' WHERE "users"."id" = $1

The table:

# \d users
Column | Type |
Modifiers
-------------------------+--------------------------+----------------------------------------------------
id | integer | not null default
nextval('users_id_seq'::regclass)
username | character varying | not null
password | character varying | not null
email | character varying | not null
Indexes:
"users_pkey" PRIMARY KEY, btree (id)
"index_users_unique_on_lower_email" UNIQUE, btree (lower(email::text))
"index_users_unique_on_lower_username" UNIQUE, btree
(lower(username::text))

Testing:

test2=# insert into users (password, email, username) values ('aaaaa',
'aaaaaa', 'Ági');
INSERT 0 1

test2=# select id, username from users where lower(username) = 'ági';
id | username
--------+----------
123 | Ági
(1 row)

test2=# select id, username from users where username = 'Ági';
id | username
--------+----------
123 | Ági
456 | Ági
(2 rows)

test2=# select id, username from users where username = 'Mustafa';

id | username
-------+----------
123 | Mustafa
(1 row)

test2=# insert into users (password, email, username) values ('aaaaa',
'aaaaaab', 'Mustafa');
INSERT 0 1

test2=# select id, username from users where username = 'Mustafa';

id | username
--------+----------

123 | Mustafa
456 | Mustafa
(2 rows)

test2=# select id, username from users where username = 'bunny';
id | username
------+----------
123 | bunny
(1 row)

test2=# insert into users (password, email, username) values ('aaaaa',
'aaaaaac', 'bunny');
INSERT 0 1test2=# select id, username from users where username = 'bunny';
id | username
--------+----------
123 | bunny
456 | bunny
(2 rows)

test2=# select id, username from users where username = 'edwin';
id | username
-------+----------
123 | edwin
(1 row)

test2=# insert into users (password, email, username) values ('aaaaa',
'aaaaaad', 'edwin');
INSERT 0 1

test2=# select id, username from users where username = 'edwin';
id | username
--------+----------
123 | edwin
456 | edwin
(2 rows)

test2=# insert into users (password, email, username) values ('aaaaa',
'aaaaaae', 'edwin');
ERROR: duplicate key value violates unique constraint
"index_users_unique_on_lower_username"
DETAIL: Key (lower(username::text))=(edwin) already exists.

test2=# insert into users (password, email, username) values ('aaaaa',
'aaaaaae', 'bunny');
ERROR: duplicate key value violates unique constraint
"index_users_unique_on_lower_username"
DETAIL: Key (lower(username::text))=(bunny) already exists.

The issue didn’t happen only with strings that have characters like 'é',
'ő'. English names were also affected.

The application does have validation on the strings but I didn’t see the
existing versions for the same reason the insert didn’t see.

*Sometimes*

Also, the colliding (not colliding) strings usually had the same case of
characters. (Not 'edwin' vs 'Edwin' but two 'edwin's)

Sometimes only the latter triggered the violation:

UPDATE "users" SET "online_at" = '2016-11-30 22:11:37.167589' = xxx
UPDATE "users" SET "online_at" = '2016-11-30 22:11:37.167589',
"visible_online_at" = '2016-11-30 22:11:37.167647' WHERE "users"."id" = xxx

Strange. I though PG created a new tuple in both cases.

How to reproduce it:

0. Don’t break your arm. It’s not needed.

1. Install FreeBSD 10.3. Install PostgreSQL. I used to have 9.5. Version
9.6 has ICU turned on by default. (Good morning, maintainer, yes it’s the
21Th century. I has been for a while). The ICU version might be not
affected.

2. Create a database cluster with UTF-8 encoding. (Yes, the year is
2016). I use "en_US.UTF-8" for LC_*.

3. Create a table that has unique index or indices on lower(column)

4. Import much data in the table. Or I think inserting 100 000 random
English names will do fine.

5. Upgrade to FreeBSD 11 the official way. It includes "upgrading" the
PostgreSQL. But it’s the same version for the new platform. In my case it
was postgresql95-server-9.5.4_1

6. Try to insert existing values. You should succeed at random.

7. Try to query the duplicates with "where lower(column) = 'edwin'". Or
bunny. Or whatever. You will see only the new version. Even if the original
string was already lower case.

This is not only an index and duplication issue. Querying with lower() also
didn’t work with non duplicated records. At random.

In my case even non duplicated uses could not log in. They seemed to be
deleted. But not all of them. At random. My users thought they were removed
from the site. It seemed so.

I fixed it by dump and load. And as I already had downtime, I reloaded it
to 9.6. I had to manually rename the duplicated records to do so.

I already typed 10 times more than I should have, feel free to discover it
but not on your production database.

(Maybe database clusters should have a header that wouldn’t allow
incompatible server versions to process the existing data. I wonder if it
would take more than 8 bytes per server. But I guess it was not know to be
incompatible. Even my two CIs didn’t show it.)

M

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Maeldron T. 2016-12-03 02:03:05 Re: FreeBSD 10 => 11: Dump and reload your PostgreSQL database unless you like it broken
Previous Message Kiriakos Georgiou 2016-12-02 22:10:24 Announcement: application level undo/redo for PostgreSQL