Duplicate Primary keys in postgresql tables

From: De Lan <delandl(at)google(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Cc: Amit Virmani <avirmani(at)google(dot)com>, Terence Yim <terencey(at)google(dot)com>, Vitalii Tymchyshyn <vitaliit(at)google(dot)com>, Samik Gupta <gupsam(at)google(dot)com>, Albert Shau <ashau(at)google(dot)com>
Subject: Duplicate Primary keys in postgresql tables
Date: 2023-07-22 17:11:37
Message-ID: CAPsFu7zsqZgtWuQi_sUPz91+6fCwJTEiXc4qUcHap9khW84S3g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi pgsql community,

Recently we found in a postgresql 11.19.0 alpine table there are two rows
with duplicate primary keys.

*The table schema (anonymized):*

# \d+ app_specs;

Table "public.app_specs"

Column | Type | Collation | Nullable | Default |
Storage | Stats target | Description

-------------------------+---------+-----------+----------+---------+----------+--------------+-------------

ns | text | | not null | |
extended | |

app | text | | not null | |
extended | |

version | text | | not null | |
extended | |

app_data | text | | | |
extended | |

change_summary | text | | | |
extended | |

created | bigint | | | | plain
| |

latest | boolean | | | | plain
| |

Indexes:

"app_specs_pkey" PRIMARY KEY, btree (ns, app, version)

"pktest" UNIQUE, btree (ns, app, version) INVALID

"app_specs_created_idx" btree (created)

"app_specs_latest_idx" btree (latest)

*The rows with duplicate primary keys:*

# SELECT ns,app,version,created,latest FROM public.app_specs WHERE ns='
default' AND app='Quickstart_v2' AND version='-VERSION';

namespace | application | version | created | latest

-----------+-------------------------+-----------+---------------+--------

default | Quickstart_v2 | -VERSION | |

default | Quickstart_v2 | -VERSION | 1687965022134 | t

(2 rows)

It looks like the second row was originally in place. And during an
"upsert" ddl transaction, the first row was inserted instead of updating
the original row. We used the below statement for the mentioned upsert
operation.

INSERT INTO simpletable (key1,key2,col1,col2,col3) VALUES (?,?,?,?,?) ON
* CONFLICT (key1,key2) DO UPDATE SET
col1=EXCLUDED.col1,col2=EXCLUDED.col2,col3=EXCLUDED.col3;

Recreating the primary keys confirms that the constraint doesn't permit
these duplicate rows: "ERROR: duplicate key value violates unique
constraint "app_specs_pkey""

*Our investigation:*

1. We found there were some previous discussions about duplicate primary
keys caused by different collations like [Corruption with duplicate primary
key(
https://www.postgresql.org/message-id/CY4PR03MB269502786425690DA5B2EDBAA9370%40CY4PR03MB2695.namprd03.prod.outlook.com
)

2. We checked our system as well. This container of the table was upgraded
from a postgres:11.0.0 image to postgres:11.19.0-alpine. The former was
based on Debian and the latter is on Alpine. Debian is using: C.UTF-8,
en_US.utf8 for collation. The Alpine one uses the musl library version
(1.2.3). From https://stackoverflow.com/a/74808009/21798098 it also talks
about collation issues between Debian and Alpine.

*Our questions:*

Any ideas on what might cause this behavior other than the collation? if it
is a well-known issue in the pgsql community or it really is the
coalition that's the root cause, do we have mitigation for this kind of
issue from happening in future?

De,

Thanks!

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Ken Marshall 2023-07-22 17:30:38 Re: Duplicate Primary keys in postgresql tables
Previous Message Les 2023-07-22 07:29:51 Re: invalid value for parameter "default_text_search_config": "public.pg"