Re: Partitioning and unique key

From: veem v <veema0000(at)gmail(dot)com>
To: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
Cc: pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Partitioning and unique key
Date: 2024-09-01 03:06:19
Message-ID: CAB+=1TVZhPdiCtf0W1CUHSnHEH0ALG=aWszvnS56y+V1+de7Hw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Sun, 1 Sept 2024 at 03:58, Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
wrote:

>
> The model is at odds with itself and untenable. If the tables hold
> multiple rows for a given transaction_id then you cannot have a
> PK/Unique constraint on that column. Seems there is a decided lack of
> any planning. The only way I can see this happening is consolidating all
> the duplicate transaction_id rows into a single row for each
> transaction_id. That then leads to the question of how to do that and
> retain the 'correct' information from the selection of rows for each
> transaction_id.
>
>
Yes we had messed up the data for now and have multiple records for each
transaction_id persisted and thus we need to fix the data. But more than
that , as I stated , I wanted to understand first
1) if it's technically possible to have a unique key on only the
transaction_id column having the partition key on the
transaction_timestamp, because the table is going to be queried/purged
based on the transaction_timestamp?

2) Additionally we were thinking if above is technically not possible, then
the maximum granularity which we can have for each transaction_id will be a
day, so the partition key transaction_timestmp can be truncated to have
only date component but no time component. So the primary key will
be (transaction_id, transaction_date). But we also don't want to lose the
time component and persist the existing data of transaction_timestmp (which
will have a time component in it, in a separate column).

And in above case , for fixing the existing data in least disruptive way,
as we have currently duplicate transaction_id inserted into the table
already because of the composite primary key(transaction_id,
transaction_timestmp).Can we simply
i)rename the existing column transaction_timestmp to transaction_date and
then add new column transaction_timestmp using the values of existing
column partition by partition.
ii)And then delete the duplicate data using query something as below , each
partition by partition.
iii)And then alter the datatype of the partition key transaction_date to
DATE in one shot at the table level(which should be fast as its having more
granularity as compare to existing timestamptype, so should be catalog or
dictionary change only), and that will remain the part of composite PK
(transaction_id,transaction_date).
iv) Repeat this step for all child partition tables and then for the parent
partition tables.

Will this technique be the most efficient way of fixing this mess?

WITH ranked_records AS (
SELECT column1_id, column2_timestamptz,
ROW_NUMBER() OVER (PARTITION BY column1_id,
date_trunc('day', column2_timestamptz)
ORDER BY column2_timestamptz DESC) AS
rn
FROM partition_name
)
DELETE FROM partition_name T1
WHERE EXISTS (
SELECT 1
FROM ranked_records T2
WHERE T1.column1_id = T2.column1_id
AND T1.column2_timestamptz = T2.column2_timestamptz
AND T2.rn > 1
)

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message David G. Johnston 2024-09-01 03:43:06 Re: Partitioning and unique key
Previous Message Justin Clift 2024-09-01 02:32:43 Re: Upgrade Ubuntu 22 -> 24 may break PostgreSQL