From: | Berend Tober <btober(at)computer(dot)org> |
---|---|
To: | Postgres General <pgsql-general(at)postgresql(dot)org> |
Subject: | Deletion Challenge |
Date: | 2015-12-05 16:08:05 |
Message-ID: | 56630BE5.3080001@computer.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
/*
Deletion Challenge
I want to delete all but the most recent transaction, per person, from a
table that records a transaction history because at some point the
transaction history grows large enough to adversely effect performance,
and also becomes less relevant for retention.
I have devised a way to accomplish this, but it is a 'two-stage'
approach: that is, it requires two delete statements. I would like to
know if there is a way to do it in a single statement.
Bonus challenge: Same question, except preserving the most recent N, for
N > 1, rows for each person so that a short history is retained after
the deletion.
I have included below an annotated test case and my current solution for
the N = 1 case.
*/
DROP TABLE IF EXISTS cash_journal;
CREATE TABLE cash_journal (
click bigint NOT NULL,
cash_journal_id bigint NOT NULL,
fairian_id bigint NOT NULL,
debit double precision,
credit double precision,
balance real DEFAULT 0,
description text
);
COMMENT ON COLUMN cash_journal.click IS 'Time of transaction.';
COMMENT ON COLUMN cash_journal.cash_journal_id IS 'Sequence of transaction within current click.';
COMMENT ON COLUMN cash_journal.fairian_id IS 'Fairian account effected.';
COMMENT ON COLUMN cash_journal.debit IS 'Account balance increase amount.';
COMMENT ON COLUMN cash_journal.credit IS 'Account balance decrease amount.';
COMMENT ON COLUMN cash_journal.balance IS 'Account balance, per Fairian running total.';
COMMENT ON COLUMN cash_journal.description IS 'Transaction description.';
/*
Below is some sample data, listed in the click/sequence order that the
data would actually be entered. That is, the 'click' column represents
advancing time, and within each click, transactions are sequenced by the
'cash_journal_id' column. Note there are some missing cash_journal_id
sequence numbers. This is an artifact of having presented here only
an illustrative sample. Generally, within each click, the sequence
would start at one and increment uniformly by one for each new row
in the same click, and then reset to one for the next click. The
missing increments in the sample data should not make any difference
in the solution.
The 'balance' column is a per-player running total, which is a
deliberate denormalization. It is calculated in a before insert trigger
by starting with the per-player previous balance, and then adding
the new row debit, if any, and subtracting the new row credit, if any.
Note, not all Fairians will have a transaction in every click, but any
number of Fairians may have multiple transactions in any click.
*/
copy cash_journal (click,cash_journal_id,fairian_id,debit,credit,balance,description) from stdin;
36 3 7 0 0 0 Initial cash balance
36 4 8 0 0 0 Initial cash balance
36 5 9 0 0 0 Initial cash balance
36 14 18 0 0 0 initial cash balance
37 5 7 9 \N 9 Ratified contract fa35e192121eab
37 7 8 8 \N 8 Ratified contract f1abd670358e03
37 9 9 7 \N 7 Ratified contract 1574bddb75c78a
411 1 25 0 0 0 Initial cash balance
411 2 25 1000 \N 1000 Issued bond 7719a1c782a1ba
412 1 7 5 \N 14 Sold food quantity 7 units.
412 2 25 \N 5 995 Bought food quantity 7 units.
413 1 25 \N 995 0 Redeemed bond 7719a1c782a1ba
\.
SELECT * FROM cash_journal order by fairian_id, click, cash_journal_id;
/*
The sample starting data is shown here in order by Fairian so that it is
perhaps easier to see what is happening for each player. Note that the
result of the deletion should be the last row for each player.
click | cash_journal_id | fairian_id | debit | credit | balance | description
-------+-----------------+------------+-------+--------+---------+----------------------------------
36 | 3 | 7 | 0 | 0 | 0 | Initial cash balance
37 | 5 | 7 | 9 | | 9 | Ratified contract fa35e192121eab
412 | 1 | 7 | 5 | | 14 | Sold food quantity 7 units.
36 | 4 | 8 | 0 | 0 | 0 | Initial cash balance
37 | 7 | 8 | 8 | | 8 | Ratified contract f1abd670358e03
36 | 5 | 9 | 0 | 0 | 0 | Initial cash balance
37 | 9 | 9 | 7 | | 7 | Ratified contract 1574bddb75c78a
36 | 14 | 18 | 0 | 0 | 0 | initial cash balance
411 | 1 | 25 | 0 | 0 | 0 | Initial cash balance
411 | 2 | 25 | 1000 | | 1000 | Issued bond 7719a1c782a1ba
412 | 2 | 25 | | 5 | 995 | Bought food quantity 7 units.
413 | 1 | 25 | | 995 | 0 | Redeemed bond 7719a1c782a1ba
(12 rows)
*/
/*
Here is the current, two-stage solution in use. Is there a way to do it
with a single statement?
Can you create a solution that retains an arbitrarily specified number
of rows per player?
*/
BEGIN;
WITH max_click AS (
SELECT
cash_journal.fairian_id,
max(cash_journal.click) AS click
FROM cash_journal
GROUP BY cash_journal.fairian_id
)
delete from cash_journal j
using max_click b
where j.fairian_id = b.fairian_id
and j.click < b.click;
WITH max_journal_id AS (
SELECT
cash_journal.fairian_id,
cash_journal.click,
max(cash_journal.cash_journal_id) AS cash_journal_id
FROM cash_journal
GROUP BY cash_journal.fairian_id, cash_journal.click
)
delete from cash_journal j
using max_journal_id b
where j.fairian_id = b.fairian_id
and j.click = b.click
and j.cash_journal_id < b.cash_journal_id;
COMMIT;
SELECT * FROM cash_journal order by fairian_id, click, cash_journal_id;
/*
click | cash_journal_id | fairian_id | debit | credit | balance | description
-------+-----------------+------------+-------+--------+---------+----------------------------------
412 | 1 | 7 | 5 | | 14 | Sold food quantity 7 units.
37 | 7 | 8 | 8 | | 8 | Ratified contract f1abd670358e03
37 | 9 | 9 | 7 | | 7 | Ratified contract 1574bddb75c78a
36 | 14 | 18 | 0 | 0 | 0 | initial cash balance
413 | 1 | 25 | | 995 | 0 | Redeemed bond 7719a1c782a1ba
(5 rows)
*/
From | Date | Subject | |
---|---|---|---|
Next Message | Sylvain Marechal | 2015-12-06 12:23:17 | Re: bdr manual cleanup required |
Previous Message | Selim Tuvi | 2015-12-04 19:40:43 | bdr manual cleanup required |