Re: Deletion Challenge

From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: Berend Tober <btober(at)computer(dot)org>, Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Deletion Challenge
Date: 2015-12-09 01:23:43
Message-ID: 5667829F.1010600@aklaver.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 12/05/2015 08:08 AM, Berend Tober wrote:
> /*
>
> Deletion Challenge
>
> I want to delete all but the most recent transaction, per person, from a
> table that records a transaction history because at some point the
> transaction history grows large enough to adversely effect performance,
> and also becomes less relevant for retention.
>
> I have devised a way to accomplish this, but it is a 'two-stage'
> approach: that is, it requires two delete statements. I would like to
> know if there is a way to do it in a single statement.
>
> Bonus challenge: Same question, except preserving the most recent N, for
> N > 1, rows for each person so that a short history is retained after
> the deletion.
>
> I have included below an annotated test case and my current solution for
> the N = 1 case.
>
> */
>
> DROP TABLE IF EXISTS cash_journal;
>
>
> CREATE TABLE cash_journal (
> click bigint NOT NULL,
> cash_journal_id bigint NOT NULL,
> fairian_id bigint NOT NULL,
> debit double precision,
> credit double precision,
> balance real DEFAULT 0,
> description text
> );
>
> COMMENT ON COLUMN cash_journal.click IS 'Time of transaction.';
> COMMENT ON COLUMN cash_journal.cash_journal_id IS 'Sequence of
> transaction within current click.';
> COMMENT ON COLUMN cash_journal.fairian_id IS 'Fairian account
> effected.';
> COMMENT ON COLUMN cash_journal.debit IS 'Account balance increase
> amount.';
> COMMENT ON COLUMN cash_journal.credit IS 'Account balance
> decrease amount.';
> COMMENT ON COLUMN cash_journal.balance IS 'Account balance, per
> Fairian running total.';
> COMMENT ON COLUMN cash_journal.description IS 'Transaction
> description.';
>
> /*
>
> Below is some sample data, listed in the click/sequence order that the
> data would actually be entered. That is, the 'click' column represents
> advancing time, and within each click, transactions are sequenced by the
> 'cash_journal_id' column. Note there are some missing cash_journal_id
> sequence numbers. This is an artifact of having presented here only
> an illustrative sample. Generally, within each click, the sequence
> would start at one and increment uniformly by one for each new row
> in the same click, and then reset to one for the next click. The
> missing increments in the sample data should not make any difference
> in the solution.
>
> The 'balance' column is a per-player running total, which is a
> deliberate denormalization. It is calculated in a before insert trigger
> by starting with the per-player previous balance, and then adding
> the new row debit, if any, and subtracting the new row credit, if any.
>
> Note, not all Fairians will have a transaction in every click, but any
> number of Fairians may have multiple transactions in any click.
>
> */
>
> copy cash_journal
> (click,cash_journal_id,fairian_id,debit,credit,balance,description) from
> stdin;
> 36 3 7 0 0 0 Initial cash balance
> 36 4 8 0 0 0 Initial cash balance
> 36 5 9 0 0 0 Initial cash balance
> 36 14 18 0 0 0 initial cash balance
> 37 5 7 9 \N 9 Ratified contract fa35e192121eab
> 37 7 8 8 \N 8 Ratified contract f1abd670358e03
> 37 9 9 7 \N 7 Ratified contract 1574bddb75c78a
> 411 1 25 0 0 0 Initial cash balance
> 411 2 25 1000 \N 1000 Issued bond 7719a1c782a1ba
> 412 1 7 5 \N 14 Sold food quantity 7 units.
> 412 2 25 \N 5 995 Bought food quantity 7 units.
> 413 1 25 \N 995 0 Redeemed bond 7719a1c782a1ba
> \.
>
>
> SELECT * FROM cash_journal order by fairian_id, click, cash_journal_id;
>
> /*
>
> The sample starting data is shown here in order by Fairian so that it is
> perhaps easier to see what is happening for each player. Note that the
> result of the deletion should be the last row for each player.
>
> click | cash_journal_id | fairian_id | debit | credit | balance
> | description
> -------+-----------------+------------+-------+--------+---------+----------------------------------
>
> 36 | 3 | 7 | 0 | 0 | 0 |
> Initial cash balance
> 37 | 5 | 7 | 9 | | 9 |
> Ratified contract fa35e192121eab
> 412 | 1 | 7 | 5 | | 14 | Sold
> food quantity 7 units.
> 36 | 4 | 8 | 0 | 0 | 0 |
> Initial cash balance
> 37 | 7 | 8 | 8 | | 8 |
> Ratified contract f1abd670358e03
> 36 | 5 | 9 | 0 | 0 | 0 |
> Initial cash balance
> 37 | 9 | 9 | 7 | | 7 |
> Ratified contract 1574bddb75c78a
> 36 | 14 | 18 | 0 | 0 | 0 |
> initial cash balance
> 411 | 1 | 25 | 0 | 0 | 0 |
> Initial cash balance
> 411 | 2 | 25 | 1000 | | 1000 |
> Issued bond 7719a1c782a1ba
> 412 | 2 | 25 | | 5 | 995 |
> Bought food quantity 7 units.
> 413 | 1 | 25 | | 995 | 0 |
> Redeemed bond 7719a1c782a1ba
> (12 rows)
>
> */
>
>
> /*
>
> Here is the current, two-stage solution in use. Is there a way to do it
> with a single statement?
>
> Can you create a solution that retains an arbitrarily specified number
> of rows per player?
>
> */
> BEGIN;
>
> WITH max_click AS (
> SELECT
> cash_journal.fairian_id,
> max(cash_journal.click) AS click
> FROM cash_journal
> GROUP BY cash_journal.fairian_id
> )
> delete from cash_journal j
> using max_click b
> where j.fairian_id = b.fairian_id
> and j.click < b.click;
>
> WITH max_journal_id AS (
> SELECT
> cash_journal.fairian_id,
> cash_journal.click,
> max(cash_journal.cash_journal_id) AS cash_journal_id
> FROM cash_journal
> GROUP BY cash_journal.fairian_id, cash_journal.click
> )
> delete from cash_journal j
> using max_journal_id b
> where j.fairian_id = b.fairian_id
> and j.click = b.click
> and j.cash_journal_id < b.cash_journal_id;
>
> COMMIT;
>
> SELECT * FROM cash_journal order by fairian_id, click, cash_journal_id;
>
> /*
>
> click | cash_journal_id | fairian_id | debit | credit | balance
> | description
> -------+-----------------+------------+-------+--------+---------+----------------------------------
>
> 412 | 1 | 7 | 5 | | 14 | Sold
> food quantity 7 units.
> 37 | 7 | 8 | 8 | | 8 |
> Ratified contract f1abd670358e03
> 37 | 9 | 9 | 7 | | 7 |
> Ratified contract 1574bddb75c78a
> 36 | 14 | 18 | 0 | 0 | 0 |
> initial cash balance
> 413 | 1 | 25 | | 995 | 0 |
> Redeemed bond 7719a1c782a1ba
> (5 rows)
>
>
> */
>
>

test=> delete from cash_journal where ARRAY[click, cash_journal_id] NOT in (select max(ARRAY[click,cash_journal_id]) from cash_journal group by fairian_id);
DELETE 7

test=> SELECT * FROM cash_journal order by fairian_id, click, cash_journal_id;
click | cash_journal_id | fairian_id | debit | credit | balance | description
-------+-----------------+------------+-------+--------+---------+----------------------------------
412 | 1 | 7 | 5 | | 14 | Sold food quantity 7 units.
37 | 7 | 8 | 8 | | 8 | Ratified contract f1abd670358e03
37 | 9 | 9 | 7 | | 7 | Ratified contract 1574bddb75c78a
36 | 14 | 18 | 0 | 0 | 0 | initial cash balance
413 | 1 | 25 | | 995 | 0 | Redeemed bond 7719a1c782a1ba
(5 rows)

--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Craig Ringer 2015-12-09 04:18:00 Re: bdr manual cleanup required
Previous Message Steve Crawford 2015-12-09 01:02:11 Re: Deletion Challenge