Re: Deletion Challenge

From: Steve Crawford <scrawford(at)pinpointresearch(dot)com>
To: Berend Tober <btober(at)computer(dot)org>
Cc: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Deletion Challenge
Date: 2015-12-09 01:02:11
Message-ID: CAEfWYyxdAMfqpSXrnzHHXvSVmDtGFcFxTnhgT9=HdHDm+0J-Bg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

If I understand correctly the value of "click" always advances and within a
"click" the "cash_journal_id" always advances - not necessarily by single
steps so within a fairian_id, ordering by "click" plus "cash_journal_id"
would return the records in order from which you want the most recent 5 for
each farian_id.

Typing without testing and ignoring performance optimizations, something
along the lines of the following should work and covers the "last 5" issue
as well.

with stuff_to_delete as (
select farian_id, click, cash_journal_id,
rank() over (partition by farian_id order by (click, cash_journal_id) desc)
as howold)
from cash_journal)
delete from cash_journal
using stuff_to_delete
where
cash_journal.farian_id = stuff_to_delete.farian_id
and cash_journal.click = stuff_to_delete.click
and cash_journal.cash_journal_id = stuff_to_delete.cash_journal_id
and stuff_to_delete.howold > 5;

Cheers,
Steve

On Sat, Dec 5, 2015 at 8:08 AM, Berend Tober <btober(at)computer(dot)org> wrote:

> /*
>
> Deletion Challenge
>
> I want to delete all but the most recent transaction, per person, from a
> table that records a transaction history because at some point the
> transaction history grows large enough to adversely effect performance,
> and also becomes less relevant for retention.
>
> I have devised a way to accomplish this, but it is a 'two-stage'
> approach: that is, it requires two delete statements. I would like to
> know if there is a way to do it in a single statement.
>
> Bonus challenge: Same question, except preserving the most recent N, for
> N > 1, rows for each person so that a short history is retained after
> the deletion.
>
> I have included below an annotated test case and my current solution for
> the N = 1 case.
>
> */
>
> DROP TABLE IF EXISTS cash_journal;
>
>
> CREATE TABLE cash_journal (
> click bigint NOT NULL,
> cash_journal_id bigint NOT NULL,
> fairian_id bigint NOT NULL,
> debit double precision,
> credit double precision,
> balance real DEFAULT 0,
> description text
> );
>
> COMMENT ON COLUMN cash_journal.click IS 'Time of transaction.';
> COMMENT ON COLUMN cash_journal.cash_journal_id IS 'Sequence of
> transaction within current click.';
> COMMENT ON COLUMN cash_journal.fairian_id IS 'Fairian account
> effected.';
> COMMENT ON COLUMN cash_journal.debit IS 'Account balance
> increase amount.';
> COMMENT ON COLUMN cash_journal.credit IS 'Account balance
> decrease amount.';
> COMMENT ON COLUMN cash_journal.balance IS 'Account balance, per
> Fairian running total.';
> COMMENT ON COLUMN cash_journal.description IS 'Transaction
> description.';
>
> /*
>
> Below is some sample data, listed in the click/sequence order that the
> data would actually be entered. That is, the 'click' column represents
> advancing time, and within each click, transactions are sequenced by the
> 'cash_journal_id' column. Note there are some missing cash_journal_id
> sequence numbers. This is an artifact of having presented here only
> an illustrative sample. Generally, within each click, the sequence
> would start at one and increment uniformly by one for each new row
> in the same click, and then reset to one for the next click. The
> missing increments in the sample data should not make any difference
> in the solution.
>
> The 'balance' column is a per-player running total, which is a
> deliberate denormalization. It is calculated in a before insert trigger
> by starting with the per-player previous balance, and then adding
> the new row debit, if any, and subtracting the new row credit, if any.
>
> Note, not all Fairians will have a transaction in every click, but any
> number of Fairians may have multiple transactions in any click.
>
> */
>
> copy cash_journal
> (click,cash_journal_id,fairian_id,debit,credit,balance,description) from
> stdin;
> 36 3 7 0 0 0 Initial cash balance
> 36 4 8 0 0 0 Initial cash balance
> 36 5 9 0 0 0 Initial cash balance
> 36 14 18 0 0 0 initial cash balance
> 37 5 7 9 \N 9 Ratified contract
> fa35e192121eab
> 37 7 8 8 \N 8 Ratified contract
> f1abd670358e03
> 37 9 9 7 \N 7 Ratified contract
> 1574bddb75c78a
> 411 1 25 0 0 0 Initial cash balance
> 411 2 25 1000 \N 1000 Issued bond 7719a1c782a1ba
> 412 1 7 5 \N 14 Sold food quantity 7 units.
> 412 2 25 \N 5 995 Bought food quantity 7
> units.
> 413 1 25 \N 995 0 Redeemed bond
> 7719a1c782a1ba
> \.
>
>
> SELECT * FROM cash_journal order by fairian_id, click, cash_journal_id;
>
> /*
>
> The sample starting data is shown here in order by Fairian so that it is
> perhaps easier to see what is happening for each player. Note that the
> result of the deletion should be the last row for each player.
>
> click | cash_journal_id | fairian_id | debit | credit | balance |
> description
>
> -------+-----------------+------------+-------+--------+---------+----------------------------------
> 36 | 3 | 7 | 0 | 0 | 0 | Initial
> cash balance
> 37 | 5 | 7 | 9 | | 9 |
> Ratified contract fa35e192121eab
> 412 | 1 | 7 | 5 | | 14 | Sold
> food quantity 7 units.
> 36 | 4 | 8 | 0 | 0 | 0 | Initial
> cash balance
> 37 | 7 | 8 | 8 | | 8 |
> Ratified contract f1abd670358e03
> 36 | 5 | 9 | 0 | 0 | 0 | Initial
> cash balance
> 37 | 9 | 9 | 7 | | 7 |
> Ratified contract 1574bddb75c78a
> 36 | 14 | 18 | 0 | 0 | 0 | initial
> cash balance
> 411 | 1 | 25 | 0 | 0 | 0 | Initial
> cash balance
> 411 | 2 | 25 | 1000 | | 1000 | Issued
> bond 7719a1c782a1ba
> 412 | 2 | 25 | | 5 | 995 | Bought
> food quantity 7 units.
> 413 | 1 | 25 | | 995 | 0 |
> Redeemed bond 7719a1c782a1ba
> (12 rows)
>
> */
>
>
> /*
>
> Here is the current, two-stage solution in use. Is there a way to do it
> with a single statement?
>
> Can you create a solution that retains an arbitrarily specified number
> of rows per player?
>
> */
> BEGIN;
>
> WITH max_click AS (
> SELECT
> cash_journal.fairian_id,
> max(cash_journal.click) AS click
> FROM cash_journal
> GROUP BY cash_journal.fairian_id
> )
> delete from cash_journal j
> using max_click b
> where j.fairian_id = b.fairian_id
> and j.click < b.click;
>
> WITH max_journal_id AS (
> SELECT
> cash_journal.fairian_id,
> cash_journal.click,
> max(cash_journal.cash_journal_id) AS cash_journal_id
> FROM cash_journal
> GROUP BY cash_journal.fairian_id, cash_journal.click
> )
> delete from cash_journal j
> using max_journal_id b
> where j.fairian_id = b.fairian_id
> and j.click = b.click
> and j.cash_journal_id < b.cash_journal_id;
>
> COMMIT;
>
> SELECT * FROM cash_journal order by fairian_id, click, cash_journal_id;
>
> /*
>
> click | cash_journal_id | fairian_id | debit | credit | balance |
> description
>
> -------+-----------------+------------+-------+--------+---------+----------------------------------
> 412 | 1 | 7 | 5 | | 14 | Sold
> food quantity 7 units.
> 37 | 7 | 8 | 8 | | 8 |
> Ratified contract f1abd670358e03
> 37 | 9 | 9 | 7 | | 7 |
> Ratified contract 1574bddb75c78a
> 36 | 14 | 18 | 0 | 0 | 0 | initial
> cash balance
> 413 | 1 | 25 | | 995 | 0 |
> Redeemed bond 7719a1c782a1ba
> (5 rows)
>
>
> */
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2015-12-09 01:23:43 Re: Deletion Challenge
Previous Message FattahRozzaq 2015-12-09 00:55:09 HELP!!! The WAL Archive is taking up all space