Re: Deletion Challenge

From: Berend Tober <btober(at)computer(dot)org>
To: Steve Crawford <scrawford(at)pinpointresearch(dot)com>
Cc: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Deletion Challenge
Date: 2015-12-09 08:13:10
Message-ID: 5667E296.6050005@computer.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Steve Crawford wrote:
> If I understand correctly the value of "click" always advances and within a "click" the
> "cash_journal_id" always advances - not necessarily by single steps so within a fairian_id, ordering
> by "click" plus "cash_journal_id" would return the records in order from which you want the most
> recent 5 for each farian_id.
>
> Typing without testing and ignoring performance optimizations, something along the lines of the
> following should work and covers the "last 5" issue as well.
>
> with stuff_to_delete as (
> select farian_id, click, cash_journal_id,
> rank() over (partition by farian_id order by (click, cash_journal_id) desc) as howold)
> from cash_journal)
> delete from cash_journal
> using stuff_to_delete
> where
> cash_journal.farian_id = stuff_to_delete.farian_id
> and cash_journal.click = stuff_to_delete.click
> and cash_journal.cash_journal_id = stuff_to_delete.cash_journal_id
> and stuff_to_delete.howold > 5;
>

Assessing without testing, I like that. Thanks!

Although the above is not the exactly the form I was using, an earlier iteration of a related
problem employed window functions. But as the data set grew performance suffered, so if deletes were
not done on a regular, continuing basis in order to keep the historical data set approximately
"small", the process execution time using a windowing scheme eventually exceeded the extent of my
patience.

That "non-scalable" situation is actually what motivated the deliberate de-normalization (of
retaining the "running balance" in a separate column) and the desire to delete old data. The
original implementation calculated the running balance on-the-fly, employing windowing per
fairian_id, and those tallies of the net balance entailed increasingly lengthy execution times as
the number of rows increased, hence I was motivated to retain only a relatively constant-sized
per-farian history, and I dismissed the use of windowing for the delete problem since it was so
problematic for the running-balance-without-delete problem.

Thanks for knocking some sense into me!

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Berend Tober 2015-12-09 08:24:11 Re: Deletion Challenge
Previous Message Sylvain MARECHAL 2015-12-09 08:09:24 Re: bdr manual cleanup required