Re: Deletion Challenge

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
Cc: Berend Tober <btober(at)computer(dot)org>, Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Deletion Challenge
Date: 2015-12-09 20:43:11
Message-ID: CAKFQuwbohNEjm+6A=HY8CV9sUKFOJ2LvuvnXTqKsnjCrrGrOZw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Wed, Dec 9, 2015 at 1:31 PM, Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
wrote:

> On 12/09/2015 12:24 AM, Berend Tober wrote:
>
>> Adrian Klaver wrote:
>>
>>> On 12/05/2015 08:08 AM, Berend Tober wrote:
>>>
>>>> /*
>>>>
>>>> Deletion Challenge
>>>>
>>>> I want to delete all but the most recent transaction, per person, from a
>>>> table that records a transaction history because at some point the
>>>> transaction history grows large enough to adversely effect performance,
>>>> and also becomes less relevant for retention.
>>>>
>>>> ...
>>>>
>>>>
>>> test=> delete from cash_journal where ARRAY[click, cash_journal_id]
>>> NOT in (select max(ARRAY[click,cash_journal_id]) from cash_journal
>>> group by fairian_id);
>>> DELETE 7
>>>
>>> test=> SELECT * FROM cash_journal order by fairian_id, click,
>>> cash_journal_id;
>>> click | cash_journal_id | fairian_id | debit | credit | balance
>>> | description
>>>
>>> -------+-----------------+------------+-------+--------+---------+----------------------------------
>>>
>>> 412 | 1 | 7 | 5 | | 14 |
>>> Sold food quantity 7 units.
>>> 37 | 7 | 8 | 8 | | 8 |
>>> Ratified contract f1abd670358e03
>>> 37 | 9 | 9 | 7 | | 7 |
>>> Ratified contract 1574bddb75c78a
>>> 36 | 14 | 18 | 0 | 0 | 0 |
>>> initial cash balance
>>> 413 | 1 | 25 | | 995 | 0 |
>>> Redeemed bond 7719a1c782a1ba
>>> (5 rows)
>>>
>>>
>> Nice.
>>
>> The idea of a NOT IN query had occurred to me briefly, but I failed to
>> pursue it because at some point in the distant past I had gained the
>> impression that NOT IN queries were not computationally efficient.
>> During one round of testing I had like a million rows. I'll have to run
>> some EXPLAIN query testing with a larger data sample for comparison.
>> Thanks!
>>
>
> Plan B:
>
> WITH d AS
> (SELECT * FROM
> cash_journal
> LEFT JOIN
> (SELECT
> MAX(ARRAY[click,cash_journal_id]) AS mx
> FROM
> cash_journal
> GROUP BY
> fairian_id)
> AS
> mxa
> ON
> mxa.mx=ARRAY[click, cash_journal_id]
> WHERE
> mx IS NULL)
> DELETE FROM
> cash_journal
> USING
> d
> WHERE
> d.click = cash_journal.click
> AND
> d.cash_journal_id = cash_journal.cash_journal_id;
>
>
​Couldn't the LEFT JOIN relation in the CTE be better written using "SELECT
DISTINCT ON (click, cash_journal_id) click, cash_journal_id [...] ORDER BY
click DESC, cash_journal_id" or something similar? It doesn't seem like
you should need to introduce an array and an aggregate here.

​It does have the negative property of only providing a single row; which
excludes using it for the "last 5" part but I suspect it will be
considerably faster for the single version.

David J.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Steve Crawford 2015-12-09 21:45:52 Re: Deletion Challenge
Previous Message Adrian Klaver 2015-12-09 20:31:35 Re: Deletion Challenge