From: | "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com> |
---|---|
To: | Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com> |
Cc: | Berend Tober <btober(at)computer(dot)org>, Postgres General <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Deletion Challenge |
Date: | 2015-12-09 20:43:11 |
Message-ID: | CAKFQuwbohNEjm+6A=HY8CV9sUKFOJ2LvuvnXTqKsnjCrrGrOZw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Wed, Dec 9, 2015 at 1:31 PM, Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
wrote:
> On 12/09/2015 12:24 AM, Berend Tober wrote:
>
>> Adrian Klaver wrote:
>>
>>> On 12/05/2015 08:08 AM, Berend Tober wrote:
>>>
>>>> /*
>>>>
>>>> Deletion Challenge
>>>>
>>>> I want to delete all but the most recent transaction, per person, from a
>>>> table that records a transaction history because at some point the
>>>> transaction history grows large enough to adversely effect performance,
>>>> and also becomes less relevant for retention.
>>>>
>>>> ...
>>>>
>>>>
>>> test=> delete from cash_journal where ARRAY[click, cash_journal_id]
>>> NOT in (select max(ARRAY[click,cash_journal_id]) from cash_journal
>>> group by fairian_id);
>>> DELETE 7
>>>
>>> test=> SELECT * FROM cash_journal order by fairian_id, click,
>>> cash_journal_id;
>>> click | cash_journal_id | fairian_id | debit | credit | balance
>>> | description
>>>
>>> -------+-----------------+------------+-------+--------+---------+----------------------------------
>>>
>>> 412 | 1 | 7 | 5 | | 14 |
>>> Sold food quantity 7 units.
>>> 37 | 7 | 8 | 8 | | 8 |
>>> Ratified contract f1abd670358e03
>>> 37 | 9 | 9 | 7 | | 7 |
>>> Ratified contract 1574bddb75c78a
>>> 36 | 14 | 18 | 0 | 0 | 0 |
>>> initial cash balance
>>> 413 | 1 | 25 | | 995 | 0 |
>>> Redeemed bond 7719a1c782a1ba
>>> (5 rows)
>>>
>>>
>> Nice.
>>
>> The idea of a NOT IN query had occurred to me briefly, but I failed to
>> pursue it because at some point in the distant past I had gained the
>> impression that NOT IN queries were not computationally efficient.
>> During one round of testing I had like a million rows. I'll have to run
>> some EXPLAIN query testing with a larger data sample for comparison.
>> Thanks!
>>
>
> Plan B:
>
> WITH d AS
> (SELECT * FROM
> cash_journal
> LEFT JOIN
> (SELECT
> MAX(ARRAY[click,cash_journal_id]) AS mx
> FROM
> cash_journal
> GROUP BY
> fairian_id)
> AS
> mxa
> ON
> mxa.mx=ARRAY[click, cash_journal_id]
> WHERE
> mx IS NULL)
> DELETE FROM
> cash_journal
> USING
> d
> WHERE
> d.click = cash_journal.click
> AND
> d.cash_journal_id = cash_journal.cash_journal_id;
>
>
Couldn't the LEFT JOIN relation in the CTE be better written using "SELECT
DISTINCT ON (click, cash_journal_id) click, cash_journal_id [...] ORDER BY
click DESC, cash_journal_id" or something similar? It doesn't seem like
you should need to introduce an array and an aggregate here.
It does have the negative property of only providing a single row; which
excludes using it for the "last 5" part but I suspect it will be
considerably faster for the single version.
David J.
From | Date | Subject | |
---|---|---|---|
Next Message | Steve Crawford | 2015-12-09 21:45:52 | Re: Deletion Challenge |
Previous Message | Adrian Klaver | 2015-12-09 20:31:35 | Re: Deletion Challenge |