Re: Deleting more efficiently from large partitions

From: Greg Spiegelberg <gspiegelberg(at)gmail(dot)com>
To: Wells Oliver <wells(dot)oliver(at)gmail(dot)com>
Cc: pgsql-admin <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Deleting more efficiently from large partitions
Date: 2020-06-16 13:57:10
Message-ID: CAEtnbpWYqwSSmbZHPe5OKfx2zqEmTypv_4Y=QrFHdPKPfMKiOw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

On Mon, Jun 15, 2020 at 7:39 PM Wells Oliver <wells(dot)oliver(at)gmail(dot)com> wrote:

> Hi all. I have a partitioned table (by month from a date column), where
> each partition contains something like 400m rows.
>
> Each partition is defined by a PK with a uuid and date field (the parent
> table is partitioned by range on the date), and two other columns.
>
> In doing a delete for a specific date, e.g. DELETE FROM t WHERE date =
> '2019-09-01' AND uuid IN (SELECT uuid FROM temptable), it runs very
> efficiently.
>
> I am trying to write a processing script that deletes for potentially
> multiple dates & uuid values, and it just takes hours, trying:
>
> DELETE FROM t WHERE date = (SELECT DISTINCT date from temp) AND uuid IN
> (select uuid from tempuuds) -- no go, hours.
>
> Tried USING, e.g. DELETE FROM t USING temp WHERE t.date = temp.date AND
> t.uuid = temp.uuid -- no go, hours.
>
> I just can't delete from this table without an explicit date and a set of
> uuids using a WHERE IN approach, but I need to.I was thinking of making a
> plpgsql function or something that loops through dates and makes a more
> explicit DELETE statement, but I'm thinking there must be some better way
> using indexing or something.
>
> Appreciate any tips.
>

Have you considered partitioning by day instead of month? Could eliminate
an index you may have on the date column.
How many days are in the many-days DELETE? Could you simply wrap it in a
transaction and do one DELETE per day?

You could potentially get better performance removing the JOIN/sub-SELECT
using
DELETE FROM mytable WHERE date_col = ANY( ARRAY['2020-01-01',
'2020-01-13']::date[] );

HTH
-Greg

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Pepe TD Vo 2020-06-16 14:20:56 create batch script to import into postgres tables
Previous Message Greg Spiegelberg 2020-06-16 11:48:10 Re: Upgrade streaming replication and log-shipping standby servers