Re: Best way to delete big amount of records from big table

From: Ekaterina Amez <ekaterina(dot)amez(at)zunibal(dot)com>
To: Michael Lewis <mlewis(at)entrata(dot)com>
Cc: postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Best way to delete big amount of records from big table
Date: 2020-03-27 14:56:47
Message-ID: CAFijohgmrH-7dxwr32r5XVJtGcV_HdRb87pELkHyAYSO5Zk29g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi Michael,

El vie., 27 mar. 2020 a las 15:41, Michael Lewis (<mlewis(at)entrata(dot)com>)
escribió:

> If you can afford the time, I am not sure the reason for the question.
> Just run it and be done with it, yes?
>

I've been working with other RDBMS all of my life and I'm quite new to PG
world, and I'm learning to do things when I need to do them so I'm trying
to learn them in the right way :D
Also, for what I'm seeing in other projects, this is going to be a problem
in most of them (if it's not yet a problem), and it's going to be me the
one that solves it so again I'm in the path of learning to do this kind of
things in the right way.

>
> A couple of thoughts-
> 1) That is a big big transaction if you are doing all the cleanup in a
> single function call. Will this be a production system that is still online
> for this archiving? Having a plpgsql function that encapsulates the work
> seems fine, but I would limit the work to a month at a time or something
> and call the function repeatedly. Get the min month where records exist
> still, delete everything matching that, return. Rinse, repeat.
>

Ok, the function provided it's just a first approach. I was planning to add
parameters to make dates more flexible.

2) If you are deleting/moving most of the table (91 of 150 million),
> consider moving only the records you are keeping to a new table, renaming
> old table, and renaming new table back to original name. Then you can do
> what you want to shift the data in the old table and delete it.
>

I was aware of this solution but I've read it's not side effect free. As my
tables don't have any kind of FK-PK only the sequences for the serial
columns, would this be a safe way to do what I want?

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Rick Otten 2020-03-27 15:03:10 Re: Best way to delete big amount of records from big table
Previous Message Ekaterina Amez 2020-03-27 14:55:27 Re: Best way to delete big amount of records from big table