From: Gambhir Singh <gambhir(dot)singh05(at)gmail(dot)com>
To: Rui DeSousa <rui(dot)desousa(at)icloud(dot)com>
Cc: Alex Balashov <abalashov(at)evaristesys(dot)com>, pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: Re: Need an idea to operate massive delete operation on big size table.
Date: 2025-01-16 18:09:36
Message-ID: CAHOGQfVy2f3VQS1un=33nV72a-sARidKF0QFwzZhhLGMZrcGaQ@mail.gmail.com
Lists: pgsql-admin
Thanks everyone for providing the ideas and special thanks to Rui DeSousa.
On Thu, 16 Jan 2025 at 05:39, Rui DeSousa <rui(dot)desousa(at)icloud(dot)com> wrote:
>
>
> On Jan 15, 2025, at 4:24 PM, Alex Balashov <abalashov(at)evaristesys(dot)com>
> wrote:
>
> In my experience, mass deletions are tough. There may be a supporting index
> to assist the broadest criteria, but the filtered rows that result must
> still be sequentially scanned for non-indexed sub-criteria[1]. That can
> still be an awful lot of rows and a huge, time-consuming workload.
>
> This is how to do it easily; I’ve used this method many times over. Create a
> purge queue table and a function to purge the records in batches.
>
> 1. I normally try to keep the purge transaction to 500ms or less, i.e.
> purge 1000 or 5000 records per batch; it depends on the schema, cascading
> deletes, etc.
> 2. The function is set up to run in parallel, so kick off as many purge
> sessions as your system can handle (see the wrapper sketch after the
> function definition below).
>
> Write a Perl or shell script that just keeps executing “select purge_data();”
> repeatedly until zero is returned. Make sure to call the function only once
> in a given transaction.
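>
> For example, a minimal shell sketch of that driver loop (the database name,
> connection details, and batch size are placeholders; adjust for your
> environment):
>
> #!/usr/bin/env bash
> set -eu
> # Keep calling purge_data() until it reports that no rows were deleted.
> # Each "psql -c" call runs as its own transaction, so the function is
> # only invoked once per transaction, as noted above.
> DB=mydb        # placeholder database name
> BATCH=1000     # batch size; tune so each transaction stays under ~500ms
>
> while :; do
>     deleted=$(psql -X -A -t -d "$DB" -c "select purge_data($BATCH);")
>     [ "$deleted" -eq 0 ] && break
> done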
>
> You shouldn’t have to create any additional indexes or disable them while
> purging. The application can continue to operate without issues, and when
> the process is complete, vacuum and rebuild indexes on the affected tables.
>
> create table purge_data_queue (
> data_id bigint not null primary key
> );
>
> /* Identify all records to be purged */
> insert into purge_data_queue (data_id)
> select data_id
> from data
> where <records to delete>
> ;
>
> create or replace function purge_data (_limit int default 1000)
> returns int
> as
> $$
> declare
> _rowcnt int;
> begin
> create temp table if not exists purge_set (
> data_id bigint not null
> , primary key (data_id)
> ) on commit drop
> ;
>
> /* Identify batch to be purged */
> insert into purge_set (
> data_id
> )
> select data_id
> from purge_data_queue
> limit _limit
> for update skip locked
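> /* skip locked lets concurrent purge sessions each grab a disjoint batch */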
> ;
>
> /* Delete batch from base table */
> delete from data using purge_set
> where data.data_id = purge_set.data_id
> ;
>
> get diagnostics _rowcnt = ROW_COUNT;
>
> /* Delete batch from queue table */
> delete from purge_data_queue using purge_set
> where purge_data_queue.data_id = purge_set.data_id
> ;
> return _rowcnt;
> end;
> $$ language plpgsql
> set search_path = schema_name
> ;
>
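> To tie the pieces together, here is a rough wrapper sketch for point 2 above
> and the post-purge maintenance: it launches several copies of the driver loop
> in parallel and runs vacuum/reindex once the queue is drained. The script
> name purge_worker.sh, the database mydb, and the table data are assumed
> names; "reindex table concurrently" requires PostgreSQL 12 or later.
>
> #!/usr/bin/env bash
> set -eu
> DB=mydb
> WORKERS=4                    # as many purge sessions as the system can handle
>
> for i in $(seq 1 "$WORKERS"); do
>     ./purge_worker.sh &      # each worker loops on "select purge_data();"
> done
> wait                         # block until every worker sees an empty queue
>
> # Post-purge maintenance on the affected table
> psql -X -d "$DB" -c "vacuum analyze data;"
> psql -X -d "$DB" -c "reindex table concurrently data;"
>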
--
Thanks & Regards
Gambhir Singh