From: pg noob <pgnube(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: pgsql-novice <pgsql-novice(at)postgresql(dot)org>
Subject: Re: best way to do bulk delete?
Date: 2011-10-28 16:47:48
Message-ID: CAPNY-2VskCouya2aVOwQmT2BRhyd2VENe7ofz0Zt38Jjhnq2hg@mail.gmail.com
Lists: pgsql-novice
Thank you. That approach looks promising.
When you say it is not side effect free, are there any side effects other than the composite-type one you mentioned?
I don't believe these tables contain any composite types, so that shouldn't be an issue.
I tested it and found that there is no ALTER TABLE <foo> RENAME TO <bar> CASCADE;
it gives a syntax error on the CASCADE keyword. I am using PostgreSQL 8.4, so
maybe CASCADE is supported in a newer version.
But I am able to work around that by renaming the table indexes and
constraints individually.
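For reference, the rename workaround looks something like this on 8.4 (the index
names here are hypothetical; since 8.4 has no way to rename a constraint directly,
constraints that aren't backed by an index would have to be dropped and re-added):

```sql
-- Rename the indexes created by CREATE TABLE ... (LIKE ... INCLUDING INDEXES)
-- back to the names the application expects. Index names are hypothetical.
ALTER INDEX keeprows_pkey RENAME TO old_table_pkey;
ALTER INDEX keeprows_some_col_idx RENAME TO old_table_some_col_idx;
-- CHECK and foreign key constraints cannot be renamed in 8.4;
-- they would need a DROP CONSTRAINT / ADD CONSTRAINT under the desired name.
```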
Where this approach seems to break down is when the table has a lot of
dependent objects (constraints, functions, views, triggers, etc.).
I could use DROP TABLE ... CASCADE, but then I would also need to save and
recreate all of those dependent objects.
But for simple tables without many dependent objects, your suggested approach
seems like it can work well.
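For the tables where the swap isn't practical, I may fall back to the batched-delete
option from my original post. A rough sketch of one common pattern (the table name,
key column, batch size, and condition are all placeholders), repeated from the
application until it reports 0 rows deleted:

```sql
-- Delete in bounded batches so each transaction holds fewer row locks
-- and commits quickly; rerun until no rows remain to delete.
DELETE FROM big_table
WHERE id IN (
    SELECT id
    FROM big_table
    WHERE <bulk-delete condition>
    LIMIT 10000
);
```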
On Fri, Oct 28, 2011 at 11:19 AM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> On Fri, Oct 28, 2011 at 9:30 AM, pg noob <pgnube(at)gmail(dot)com> wrote:
> >
> > Greetings,
> >
> > I have an occasional recurring use case where I need to delete a large
> > number of rows from multiple tables as quickly as possible,
> > something like a bulk-delete.
> >
> > A typical example would be deleting a few hundred thousand rows at once
> from
> > a set of tables each containing 1 to 2 million rows,
> > but in a worst case scenario it could be as large as 1 million rows from
> a
> > set of tables each containing 4 to 5 million rows.
> >
> > These tables of course have indexes on them as well as foreign key
> > constraints and cascade deletes to other tables.
> >
> > I can't simply truncate the tables because the rows being deleted are
> > subsets of the total amount of data in the tables.
> >
> > These tables have heavy insert/update/delete activity going on at the
> same
> > time but mostly not on the same set of rows that
> > is being bulk-deleted (though there may be some update activity going on
> > which accesses those rows and hasn't yet quiesced).
> >
> > What is the best way of going about this?
> >
> > I've considered a few options.
> >
> > One option is to open a single transaction, issue delete statements that
> > delete huge numbers of rows from each
> > table in question (probably with a DELETE USING SQL query) and then
> commit
> > the transaction.
> > My concern with this approach is that it will hold a huge number of row
> > locks while in progress and may take a long
> > time to complete and could introduce deadlocks if it competes with other
> > updates that have acquired row locks out of order.
> >
> > Another option would be to delete one row per transaction or a smaller
> set
> > of rows as part of a transaction and then
> > commit the transaction, repeating in a loop.
> > This has the advantage that if the transaction hits an error and has to
> be
> > rolled back it doesn't have to redo the entire
> > delete operation again, and it doesn't hold as many row locks for as long
> a
> > time. The drawback is that I believe this approach
> > would be a lot slower.
> >
> > And finally, I've considered the idea of using COPY to copy the data that
> > needs to be kept to temporary tables,
> > truncating the original tables and then copying the data back. I believe
> > this would be the most efficient way to
> > do the delete but the implementation is quite a bit more complicated than
> > the first two options I described,
> > and has implications for how to deal with error scenarios or
> database/system
> > crashes while the operation is in progress.
>
> you don't need to COPY -- just insert/select, like this:
>
> CREATE TABLE keeprows(LIKE old_table INCLUDING INDEXES INCLUDING
> CONSTRAINTS);
> INSERT INTO keeprows SELECT * FROM old_table WHERE ...
> DROP TABLE old_table;
> ALTER TABLE keeprows RENAME to old_table CASCADE;
> <reset RI rules>
>
> This procedure is not side effect free and has some caveats: for
> example, if you have code that is dependent on the table's composite
> type that will also drop and has to be reconstructed.
>
> Perhaps a better way to go if you can structure your code around it is
> to simply partition your table around when it gets loaded and dropped
> -- then your deletion becomes 'DROP TABLE' with no extra processing.
>
> merlin
>