From: | Tom Allison <tom(at)tacocat(dot)net> |
---|---|
To: | Simon Riggs <simon(at)2ndquadrant(dot)com> |
Cc: | "christian_behrens(at)gmx(dot)net" <christian_behrens(at)gmx(dot)net>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Need to update all my 60 million rows at once without transactional integrity |
Date: | 2008-04-23 21:35:14 |
Message-ID: | BB3A6D34-4DBD-4347-BAA5-9947FE132BA9@tacocat.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Far from being an expert on postgres, but there are two ideas--
assuming that you cannot afford the time it would take to simply
UPDATE and wait...
Write a script to update all the rows, one at a time. Lowest impact to
operations but would take a very long time.
Assuming you have a sequence primary key value on each row, update by
ID blocks on the order of 10,000's or 100,000's at a time (or more).
This is a balancing act between time to complete and immediate load on
the server.
I've used both options but I don't think I've exceeded 15 million rows.
Sent from my iPhone.
On Apr 23, 2008, at 2:15 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Mon, 2008-04-21 at 00:19 +0200, christian_behrens(at)gmx(dot)net wrote:
>
>> How can I make a Update of a column in a very large table for all
>> rows
>> without using the double amount of disc space and without any need
>> for
>> atomic operation?
>>
>> I have a very large table with about 60 million rows. I sometimes
>> need
>> to do a simple update to ALL rows that resets a status-flag to zero.
>>
>> I don't need to have transactional integrity (but of course if the
>> system crashes, there should be no data corruption.
>
> No such thing. Without transactions you have no sensible definition of
> what constitutes data corruption.
>
>> A separate flag in the file system can well save the fact that that
>> bulk update was in progress) for this, I don't care or want a abort
>> or
>> "all or nothing" like SQL mandates. The idea is basically that either
>> this update succeeds or it succeeds or - there is no "not". It must
>> succeed. It must be tried until it works. If the update is halfway
>> finished, that's okay.
>
> Don't reset them to zero, just redefine the meaning of the counter.
> Take
> the max value in the table and then have the app understand that
> anything <= the previous max value means the same thing as whatever
> "status = 0" means now. The app would need to read the new baseline
> value before performing any work.
>
> This is roughly the technique used by Slony to avoid needing to update
> every row in the log table to show that it has successfully replicated
> it. It's also the technique MVCC relies upon internally.
>
> It's less work and crash safe in all cases.
>
> --
> Simon Riggs
> 2ndQuadrant http://www.2ndQuadrant.com
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Treat | 2008-04-23 21:51:48 | Re: How to modify ENUM datatypes? |
Previous Message | Steve Crawford | 2008-04-23 21:26:31 | Re: Create temporary function |