From: | christian_behrens(at)gmx(dot)net |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Need to update all my 60 million rows at once without transactional integrity |
Date: | 2008-04-20 22:19:34 |
Message-ID: | 20080420221934.13270@gmx.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hi!
How can I make a Update of a column in a very large table for all rows without using the double amount of disc space and without any need for atomic operation?
I have a very large table with about 60 million rows. I sometimes need to do a simple update to ALL rows that resets a status-flag to zero.
I don't need to have transactional integrity (but of course if the system crashes, there should be no data corruption. A separate flag in the file system can well save the fact that that bulk update was in progress) for this, I don't care or want a abort or "all or nothing" like SQL mandates. The idea is basically that either this update succeeds or it succeeds or - there is no "not". It must succeed. It must be tried until it works. If the update is halfway finished, that's okay.
If I just do an
UPDATE table SET flag=0;
then Pg will make a copy of every row which must be cleaned up by vaccuum. I understand - and don't like during this specific problem - that PG is a MVCC database and will keep my old versions of the rows before the update alive. This is very bad.
If I do a batched loop like this:
UPDATE table SET flag=0 where id>=0 and id <200;
UPDATE table SET flag=0 where id>=200 and id <400;
UPDATE table SET flag=0 where id>=400 and id <600;
...
then PG will seek all over my harddrive I think.
It would be much better if it could just start in the beginning of the table and work it's way towards the end. But which sort-criteria would satisfy this? I don't think that there is any SQL that does something like that.
Another ideas (and I think it's a quite good idea) would be to
drop the column and recreate it with a new default value.
But the problem is that this is not actually MY database, but an appliance (which has a harddrive that does not have the double amount of space available btw) and it has to work reliably whenever something special happens.
And I don't think I should create new columns (the old one would be hidden and their internal column ids lost I think) all the time, that might have a limit.
Can I do it maybe every day??
Is there any other way to go?
I would really like to kind of "skip transactions". Of course basic data integrity in on disc structures, but not atomicy for this bulk update!
I read that PG has an internal command language that is used to build up a database when all the tables for e.g. table-names are not present yet.
Could I use that to hack my way around transactions?
Basically I can do everything to this PG installation, this is an extremly controlled, locked down environment. I don't need to care for anyone else, it's always the same schema, the same version, same OS, etc. and I could happily apply any hack that solves the problem.
Many thanks for any pointers or ideas how to solve this problem!
Christian
--
Ist Ihr Browser Vista-kompatibel? Jetzt die neuesten
Browser-Versionen downloaden: http://www.gmx.net/de/go/browser
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2008-04-21 01:25:38 | Re: Changed Hosts, Lots of Errors in PostgreSQL - Help Please! |
Previous Message | Dawid Kuroczko | 2008-04-20 21:56:15 | Re: In the belly of the beast (MySQLCon) |