Re: "Healing" a table after massive updates

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com>
Cc: "Bill Moran" <wmoran(at)collaborativefusion(dot)com>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Gauthier\, Dave" <dave(dot)gauthier(at)intel(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: "Healing" a table after massive updates
Date: 2008-09-13 14:09:27
Message-ID: 87sks4qgrs.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

"Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com> writes:

> On Thu, Sep 11, 2008 at 8:56 AM, Bill Moran
> <wmoran(at)collaborativefusion(dot)com> wrote:
>> In response to Alvaro Herrera <alvherre(at)commandprompt(dot)com>:
>>
>>> Bill Moran wrote:
>>> > In response to "Gauthier, Dave" <dave(dot)gauthier(at)intel(dot)com>:
>>> >
>>> > > I might be able to answer my own question...
>>> > >
>>> > > vacuum FULL (analyze is optional)
>>> >
>>> > CLUSTER _may_ be a better choice, but carefully read the docs regarding
>>> > it's drawbacks first. You may want to do some benchmarks to see if it's
>>> > really needed before you commit to it as a scheduled operation.
>>>
>>> What drawbacks?
>>
>> There's the whole "there will be two copies of the table on-disk" thing
>> that could be an issue if it's a large table.
>
> I've also found cluster to be pretty slow, even on 8.3. On a server
> that hits 30-40Megs a second write speed for random access during
> pgbench, it's writing out at 1 to 2 megabytes a second when it runs,
> and takes the better part of a day on our biggest table. vacuumdb -fz
> + reindexdb ran in about 6 hours which means we could fit it into our
> maintenance window. vacuum moves a lot more data per second than
> cluster.

Alternative you can do

ALTER TABLE tab ALTER col TYPE <sametype> USING col;

which will rewrite the table without using an index. This is much faster but
has the same other disadvantages.

Incidentally the visibility bugs are indeed entirely fixed in 8.3. In 8.2 and
before cluster and alter table rewrites can both cause tuples to not appear
for transactions which were started before the cluster or alter table such as
a long-running pg_dump.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Ben Chobot 2008-09-13 15:44:46 detecting recurring appointment conflicts
Previous Message Martin Gainty 2008-09-13 13:44:31 Re: Getting rows in a very specific order