Re: Thoughts on how to avoid a massive integer update.

From: Rob Sargent <robjsargent(at)gmail(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Thoughts on how to avoid a massive integer update.
Date: 2020-05-04 23:46:27
Message-ID: 1ad51d80-94d5-cd49-cda0-d55daf9bfbb8@gmail.com
Lists: pgsql-general

On 5/4/20 3:32 PM, Fehrle, Brian wrote:
>
> Hi all,
>
> This is a shot in the dark in hopes of finding a magic bullet for an
> issue I have; I can't personally think of any solution myself.
>
> I have a database with hundreds of terabytes of data, where every
> table has an integer column referencing a small table. For reasons out
> of my control that I cannot change, I NEED to update every single row in
> all these tables, changing the integer value to a different integer.
>
> Since I have to deal with dead space, I can only do a couple of tables
> at a time, then do a VACUUM FULL after each one.
> Another option is to build a new table with the new values, then drop
> the old one and swap in the new; either way is very time consuming.
>
> Initial tests suggest this effort will take several months to
> complete, not to mention cause blocking issues on tables being worked on.
>
> Does anyone have any hackery ideas on how to achieve this in less
> time? I was looking at possibly converting the integer column type to
> another that would present the integer differently, like a hex value,
> but everything still ends up requiring all data to be re-written to
> disk. In a well-designed database (I didn't design it :) ), I would
> simply change the data in the referenced table (200 total rows);
> however, the key being referenced isn't just an arbitrary ID, it's
> actual 'data', and must be changed.
>
> Thanks for any thoughts or ideas,
>
> * Brian F
>

Here's my solution: add a column to the small table, filled with a
duplicate of the original column. Change the foreign keys of the huge
tables to reference the new column in the small table. Then update the
original values to what you now need them to be. (Not sure how much
re-index cost you'll see here, but non-zero, I'm guessing.)
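That plan could be sketched roughly like this (all table and column
names are hypothetical, not from the original post):

```sql
BEGIN;

-- 1. Duplicate the key column in the small (~200 row) lookup table.
ALTER TABLE codes ADD COLUMN legacy_code integer;
UPDATE codes SET legacy_code = code;
-- A referenced column must be unique for a foreign key to point at it.
ALTER TABLE codes ADD CONSTRAINT codes_legacy_code_key UNIQUE (legacy_code);

-- 2. Repoint each huge table's FK at the new column.  NOT VALID skips
--    the immediate full-table scan; validate later under a lighter lock.
ALTER TABLE big_table DROP CONSTRAINT big_table_code_fkey;
ALTER TABLE big_table
  ADD CONSTRAINT big_table_code_fkey
  FOREIGN KEY (code) REFERENCES codes (legacy_code) NOT VALID;

-- 3. Change the actual 'data' values only in the small table; the
--    big tables' rows are never rewritten.
UPDATE codes SET code = code + 1000;  -- whatever the new mapping is

COMMIT;

ALTER TABLE big_table VALIDATE CONSTRAINT big_table_code_fkey;
```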

Your reports now must of course return the new value via joins to the
dictionary(?) table. Views may be your best bet here.
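Such a view might look like this (again, names are only illustrative):

```sql
-- Reports query the view instead of the table; the new value comes
-- from the join, so no big-table rows ever had to change.
CREATE VIEW big_table_v AS
SELECT b.*, c.code AS new_code
FROM big_table AS b
JOIN codes AS c ON c.legacy_code = b.code;
```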
