Re: Alter the column data type of the large data volume table.

From: Olivier Gautherot <ogautherot(at)gautherot(dot)net>
To: charles meng <xlyybz(at)gmail(dot)com>
Cc: Michael Lewis <mlewis(at)entrata(dot)com>, PostgreSQL General <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Alter the column data type of the large data volume table.
Date: 2020-12-04 08:44:06
Message-ID: CAJ7S9TVKUA6FK0zXqsGjokB6E7pAdm0vPR3z9vPaJN_heM3yWQ@mail.gmail.com
Lists: pgsql-general

Hi Charles,

On Fri, Dec 4, 2020 at 9:12 AM charles meng <xlyybz(at)gmail(dot)com> wrote:

> What I mean is that it will take a lot of time to complete all the
> data processing. I have to say that adjusting the column type without
> affecting online users would be a good solution.
>
> I found a tool on GitHub (see the link below); unfortunately, it is
> for MySQL...
>
> https://github.com/github/gh-ost
>

MySQL has its own strategy with regard to column handling, so what
works there does not necessarily fit here.

There are some good ideas in this thread, but I would start with a few
checks:

1) What version of PG are you using?

2) If you can, try on your current setup (not necessarily in
production) to add a column typed bigint - if your version is recent
enough, it will be a simple catalog update. Michael's suggestion is
viable (see the first sketch after this list).

3) Given the massive number of rows, I would update as suggested,
progressively, in batches of a few tens of thousands. Make sure you
commit and vacuum after each batch to reclaim the space (or you may end
up with a storage space issue on top of everything else). In the
meantime, add a trigger to populate the new column from the key value
on incoming rows. Once the new column is fully populated, drop the old
column and set the new one as primary key (it may take a while to
rebuild the index). See the sketches after this list.

4) If your table is still growing, I would definitely look into
partitioning as it can hardly make things worse.

5) If you're brave enough, convert your current table into a partition
(rename it to something like table_hist), recreate the table definition
under the original name (adjusting the primary key type), and set up
INHERITS on the primary key range. The inheritance should take care of
the type conversion (I haven't tried it, but it's worth a try). If it
works, you will reach your goal without downtime or significant
overhead.
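
To make points 2 and 3 concrete, here is a minimal sketch (the names
big_table, id and id_new are made up - adapt them to your schema):

ALTER TABLE big_table ADD COLUMN id_new bigint;  -- no DEFAULT: catalog-only, returns immediately

CREATE FUNCTION big_table_sync_id() RETURNS trigger AS $$
BEGIN
  NEW.id_new := NEW.id;  -- copy the key into the new column on every write
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER big_table_sync_id
  BEFORE INSERT OR UPDATE ON big_table
  FOR EACH ROW EXECUTE FUNCTION big_table_sync_id();  -- EXECUTE PROCEDURE on PG10 and older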
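
Once the backfill is complete, the swap at the end of point 3 can be a
single short transaction (same made-up names; foreign keys referencing
the old key, if any, must be dealt with separately):

BEGIN;
UPDATE big_table SET id_new = id WHERE id_new IS NULL;  -- catch any stragglers
DROP TRIGGER big_table_sync_id ON big_table;
DROP FUNCTION big_table_sync_id();
ALTER TABLE big_table DROP COLUMN id;        -- drops the old primary key with it
ALTER TABLE big_table RENAME COLUMN id_new TO id;
ALTER TABLE big_table ALTER COLUMN id SET NOT NULL;
ALTER TABLE big_table ADD PRIMARY KEY (id);  -- rebuilds the index, may take a while
COMMIT;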

> regards.
>
> On Fri, Dec 4, 2020 at 1:04 PM, Michael Lewis <mlewis(at)entrata(dot)com> wrote:
>
>> Please do reply all.
>>
>> Not sure what you mean about it taking too much time. It's rewriting a
>> bunch of data. It's going to take a while. The best you can do is break the
>> work up into small pieces and commit each piece.
>>
>> On Thu, Dec 3, 2020, 7:11 PM charles meng <xlyybz(at)gmail(dot)com> wrote:
>>
>>> Thanks for your help. I think the first method I tried (adding a
>>> temporary column) is similar to what you said, but it takes too much
>>> time for me.
>>>
>>>
>>> Thanks again.
>>>
>>> On Fri, Dec 4, 2020 at 1:11 AM, Michael Lewis <mlewis(at)entrata(dot)com> wrote:
>>>
>>>> On Wed, Dec 2, 2020 at 11:53 PM charles meng <xlyybz(at)gmail(dot)com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I have a table with 1.6 billion records. The data type of the
>>>>> primary key column was incorrectly set to integer, and I need to
>>>>> change it to bigint. Are there any ideas for this?
>>>>>
>>>>> Solutions that have been tried:
>>>>> Adding temporary columns was too time-consuming, so I gave up.
>>>>> Using a temporary table: there was no good way to migrate the
>>>>> original table data to the temporary table.
>>>>>
>>>>> Thanks in advance.
>>>>>
>>>>
>>>> You can add a new column with no default value (so the default is
>>>> null) and it will be very fast. Then you can gradually update rows
>>>> in batches (if on PG11+, perhaps use a DO script with a loop to
>>>> commit after every X rows) to set the new column the same as the
>>>> primary key. Lastly, in a transaction, update any new rows where
>>>> the bigint column is still null, change which column is the primary
>>>> key, and drop the old one. This should keep each transaction
>>>> reasonably sized so it does not hold up other processes.
>>>>
>>>
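
For reference, the batched DO script Michael mentions could look roughly
like this on PG11+ (COMMIT is allowed inside a DO block there, as long
as it is not run inside an outer transaction; VACUUM cannot run inside
the block, so let autovacuum keep up or vacuum between runs; same
made-up names as in my sketches above):

DO $$
DECLARE
  lo bigint;
  hi bigint;
BEGIN
  SELECT min(id) - 1, max(id) INTO lo, hi FROM big_table;
  WHILE lo < hi LOOP
    UPDATE big_table
       SET id_new = id
     WHERE id > lo AND id <= lo + 50000
       AND id_new IS NULL;
    lo := lo + 50000;
    COMMIT;  -- keeps each batch a small, short transaction
  END LOOP;
END;
$$;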
Cheers
--
Olivier Gautherot

