From: | Timokhin Maxim <ncx2(at)yandex(dot)com> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, "pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Updating a large table |
Date: | 2018-01-09 12:18:48 |
Message-ID: | 4917721515500328@web47j.yandex.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Hello Tomas! Thank you for the useful answer!
23.12.2017, 23:58, "Tomas Vondra" <tomas(dot)vondra(at)2ndquadrant(dot)com>:
> On 12/22/2017 05:46 PM, Timokhin Maxim wrote:
>> Hello! We have a large table 11GB ( about 37 million records ) and we
>> need to alter a table - add a new column with default values is
>> false. Also 'NOT NULL' is required.
>>
>> So, first I've done:
>>
>> ALTER TABLE clusters ALTER COLUMN "is_paid";
>
> That seems somewhat incomplete ... what exactly did the ALTER do?
I'll try to explain what exactly I meant.
ALTER TABLE clusters ADD COLUMN "is_paid" BOOLEAN NOT NULL DEFAULT FALSE;
What exactly I need.
But that query would lock the whole table for about 40 minutes. I decided to separate it like:
1. ALTER TABLE clusters ADD COLUMN "is_paid" BOOLEAN DEFAULT FALSE;
2. UPDATE clusters SET is_paid = DEFAULT where ctime <= now() - interval '720h' AND is_paid != FALSE; ( This was needed as soon as possible )
3. UPDATE another part by chunks
4. set NOT NULL for the table.
I was thinking about how to optimize the 3th step.
Well, my solution was to write a script which runs two threads. The first one UPDATE "is_paid" by chunks, another one checks my metrics. If something is becoming wrong first thread stops until metrics are good.
Thank you, Tomas.
>
>> after that:
>>
>> UPDATE clusters SET is_paid = DEFAULT where ctime <= now() - interval '720h' AND is_paid != FALSE;
>>
>> Everything went ok. Then I tried to run it again for an interval of 1
>> years. And I got that no one can't see - the was no available space
>> on a disk. The reason was WAL-files ate everything.
>> Master-server couldn't send some WAL-file to their replicas. Bandwidth wasn't enough.
>
> Well, then perhaps the best solution is to add more disk space and/or
> make sure the network bandwidth is sufficient?
>
> In any case, don't forget this may also need to update all indexes on
> the table, because the new row versions will end up on different pages.
> So while the table has 11GB, this update may need much more WAL space
> than that.
>
Got it, thank you!
>> Well, I'm searching for a better idea to update the table.
>> Solutions I found.
>> 1. Separate my UPDATE by chunks.
>
> If this is a one-time change, this is probably the best option.
>
Exactly, thank you!
>> 2. Alter a table using a new temporary table, but it's not convenient
>> for me because there is a lot of foreign keys and indexes.
>
> Right.
>
>> 3. Hot-update. This is the most interesting case for me.
>> Speaking of HOT-update https://www.dbrnd.com/2016/03/postgresql-the-awesome-table-fillfactor-to-speedup-update-and-select-statement/
>> The article says: it might be useful for tables that change often and moreover It would be the best way to increase the speed of UPDATE.
>
> First of all, to make HOT possible there would have to be enough free
> space on the pages. As you need to update the whole table, that means
> each table would have to be only 50% full. That's unlikely to be true,
> and you can't fix that at this point.
>
>> So, my questions are will it work for all tuples? It says that - no
>> https://www.dbrnd.com/2016/03/postgresql-alter-table-to-change-
>> fillfactor-value/, but I could not find a confirmation in official
>> postresql's documentation.
>
> Not sure I understand your question, but HOT can only happen when two
> conditions are met:
>
> 1) the update does not change any indexed column
>
> This is likely met, assuming you don't have an index on is_paid.
>
> 2) there's enough space on the same page for the new row version
>
> This is unlikely to be true, because the default fillfactor for tables
> is 90%. You may change fillfactor using ALTER TABLE, but that only
> applies to new data.
>
> Moreover, as the article says - this is useful for tables that change
> often. Which is not quite what one-time table rewrite does.
>
> So HOT is not the solution you're looking for.
>
>> Why do I need to launch vacuum after updating?
>
> You don't need to launch vacuum - autovacuum will take care of that
> eventually. But you may do that, to do the cleanup when it's convenient
> for you.
>
>> How should I reduce the better fillfactor?
>
> For example to change fillfactor to 75% (i.e. 25% free space):
>
> ALTER TABLE t SET (fillfactor = 75);
>
> But as I said, it's not a solution for you.
>
>> What will be with WAL-files it this case?
>
> Not sure what you mean.
>
> regards
>
> --
> Tomas Vondra http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Sergei Kornilov | 2018-01-09 12:53:44 | Re: Updating a large table |
Previous Message | Rambabu V | 2018-01-09 06:53:24 | Need Help on wal_compression |