Re: Dropping column from big table

From: "Peter J(dot) Holzer" <hjp-pgsql(at)hjp(dot)at>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Dropping column from big table
Date: 2024-07-16 00:37:31
Message-ID: 20240716003731.m75sye7qq74wadhe@hjp.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 2024-07-16 02:00:27 +0530, sud wrote:
>
> On Mon, Jul 15, 2024 at 7:58 PM Peter J. Holzer <hjp-pgsql(at)hjp(dot)at> wrote:
> > Hm, true.
> >
> > You can always do
> >
> >   UPDATE tab SET id = id;
> >
> > followed by
> >
> >   VACUUM (FULL) tab;
>
> Yes, that should work. It needs about twice the size of the table in
> temporary space, though.
>
> Since the OP wrote that the table is "daily ... and 90 partitions"
> (which understand that there is one partition per day and partitions are
> kept for 90 days) it might be better to just wait. After 90 days all the
> partitions with the obsolete column will be gone.
>
>     
>
> Thank You very much.
>
> As I understand, after dropping a column, it will still internally hold the
> "NOT NULL" values in that column for all the rows, even though it's not visible
> outside.
>
> So, after the DROP column,  it will force update any of the columns as below,
> Then it will force create another copy of each of the rows even if the column
> is updated to the same value. The new copy will have the dropped column with
> values as NULL. And the post "VACUUM FULL '' will clean all the rows with "NOT
> NULL '' values of that dropped column and thus reclaim the space.

Correct.

> But the only issue would be "VACUUM FULL" will take a table lock and also it
> may take longer to run this vacuum on the full table considering the size of
> the table in TB's. Thus, is it fine to just leave it post execution of the
> "update" statement , so that the normal vacuum operation (which will be online
> operation) and that will take care of the removal of old rows ?

This is unlikely to recover the space.

The UPDATE will duplicate all rows. Since - presumably - there isn't
much free space within each partition the new rows will go at the end of
each partition, effectively doubling its size.

A regular VACUUM (whether autovacuum or invoked manually) will then
remove the old rows. and make the space available for new data. But
since that newly free space is at the beginning of each partition it
can't be returned to the OS. It would be available for new data written
to those partitions I guess not much is written to old partitons.

You could, however, do this in small steps and vacuum after each.
Something like this (in Python)

conn = ...
csr = conn.cursor()
for i in range(100):
csr.execute("UPDATE tab set id = id WHERE id % 100 = %s", (i,))
conn.commit()
csr.execute("VACUUM tab")
conn.commit()

That might just be able to squeeze the new rows in between the existing
rows and not grow the table.

>  And also, As you mentioned we may also leave it as is and wait for the
> partition to be dropped, so that the dropped column with "not null" values
> which are still lying under the hood and are occupying space will be removed
> automatically. But even then, is that dropped column still lying in the rows
> with null values in it throughout its lifecycle, till the table exists in the
> database?

Yes. But a nullable column with a null value takes only a single bit of
storage, so that's negligible.

hp

--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp(at)hjp(dot)at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Ron Johnson 2024-07-16 02:26:26 Re: PostgreSQL Active-Active Clustering
Previous Message Christoph Moench-Tegeder 2024-07-15 21:54:13 Re: PostgreSQL Active-Active Clustering