Re: Dropping column from big table

From: sud <suds1434(at)gmail(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org, "Peter J(dot) Holzer" <hjp-pgsql(at)hjp(dot)at>
Subject: Re: Dropping column from big table
Date: 2024-07-16 04:26:16
Message-ID: CAD=mzVVTE1Jx0fpsQ=OVW0CL5LiGg_RHA4kq+cbPqTShA5EwVw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Tue, Jul 16, 2024 at 6:07 AM Peter J. Holzer <hjp-pgsql(at)hjp(dot)at> wrote:

>
> > But the only issue would be "VACUUM FULL" will take a table lock and
> also it
> > may take longer to run this vacuum on the full table considering the
> size of
> > the table in TB's. Thus, is it fine to just leave it post execution of
> the
> > "update" statement , so that the normal vacuum operation (which will be
> online
> > operation) and that will take care of the removal of old rows ?
>
> This is unlikely to recover the space.
>
> The UPDATE will duplicate all rows. Since - presumably - there isn't
> much free space within each partition the new rows will go at the end of
> each partition, effectively doubling its size.
>
> A regular VACUUM (whether autovacuum or invoked manually) will then
> remove the old rows. and make the space available for new data. But
> since that newly free space is at the beginning of each partition it
> can't be returned to the OS. It would be available for new data written
> to those partitions I guess not much is written to old partitons.
>
> You could, however, do this in small steps and vacuum after each.
> Something like this (in Python)
>
> conn = ...
> csr = conn.cursor()
> for i in range(100):
> csr.execute("UPDATE tab set id = id WHERE id % 100 = %s", (i,))
> conn.commit()
> csr.execute("VACUUM tab")
> conn.commit()
>
> That might just be able to squeeze the new rows in between the existing
> rows and not grow the table.
>
>
> > And also, As you mentioned we may also leave it as is and wait for the
> > partition to be dropped, so that the dropped column with "not null"
> values
> > which are still lying under the hood and are occupying space will be
> removed
> > automatically. But even then, is that dropped column still lying in the
> rows
> > with null values in it throughout its lifecycle, till the table exists
> in the
> > database?
>
> Yes. But a nullable column with a null value takes only a single bit of
> storage, so that's negligible.
>
>
>

Thank you so much.

Normal vacuum marks the space occupied by the dead tuples as free or
reusable but vacuum full removes those completely. However even with
"vacuum full", the old rows will be removed completely from the storage ,
but the new rows will always be there with the 'dropped' column still
existing under the hood along with the table storage, with just carrying
"null" values in it. However, as it's a single bit of storage so will be
having negligible overhead. If we want to fully remove that column from the
table , we may have to create a new table and dump the data into that from
the existing table and then rename it back to old. Is this understanding
correct?

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message David G. Johnston 2024-07-16 04:52:41 Re: Dropping column from big table
Previous Message Ron Johnson 2024-07-16 02:26:26 Re: PostgreSQL Active-Active Clustering