Re: [Feature Request] INSERT FROZEN to Optimize Large Cold Data Imports and Migrations

From: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>
To: bokanist(at)gmail(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Feature Request] INSERT FROZEN to Optimize Large Cold Data Imports and Migrations
Date: 2025-02-18 03:12:03
Message-ID: CAGjGUA+v=y=hf9z3q=hzoC4Oruhh0Mi8v0DTg2d-w1UJwGkZsg@mail.gmail.com
Lists: pgsql-hackers

Hello Sébastien,
This case could be solved by xid64, but it seems very few people are
interested in it. Even so, I think xid64 should be implemented as soon as
possible.

Thanks

On Mon, Feb 17, 2025 at 9:47 PM Sébastien <bokanist(at)gmail(dot)com> wrote:

> Hello wenhui,
>
> It's not a problem with heavy-insert tables but with heavy-delete ones. Also,
> triggering more frequent autovacuums will not help, because autovacuum does
> not remove recently dead tuples while a large, slow vacuum freeze operation
> older than their deletion is still running in parallel. The solution was to
> increase the priority and speed of the vacuum freeze operation.
>
> Anyway, there should be a way to insert frozen data other than COPY, which
> does not work with foreign tables (INSERT INTO my_table SELECT * FROM
> foreign_table).
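
A minimal sketch of the restriction at issue, assuming a table my_table and
a CSV dump path (both hypothetical): COPY FREEZE requires the target table to
have been created or truncated in the same transaction, and COPY's input side
cannot read from a foreign table, so the INSERT ... SELECT path has no frozen
equivalent.

```sql
-- COPY FREEZE only works when the target table was created or
-- truncated in the current (sub)transaction:
BEGIN;
TRUNCATE my_table;  -- same transaction as the COPY below
COPY my_table FROM '/path/to/dump.csv' (FORMAT csv, FREEZE true);
COMMIT;

-- There is no frozen variant of INSERT, so pulling rows from a
-- foreign table cannot take this path:
INSERT INTO my_table SELECT * FROM foreign_table;  -- rows not frozen
```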
>
>
>
> Le lun. 17 févr. 2025 à 09:46, wenhui qiu <qiuwenhuifx(at)gmail(dot)com> a
> écrit :
>
>> Hi Sébastien,
>> You can check out the thread with the subject "Trigger more frequent
>> autovacuums of heavy insert tables"; I think it can alleviate the problem.
>>
>> Thanks
>>
>> On Sat, Feb 15, 2025 at 3:13 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
>>
>>> Hi,
>>>
>>> On 2025-02-13 10:52:31 +0100, Sébastien wrote:
>>> > Introduce an INSERT FROZEN feature to bypass vacuum processing for
>>> > large-scale cold data imports, reducing the post-import impact on system
>>> > performance, for large imports, migrations, and major version upgrades.
>>> > Business Use-case:
>>> >
>>> > When importing massive datasets (e.g., 6-10TB) into PostgreSQL on a
>>> heavily
>>> > loaded server, we observed that the system struggled significantly
>>> weeks
>>> > later when the autovacuum process had to freeze all the imported data
>>> > pages. This led to severe performance degradation, requiring manual
>>> > intervention to prioritize vacuum jobs to complete them as quickly as
>>> > possible.
>>>
>>> What version of postgres was this? What batch sizes do you need to
>>> support?
>>> I.e. is all of this data inserted at once, or in steps?
>>>
>>> As already discussed, it seems unlikely that we'll ever support INSERT
>>> FROZEN,
>>> due to the potential of causing concurrent queries to give bogus
>>> answers. But
>>> there's actually a lot we can do to improve this short of INSERT FROZEN.
>>>
>>> The reason I asked for the version is that the behaviour would e.g.
>>> likely be
>>> worse before autovacuum_vacuum_insert_scale_factor existed. We are
>>> working on
>>> improvements around that in 18 too, ensuring that the gap between insert
>>> triggered vacuums does not grow forever.
>>>
>>> Several recent releases have also improved the situation around this in
>>> other
>>> ways, e.g. by just making vacuuming faster and by avoiding doing
>>> redundant
>>> work in more cases (by increasing relfrozenxid more aggressively).
>>>
>>> We've also been talking about performing freezing during e.g.
>>> checkpoints, if
>>> possible.
>>>
>>> If you're inserting all the data in a single transaction, however, it'll
>>> be hard to improve most of this, because while that long-running
>>> transaction runs, we can't do anything that needs to know the transaction
>>> has finished. OTOH, if it were a single transaction, you could already
>>> use COPY FREEZE.
>>>
>>>
>>> A somewhat related issue is that bulk INSERTs, in contrast to COPY,
>>> currently do not use the bulk-insert logic, leading the INSERT to emit a
>>> lot more WAL than inserting the same data via COPY.
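
One hedged way to see that WAL-volume difference for yourself, in a psql
session; the table t, the row count, and the dump path are all illustrative,
while pg_current_wal_lsn() and pg_wal_lsn_diff() are standard functions:

```sql
-- Compare WAL emitted by a bulk INSERT vs. a COPY of equivalent data.
CREATE TABLE t (i int);

SELECT pg_current_wal_lsn() AS lsn_0 \gset
INSERT INTO t SELECT g FROM generate_series(1, 1000000) g;
SELECT pg_current_wal_lsn() AS lsn_1 \gset

COPY t FROM '/tmp/million_ints.csv';
SELECT pg_current_wal_lsn() AS lsn_2 \gset

-- Bytes of WAL generated by each load method:
SELECT pg_wal_lsn_diff(:'lsn_1', :'lsn_0') AS insert_wal_bytes,
       pg_wal_lsn_diff(:'lsn_2', :'lsn_1') AS copy_wal_bytes;
```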
>>>
>>>
>>> > This issue is particularly critical during database *migrations* or
>>> *version
>>> > upgrades*, where a full data reload is often necessary. Each time a
>>> major
>>> > PostgreSQL upgrade occurs, users must reimport large datasets, leading
>>> to
>>> > the same problem of vacuum storms post-import. An INSERT FROZEN feature
>>> > would allow importing data that is known to be immutable, preventing
>>> > unnecessary vacuum overhead and reducing system strain.
>>>
>>> What are you using for such upgrades or migrations? I'd not expect
>>> INSERT to be used, due to the overhead it has compared to COPY.
>>>
>>> Greetings,
>>>
>>> Andres Freund
>>>
>>>
>>>
>
> --
> Sébastien Caunes
> +33 6 7 229 229 7
>
