Re: [Feature Request] INSERT FROZEN to Optimize Large Cold Data Imports and Migrations

From: Sébastien <bokanist(at)gmail(dot)com>
To: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Feature Request] INSERT FROZEN to Optimize Large Cold Data Imports and Migrations
Date: 2025-02-18 08:33:20
Message-ID: CANtq+vSVL7shVAx6uhdzMFTN1ts+0o6TUUmrm5CbpO-XmJjsfg@mail.gmail.com
Lists: pgsql-hackers

Sorry, it won't work; it just delays the problem. The freeze procedure must
still rewrite all pages.
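
You can even watch that rewrite happen with the pg_visibility extension
(rough sketch; my_table is a placeholder):

    CREATE EXTENSION IF NOT EXISTS pg_visibility;

    -- count how many pages are already all-frozen vs. still to be rewritten
    SELECT count(*) FILTER (WHERE all_frozen) AS frozen_pages,
           count(*) AS total_pages
    FROM pg_visibility_map('my_table');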

On Tue, Feb 18, 2025 at 04:12, wenhui qiu <qiuwenhuifx(at)gmail(dot)com> wrote:

> Hello Sébastien,
> This case could be solved by xid64, but it seems very few people are
> interested. Still, it seems to me that xid64 should be implemented as soon
> as possible.
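>
> The underlying pressure is the 32-bit transaction ID space: every tuple must
> eventually be frozen before its XID age approaches wraparound. A standard
> catalog query shows how much headroom each database has:
>
>     SELECT datname, age(datfrozenxid) AS xid_age
>     FROM pg_database
>     ORDER BY xid_age DESC;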
>
>
> Thanks
>
> On Mon, Feb 17, 2025 at 9:47 PM Sébastien <bokanist(at)gmail(dot)com> wrote:
>
>> Hello wenhui,
>>
>> It's not a problem with heavy-insert tables but with heavy deletes. Also,
>> triggering more frequent autovacuums will not help, because autovacuum does
>> not remove recently dead tuples while a large and slow vacuum freeze
>> operation older than their delete is still running in parallel. The
>> solution was to increase the priority and speed of the vacuum freeze
>> operation.
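>>
>> Concretely, that kind of intervention can look something like this
>> (illustrative sketch; session-level settings, my_table is a placeholder):
>>
>>     SET vacuum_cost_delay = 0;           -- disable cost-based throttling
>>     SET maintenance_work_mem = '1GB';    -- speed up the freeze pass
>>     VACUUM (FREEZE, VERBOSE) my_table;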
>>
>> Anyway, there should be a way to insert frozen data other than COPY, which
>> does not work with foreign tables (INSERT INTO my_table SELECT * FROM
>> foreign_table).
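>>
>> The FREEZE path that exists today is restricted to COPY, and only when the
>> table was created or truncated in the same transaction (sketch; names and
>> the file path are placeholders):
>>
>>     BEGIN;
>>     TRUNCATE my_table;   -- FREEZE requires create/truncate in this xact
>>     COPY my_table FROM '/tmp/data.csv' WITH (FORMAT csv, FREEZE);
>>     COMMIT;
>>
>>     -- but no FREEZE equivalent exists for this path:
>>     INSERT INTO my_table SELECT * FROM foreign_table;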
>>
>>
>>
>> On Mon, Feb 17, 2025 at 09:46, wenhui qiu <qiuwenhuifx(at)gmail(dot)com>
>> wrote:
>>
>>> Hi Sébastien,
>>> You can check out the email thread with the subject "Trigger more frequent
>>> autovacuums of heavy insert tables"; I think it can alleviate the problem.
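>>>
>>> The relevant knobs can also be set per table, e.g. (values are only
>>> illustrative, not a recommendation):
>>>
>>>     ALTER TABLE my_table SET (
>>>         autovacuum_vacuum_insert_threshold = 100000,
>>>         autovacuum_vacuum_insert_scale_factor = 0.01
>>>     );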
>>>
>>> Thanks
>>>
>>> On Sat, Feb 15, 2025 at 3:13 AM Andres Freund <andres(at)anarazel(dot)de>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> On 2025-02-13 10:52:31 +0100, Sébastien wrote:
>>>> > Introduce an INSERT FROZEN feature to bypass vacuum processing for
>>>> > large-scale cold data imports, reducing the impact on system performance
>>>> > post-import. For large imports, migrations and major version upgrades.
>>>> > Business Use-case:
>>>> >
>>>> > When importing massive datasets (e.g., 6-10TB) into PostgreSQL on a
>>>> > heavily loaded server, we observed that the system struggled
>>>> > significantly weeks later when the autovacuum process had to freeze all
>>>> > the imported data pages. This led to severe performance degradation,
>>>> > requiring manual intervention to prioritize vacuum jobs to complete them
>>>> > as quickly as possible.
>>>>
>>>> What version of postgres was this? What batch sizes do you need to
>>>> support? I.e. is all of this data inserted at once, or in steps?
>>>>
>>>> As already discussed, it seems unlikely that we'll ever support INSERT
>>>> FROZEN, due to the potential of causing concurrent queries to give bogus
>>>> answers. But there's actually a lot we can do to improve this short of
>>>> INSERT FROZEN.
>>>>
>>>> The reason I asked for the version is that the behaviour would e.g.
>>>> likely be worse before autovacuum_vacuum_insert_scale_factor existed. We
>>>> are working on improvements around that in 18 too, ensuring that the gap
>>>> between insert-triggered vacuums does not grow forever.
>>>>
>>>> Several recent releases have also improved the situation around this in
>>>> other ways, e.g. by just making vacuuming faster and by avoiding doing
>>>> redundant work in more cases (by increasing relfrozenxid more
>>>> aggressively).
>>>>
>>>> We've also been talking about performing freezing during e.g.
>>>> checkpoints, if possible.
>>>>
>>>> If you're inserting all the data in a single transaction, however, it'll
>>>> be hard to improve most of this, because while that long-running
>>>> transaction runs, we can't do anything that needs to know the transaction
>>>> has finished. OTOH, if it were a single transaction, you could already
>>>> use COPY FREEZE.
>>>>
>>>>
>>>> A somewhat related issue is that bulk INSERTs, in contrast to COPY,
>>>> currently do not use the bulk-insert logic, leading the INSERT to emit a
>>>> lot more WAL compared to inserting the same data via COPY.
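>>>>
>>>> The difference is easy to measure from psql by diffing WAL positions
>>>> around the load (sketch; assumes a my_table/foreign_table pair):
>>>>
>>>>     SELECT pg_current_wal_lsn() AS lsn_before \gset
>>>>     INSERT INTO my_table SELECT * FROM foreign_table;
>>>>     SELECT pg_size_pretty(pg_current_wal_lsn() - :'lsn_before'::pg_lsn)
>>>>            AS wal_emitted;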
>>>>
>>>>
>>>> > This issue is particularly critical during database *migrations* or
>>>> > *version upgrades*, where a full data reload is often necessary. Each
>>>> > time a major PostgreSQL upgrade occurs, users must reimport large
>>>> > datasets, leading to the same problem of vacuum storms post-import. An
>>>> > INSERT FROZEN feature would allow importing data that is known to be
>>>> > immutable, preventing unnecessary vacuum overhead and reducing system
>>>> > strain.
>>>>
>>>> What are you using for such upgrades or migrations? I'd not expect
>>>> INSERT to be used, due to the overhead it has compared to COPY.
>>>>
>>>> Greetings,
>>>>
>>>> Andres Freund
>>>>
>>>>
>>>>
>>
>> --
>> Sébastien Caunes
>> +33 6 7 229 229 7
>>
>
