Re: [Feature Request] INSERT FROZEN to Optimize Large Cold Data Imports and Migrations

From: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>
To: Sébastien <bokanist(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Feature Request] INSERT FROZEN to Optimize Large Cold Data Imports and Migrations
Date: 2025-02-17 08:46:13
Message-ID: CAGjGUAJNCbKz28PcYcpwVqS3SMFjR+8-LCCJz0K=XsMQkE+weg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

HI Sébastien
You can check out the email subject:Trigger more frequent autovacuums
of heavy insert tables , I think it can alleviate the problem

Thanks

On Sat, Feb 15, 2025 at 3:13 AM Andres Freund <andres(at)anarazel(dot)de> wrote:

> Hi,
>
> On 2025-02-13 10:52:31 +0100, Sébastien wrote:
> > Introduce an INSERT FROZEN feature to bypass vacuum processing for
> > large-scale cold data imports, reducing the impact on system performance
> > post-import. For large imports, migrations and major version upgrades.
> > Business Use-case:
> >
> > When importing massive datasets (e.g., 6-10TB) into PostgreSQL on a
> heavily
> > loaded server, we observed that the system struggled significantly weeks
> > later when the autovacuum process had to freeze all the imported data
> > pages. This led to severe performance degradation, requiring manual
> > intervention to prioritize vacuum jobs to complete them as quickly as
> > possible.
>
> What version of postgres was this? What batch sizes do you need to
> support?
> I.e. is all of this data inserted at once, or in steps?
>
> As already discussed, it seems unlikely that we'll ever support INSERT
> FROZEN,
> due to the potential of causing concurrent queries to give bogus answers.
> But
> there's actually a lot we can do to improve this short of INSERT FROZEN.
>
> The reason I asked for the version is that the behaviour would e.g. likely
> be
> worse before autovacuum_vacuum_insert_scale_factor existed. We are working
> on
> improvements around that in 18 too, ensuring that the gap between insert
> triggered vacuums does not grow forever.
>
> Several recent releases have also improved the situation around this in
> other
> ways, e.g. by just making vacuuming faster and by avoiding doing redundant
> work in more cases (by increasing relfrozenzid more aggressively).
>
> We've also been talking about performing freezing during e.g. checkpoints,
> if
> possible.
>
> If you're inserting all the data in a single transaction however, it'll be
> hard to improve most of this, because while that long long transaction
> runs,
> we can't do anything that needs to know the transaction has finished. OTOH,
> if it were a single transaction, you could already use COPY FREEZE.
>
>
> A somewhat related issue is that bulk INSERTs, in contrast to COPY,
> currently
> does not use the bulk-insert logic, leading the INSERT to cause a lot more
> WAL
> to be emitted compared to inserting the same data via COPY.
>
>
> > This issue is particularly critical during database *migrations* or
> *version
> > upgrades*, where a full data reload is often necessary. Each time a major
> > PostgreSQL upgrade occurs, users must reimport large datasets, leading to
> > the same problem of vacuum storms post-import. An INSERT FROZEN feature
> > would allow importing data that is known to be immutable, preventing
> > unnecessary vacuum overhead and reducing system strain.
>
> What are you using for such upgrades or migrations? I'd not expect INSERT
> to
> be used, due to the overhead that has compared to COPY.
>
> Greetings,
>
> Andres Freund
>
>
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Victor Yegorov 2025-02-17 08:46:57 Re: Get rid of WALBufMappingLock
Previous Message Ilia Evdokimov 2025-02-17 08:07:54 Re: explain analyze rows=%.0f