[Feature Request] INSERT FROZEN to Optimize Large Cold Data Imports and Migrations

From: Sébastien <bokanist(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: [Feature Request] INSERT FROZEN to Optimize Large Cold Data Imports and Migrations
Date: 2025-02-13 09:52:31
Message-ID: CANtq+vS0KH9GPSjkQ65VmTcaWwPr3D+C2HNK1dN=C06xdvsL=g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

One-line Summary:

Introduce an INSERT FROZEN feature to bypass vacuum processing for
large-scale cold data imports, reducing the impact on system performance
post-import. For large imports, migrations and major version upgrades.
Business Use-case:

When importing massive datasets (e.g., 6-10TB) into PostgreSQL on a heavily
loaded server, we observed that the system struggled significantly weeks
later when the autovacuum process had to freeze all the imported data
pages. This led to severe performance degradation, requiring manual
intervention to prioritize vacuum jobs to complete them as quickly as
possible.

This issue is particularly critical during database *migrations* or *version
upgrades*, where a full data reload is often necessary. Each time a major
PostgreSQL upgrade occurs, users must reimport large datasets, leading to
the same problem of vacuum storms post-import. An INSERT FROZEN feature
would allow importing data that is known to be immutable, preventing
unnecessary vacuum overhead and reducing system strain.
User impact with the change:

- Users importing large, cold datasets (initial loads, migrations,
version upgrades) can mark them as "frozen" during insertion, so pages are
directly marked as frozen.
- Reduced risk of autovacuum storms weeks after large imports.
- More predictable system performance post-import and post-upgrade.
- Avoid unnecessary rewriting of all pages after.
- Significant improvement for users who perform regular major version
upgrades and need to reload data.

Implementation details:

- A new INSERT FROZEN option could be introduced, similar to COPY FREEZE,
allowing direct insertion of tuples in a frozen state.
- This would likely require changes in heap storage logic to ensure
tuples are written with a frozen XID at insert time.
- Consideration should be given to transaction semantics and WAL logging
to ensure consistency and crash recovery integrity.

Estimated Development Time:

Unknown (would require input from core developers to assess complexity).
But I think it's not that much and pretty straightforward to implement
experimentally. Then.
Opportunity Window Period:

...
Budget Money:

...
Contact Information:
If you have further question regarding the issues I experienced that this
would solve, feel free to contact me

Sébastien Caunes bokanist(at)gmail(dot)com

Thank you for your attention.

--
Sébastien Caunes
+33 6 7 229 229 7

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Shubham Khanna 2025-02-13 09:53:41 Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.
Previous Message Sébastien 2025-02-13 09:51:49 Fwd: [Feature Request] Per-Database Transaction Logs for Enhanced Isolation and New Capabilities