From: Sébastien <bokanist(at)gmail(dot)com>
To: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Feature Request] INSERT FROZEN to Optimize Large Cold Data Imports and Migrations
Date: 2025-02-17 13:47:43
Message-ID: CANtq+vQWryzGmPntziOR8c9M=JG0SKwRgH+bS=znS4hPZ=TcqA@mail.gmail.com
Lists: pgsql-hackers
Hello wenhui,
It's not a problem with heavy-insert tables but with heavy deletes. Also,
triggering more frequent autovacuum will not help, because autovacuum does
not remove recently dead tuples while a large, slow VACUUM FREEZE operation
that started before their deletion is still running in parallel. The
solution was to increase the priority and speed of the vacuum freeze
operation.
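
A minimal sketch of the kind of tuning this involves (table name
hypothetical, settings illustrative; adjust to your own I/O budget):

    -- Stop cost-based throttling of autovacuum workers (cluster-wide).
    ALTER SYSTEM SET autovacuum_vacuum_cost_delay = 0;
    SELECT pg_reload_conf();

    -- Or run an unthrottled manual freeze pass on the affected table
    -- (manual VACUUM uses vacuum_cost_delay, which already defaults to 0).
    VACUUM (FREEZE, VERBOSE) my_table;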
Anyway, there should be a way to insert frozen data other than COPY FREEZE,
which does not work with foreign tables (e.g., INSERT INTO my_table
SELECT * FROM foreign_table).
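
One possible workaround today (names hypothetical) is to stage the foreign
data locally and then load it with COPY FREEZE in a single transaction, at
the cost of writing the data twice:

    -- In psql: stage the foreign table's rows to a local file.
    \copy (SELECT * FROM foreign_table) TO '/tmp/stage.csv' WITH (FORMAT csv)

    BEGIN;
    TRUNCATE my_table;  -- FREEZE requires the table to be created or
                        -- truncated in the same (sub)transaction
    COPY my_table FROM '/tmp/stage.csv' WITH (FORMAT csv, FREEZE);
    COMMIT;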
On Mon, Feb 17, 2025 at 09:46, wenhui qiu <qiuwenhuifx(at)gmail(dot)com> wrote:
> Hi Sébastien,
> You can check out the email with the subject "Trigger more frequent
> autovacuums of heavy insert tables"; I think it can alleviate the problem.
>
> Thanks
>
> On Sat, Feb 15, 2025 at 3:13 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
>> Hi,
>>
>> On 2025-02-13 10:52:31 +0100, Sébastien wrote:
>> > Introduce an INSERT FROZEN feature to bypass vacuum processing for
>> > large-scale cold data imports, reducing the impact on system performance
>> > post-import, for large imports, migrations, and major version upgrades.
>> > Business Use-case:
>> >
>> > When importing massive datasets (e.g., 6-10 TB) into PostgreSQL on a
>> > heavily loaded server, we observed that the system struggled
>> > significantly weeks later when the autovacuum process had to freeze all
>> > the imported data pages. This led to severe performance degradation,
>> > requiring manual intervention to prioritize vacuum jobs to complete them
>> > as quickly as possible.
>>
>> What version of postgres was this? What batch sizes do you need to
>> support? I.e., is all of this data inserted at once, or in steps?
>>
>> As already discussed, it seems unlikely that we'll ever support INSERT
>> FROZEN, due to the potential of causing concurrent queries to give bogus
>> answers. But there's actually a lot we can do to improve this short of
>> INSERT FROZEN.
>>
>> The reason I asked for the version is that the behaviour would likely be
>> worse before autovacuum_vacuum_insert_scale_factor existed, for example.
>> We are working on improvements around that in 18 too, ensuring that the
>> gap between insert-triggered vacuums does not grow forever.
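>>
>> For example (hypothetical table name), the insert-vacuum thresholds can
>> also be tuned per table through storage parameters:
>>
>>   ALTER TABLE big_import SET (
>>       autovacuum_vacuum_insert_threshold = 100000,
>>       autovacuum_vacuum_insert_scale_factor = 0.05
>>   );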
>>
>> Several recent releases have also improved the situation around this in
>> other ways, e.g. by just making vacuuming faster and by avoiding redundant
>> work in more cases (by increasing relfrozenxid more aggressively).
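>>
>> One standard way to watch that progress (a plain catalog query; table
>> names will differ) is to check relfrozenxid age:
>>
>>   SELECT relname, age(relfrozenxid) AS xid_age
>>   FROM pg_class
>>   WHERE relkind = 'r'
>>   ORDER BY xid_age DESC
>>   LIMIT 10;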
>>
>> We've also been talking about performing freezing during e.g. checkpoints,
>> if possible.
>>
>> If you're inserting all the data in a single transaction, however, it'll
>> be hard to improve most of this, because while that long-running
>> transaction is in progress, we can't do anything that needs to know the
>> transaction has finished. OTOH, if it were a single transaction, you could
>> already use COPY FREEZE.
>>
>>
>> A somewhat related issue is that bulk INSERTs, in contrast to COPY,
>> currently do not use the bulk-insert logic, leading the INSERT to emit a
>> lot more WAL than inserting the same data via COPY.
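>>
>> Concretely (hypothetical names), the second form currently generates
>> noticeably less WAL for the same rows:
>>
>>   INSERT INTO my_table SELECT * FROM staging;        -- no bulk-insert path
>>   COPY my_table FROM '/tmp/data.csv' (FORMAT csv);   -- bulk-insert path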
>>
>>
>> > This issue is particularly critical during database *migrations* or
>> > *version upgrades*, where a full data reload is often necessary. Each
>> > time a major PostgreSQL upgrade occurs, users must reimport large
>> > datasets, leading to the same problem of vacuum storms post-import. An
>> > INSERT FROZEN feature would allow importing data that is known to be
>> > immutable, preventing unnecessary vacuum overhead and reducing system
>> > strain.
>>
>> What are you using for such upgrades or migrations? I'd not expect INSERT
>> to be used, due to the overhead it has compared to COPY.
>>
>> Greetings,
>>
>> Andres Freund
--
Sébastien Caunes
+33 6 7 229 229 7