From: Ron Johnson <ronljohnsonjr(at)gmail(dot)com>
To: Pgsql-admin <pgsql-admin(at)lists(dot)postgresql(dot)org>
Subject: Re: filesystem full during vacuum - space recovery issues
Date: 2024-07-18 22:41:14
Message-ID: CANzqJaD_bEyG9Fve0OPGUfAdjJL36_ZKm4aLQuMytCBp0yYOOA@mail.gmail.com
Lists: pgsql-admin pgsql-hackers

Multi-threaded writing to the same giant text file won't work well when all
the data for one table needs to be kept together.

Just temporarily add another disk for backups.

On Thu, Jul 18, 2024 at 4:55 PM Thomas Simpson <ts(at)talentstack(dot)to> wrote:

>
> On 18-Jul-2024 16:32, Ron Johnson wrote:
>
> On Thu, Jul 18, 2024 at 3:01 PM Thomas Simpson <ts(at)talentstack(dot)to> wrote:
> [snip]
>
>> [BTW, v9.6 which I know is old but this server is stuck there]
>>
> [snip]
>
>> I know I'm stuck with the slow rebuild at this point. However, I doubt I
>> am the only person in the world who needs to dump and reload a large
>> database. My thought is that this is a weak point for PostgreSQL, so it
>> makes sense to consider ways to improve the dump/reload process, especially
>> as it's the last-resort upgrade path recommended in the upgrade guide and
>> the general fail-safe route to get out of trouble.
>>
> No database does fast single-threaded backups.
>
> Agreed. My thought is that it should be possible for a 'new dumpall' to
> be multi-threaded.
>
> Something like:
>
> * Set the number of threads on the 'source' (perhaps by querying a
> listening destination, via a control port, for how many threads it is
> prepared to accept)
>
> * Select each database in turn
>
> * Organize the tables which do not themselves reference any other tables
>
> * Send each table separately, one per thread (queueing tables until a
> thread is available) ('Stage 1')
>
> * Rendezvous on stage 1 completion (pause sending, wait for feedback from
> the destination confirming everything has loaded) so we have a known
> consistent state from which it is safe to proceed to subsequent tables
>
> * Work through the tables that refer to those already sent in the same
> way (since the tables they reference now exist and have their data)
> ('Stage 2')
>
> * Repeat progressively until all tables are done ('Stage 3', 4, etc. as
> necessary); a rough sketch of this staging follows below
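>
> To make the staging concrete, a rough sketch of the stage computation
> (Python, untested, names purely illustrative; the FK graph would come
> from a catalog query something like the one in the comment):
>
> def compute_stages(tables, fk_refs):
>     # tables: set of all table names in the database.
>     # fk_refs: dict mapping each table to the set of tables it
>     # references via foreign keys, gathered with something like:
>     #   SELECT conrelid::regclass, confrelid::regclass
>     #   FROM pg_constraint WHERE contype = 'f';
>     # Returns a list of stages; a table in stage N references only
>     # tables in earlier stages, so each stage can be sent over the
>     # parallel pipes once the previous stage has rendezvoused.
>     stages, done, remaining = [], set(), set(tables)
>     while remaining:
>         # Ready = every table whose referenced tables (ignoring
>         # self-references) have all been loaded already.
>         ready = {t for t in remaining
>                  if (fk_refs.get(t, set()) - {t}) <= done}
>         if not ready:
>             # FK cycle: send the rest together with constraints
>             # deferred rather than looping forever.
>             ready = set(remaining)
>         stages.append(sorted(ready))
>         done |= ready
>         remaining -= ready
>     return stages
>
> e.g. compute_stages({'a', 'b', 'c'}, {'b': {'a'}, 'c': {'a', 'b'}})
> returns [['a'], ['b'], ['c']].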
>
> The current dumpall essentially does this table organization already
> [minus the stage checkpoints and multi-threading], otherwise the dump/load
> would not work. It may even do a lot of this for 'directory' mode? The
> change here is organizing n threads to process tables concurrently where
> possible and coordinating the pipes so they only send data which can be
> accepted.
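>
> (For comparison, directory-format pg_dump can already dump tables in
> parallel and pg_restore can reload them in parallel, e.g. (job count and
> paths illustrative):
>
>     pg_dump -Fd -j 8 -f /backup/mydb.dir mydb
>     pg_restore -j 8 -d mydb /backup/mydb.dir
>
> so the per-table split is proven; what's missing is a dumpall equivalent
> that streams between hosts instead of writing the intermediate files that
> caused the disk-space problem here.)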
>
> The destination would need a multi-threaded listener and would
> co-ordinate with the sender over some control channel to feed back
> completion of each stage.
>
> Something like a destination host and control-channel port to establish
> the pipes, then additional netcat pipes on incremental ports above the
> control port, one per thread used; a sketch of the destination side
> follows below.
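>
> Very roughly, the destination side of that might look like the following
> (Python purely to sketch the handshake; the port layout and the one-line
> 'THREADS'/'STAGE ... DONE' messages are invented for illustration):
>
> import socket, threading
>
> CONTROL_PORT = 5555          # illustrative only
> THREADS = 8                  # how many data pipes we will accept
>
> def receive_pipe(port):
>     # One data pipe per thread on CONTROL_PORT+1 .. CONTROL_PORT+THREADS;
>     # a real implementation would feed psql/COPY instead of a file.
>     with socket.create_server(("", port)) as srv:
>         conn, _ = srv.accept()
>         with conn, open("pipe_%d.sql" % (port - CONTROL_PORT), "wb") as out:
>             while chunk := conn.recv(65536):
>                 out.write(chunk)
>
> with socket.create_server(("", CONTROL_PORT)) as ctl:
>     conn, _ = ctl.accept()
>     with conn:
>         # Tell the sender how many data pipes it may open.
>         conn.sendall(b"THREADS %d\n" % THREADS)
>         for i in range(1, THREADS + 1):
>             threading.Thread(target=receive_pipe,
>                              args=(CONTROL_PORT + i,)).start()
>         # The sender reports 'STAGE n DONE' after each stage; our
>         # acknowledgement is the rendezvous barrier before the next
>         # stage's tables are sent.
>         while line := conn.recv(1024):
>             conn.sendall(b"ACK " + line)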
>
> Dumpall seems like a reasonable starting point since it already does the
> complicated bit of serializing the dump data so it can be consistently
> loaded.
>
> Probably not really an admin question at this point, more a feature
> enhancement.
>
> Is there anything fundamentally wrong that someone with more intimate
> knowledge of dumpall could point out?
>
> Thanks
>
> Tom
>
