Enhance pg_dump multi-threaded streaming (WAS: Re: filesystem full during vacuum - space recovery issues)

From: Thomas Simpson <ts(at)talentstack(dot)to>
To: pgsql-admin(at)lists(dot)postgresql(dot)org, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Enhance pg_dump multi-threaded streaming (WAS: Re: filesystem full during vacuum - space recovery issues)
Date: 2024-07-18 23:08:06
Message-ID: 95f5bff5-b392-4c2f-9708-8fd5547c620c@talentstack.to
Lists: pgsql-admin pgsql-hackers

[Added cross post to pgsql-hackers(at)lists(dot)postgresql(dot)org - background is
a multi-TB database that needs to be recovered via pg_dumpall & reload; looking for
ways to make pg_dump scale across multiple threads to expedite loading into a new
cluster.  A straight dump to a file is impractical as the dump will be
>200TB; hackers may be a better home for the discussion than the current
admin list]

Hi Ron

On 18-Jul-2024 18:41, Ron Johnson wrote:
> Multi-threaded writing to the same giant text file won't work too
> well, when all the data for one table needs to be together.
>
> Just temporarily add another disk for backups.
>
For clarity, I'm not proposing multi-threaded writing to one file; the
proposal is a new special mode which specifically makes multiple output
streams across *network sockets* to a listener process on the
other side.  The goal is to avoid intermediate files entirely and use
multiple network streams to gain multi-threaded processing, with some
co-ordination to keep things organized and consistent.

This would be specifically for the use-case of a dump/reload upgrade or
recreate rather than everyday use, and particularly for very
large databases.
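
To make that concrete, here is a very rough sketch in Python of the kind of
plumbing I mean (every name, port and helper is hypothetical - nothing like
this exists in pg_dump today, and a real implementation would live inside
pg_dump rather than shelling out to it, but the shape is the same).  Each
table is pushed straight over its own socket, and the receiving side pipes
the stream straight into psql, so no intermediate file is ever written:

    import socket
    import subprocess

    DATA_PORT = 6001   # hypothetical data port on the destination listener

    def stream_table(dest_host, dbname, table):
        """Sender side: pipe one table's dump straight into a socket."""
        with socket.create_connection((dest_host, DATA_PORT)) as sock:
            dump = subprocess.Popen(
                ["pg_dump", "--data-only", "--table", table, dbname],
                stdout=subprocess.PIPE)
            for chunk in iter(lambda: dump.stdout.read(65536), b""):
                sock.sendall(chunk)

    def receive_stream(conn, dbname):
        """Listener side: feed an incoming stream straight into psql."""
        load = subprocess.Popen(["psql", "--dbname", dbname],
                                stdin=subprocess.PIPE)
        while True:
            chunk = conn.recv(65536)
            if not chunk:
                break
            load.stdin.write(chunk)
        load.stdin.close()
        load.wait()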

Looking at pg_dump.c, it already does the baseline organization; the
extension would be adding the required coordination with the
destination.  So huge tables (I have many) would each go in
different streams, but where there is a dependency (FK relations etc.) a
checkpoint needs to ensure those are satisfied before proceeding.  Worst case
it would end up using only one thread, but it would be very
unusual to have a database where every table depends on another table
all the way down.
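
The staging itself could be as simple as levelling the FK graph (again, only
a hypothetical sketch - the table names and the fk_deps mapping are made up
for illustration): a table becomes ready once everything it references has
already been loaded, and a fully chained schema degrades to one table per
stage, i.e. effectively single-threaded:

    def plan_stages(tables, fk_deps):
        """Group tables into stages; fk_deps maps table -> referenced tables."""
        remaining = set(tables)
        done, stages = set(), []
        while remaining:
            # ready once everything a table references is already loaded
            ready = {t for t in remaining if fk_deps.get(t, set()) <= done}
            if not ready:          # circular FKs: dump the rest in one stage
                ready = set(remaining)
            stages.append(sorted(ready))
            done |= ready
            remaining -= ready
        return stages

    # toy example: orders references customers, order_items references orders
    deps = {"customers": set(), "orders": {"customers"},
            "order_items": {"orders"}}
    print(plan_stages(deps.keys(), deps))
    # -> [['customers'], ['orders'], ['order_items']]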

In theory at least, some gains should be achieved for typical databases
where a degree of parallelism is possible.
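
For completeness, the destination's side of the rendezvous might look roughly
like this (reusing receive_stream from the first sketch; the control-channel
messages are invented purely for illustration): the listener advertises how
many threads it will accept, loads each stream of a stage in parallel, and
only acknowledges on the control channel once every load in the stage has
finished, which is the point the sender waits for before starting the next
stage:

    import socket
    from concurrent.futures import ThreadPoolExecutor

    CONTROL_PORT = 6000    # hypothetical control channel
    DATA_PORT = 6001
    N_THREADS = 8          # advertised to the sender at connect time

    def listener(dbname):
        ctl_srv = socket.create_server(("", CONTROL_PORT))
        data_srv = socket.create_server(("", DATA_PORT))
        ctl, _ = ctl_srv.accept()
        ctl.sendall(f"THREADS {N_THREADS}\n".encode())
        with ThreadPoolExecutor(max_workers=N_THREADS) as pool:
            while True:
                msg = ctl.recv(64)
                if not msg or msg.startswith(b"END"):
                    break
                # sender announces "STAGE <n>" before opening n data streams
                n_tables = int(msg.split()[1])
                loads = []
                for _ in range(n_tables):
                    conn, _ = data_srv.accept()
                    loads.append(pool.submit(receive_stream, conn, dbname))
                for f in loads:
                    f.result()                 # wait for every load to finish
                ctl.sendall(b"STAGE_DONE\n")   # rendezvous: sender may proceed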

Thanks

Tom

> On Thu, Jul 18, 2024 at 4:55 PM Thomas Simpson <ts(at)talentstack(dot)to> wrote:
>
>
> On 18-Jul-2024 16:32, Ron Johnson wrote:
>> On Thu, Jul 18, 2024 at 3:01 PM Thomas Simpson
>> <ts(at)talentstack(dot)to> wrote:
>> [snip]
>>
>> [BTW, v9.6 which I know is old but this server is stuck there]
>>
>> [snip]
>>
>> I know I'm stuck with the slow rebuild at this point. 
>> However, I doubt I am the only person in the world that needs
>> to dump and reload a large database.  My thought is this is a
>> weak point for PostgreSQL so it makes sense to consider ways
>> to improve the dump reload process, especially as it's the
>> last-resort upgrade path recommended in the upgrade guide and
>> the general fail-safe route to get out of trouble.
>>
>>  No database does fast single-threaded backups.
>
> Agreed.  My thought is that it should be possible for a 'new
> dumpall' to be multi-threaded.
>
> Something like :
>
> * Set number of threads on 'source' (perhaps by querying a
> listening destination for how many threads it is prepared to
> accept via a control port)
>
> * Select each database in turn
>
> * Organize the tables which do not have references themselves
>
> * Send each table separately in each thread (or queue them until a
> thread is available)  ('Stage 1')
>
> * Rendezvous stage 1 completion (pause sending, wait until
> feedback from destination confirming all completed) so we have a
> known consistent state that is safe to proceed to subsequent tables
>
> * Work through tables that refer to the previously sent tables in the
> same way (since the tables they reference exist and have their
> data) ('Stage 2')
>
> * Repeat progressively until all tables are done ('Stage 3', 4
> etc. as necessary)
>
> The current dumpall essentially does this table organization
> already [minus stage checkpoints/multi-threading], otherwise the
> dump/load would not work.  It may even be doing a lot of this for
> 'directory' mode?  The change here is organizing n threads to
> process tables concurrently where possible and coordinating the
> pipes so they only send data which can be accepted.
>
> The destination would need to run a multi-threaded listener and
> co-ordinate with the sender on some control channel to feed back
> completion of each stage.
>
> Something like a destination host and control channel port to
> establish the pipes and create additional netcat pipes on
> incremental ports above the control port for each thread used.
>
> Dumpall seems like it could be a reasonable starting point since it
> is already doing the complicated bits of serializing the dump data
> so it can be consistently loaded.
>
> Probably not really an admin question at this point, more a
> feature enhancement.
>
> Is there anything fundamentally wrong that someone with more
> intimate knowledge of dumpall could point out?
>
> Thanks
>
> Tom
>
>
