From: | Nitin Motiani <nitinmotiani(at)google(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Cc: | Hannu Krosing <hannuk(at)google(dot)com> |
Subject: | Adding pg_dump flag for parallel export to pipes |
Date: | 2025-04-07 17:16:58 |
Message-ID: | CAH5HC97p4kkpikar+swuC0Lx4YTVkE30sTsFX94tyzih7Cc_=w@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Hackers,
We are proposing the ability to specify a pipe command to pg_dump by a
flag. And attaching the patch set.
Why : Currently it is quite simple to pipe the output of pg_dump for
text format to a pipe at command line and do any manipulations
necessary. Following is an example :
pg_dump <flags> <dbname> | lz4 | pv -L 10k | ssh remote.host
"cat - > remote.dump.lz4"
Here we first compress the stream using lz4 and then send it over ssh
to a remote host to be saved as a file while rate-limiting the network
usage to 10KB/s.
Something like this is not possible for format=directory (-Fd) since
all you can provide is the directory name to store the individual
files. Note it is not possible to do this irrespective of the usage of
the parallel dump option ('--jobs' flag).
While the directory format supports compression using a flag, the rest
of the operations in the above example are not possible. And a pipe
command provides more flexibility in what compression algorithm one
wants to use.
This patch set provides pg_dump the ability to pipe the data in the
directory mode by using a new flag '--pipe-command' (in both parallel
and non-parallel mode).
We also add a similar option to pg_restore.
The following can be the major use cases of these changes :
1. Stream pg_dump output to a cloud storage
2. SSH the data to a remote host (with or without throttling)
3. Custom compression options
Usage Examples : Here is an example of how the pipe-command will look like.
pg_dump -Fd mydb --pipe-command="cat > dumpdir/%f" (dumpdir
should exist beforehand.)
This is equivalent to
pg_dump -Fd mydb --file=dumpdir
(Please note that the flags '--file' or '--pipe-command' can't be used
together.)
For the more complex scenario as mentioned above, the command will be
(with the parallelism of 5) :
pg_dump -Fd mydb -j 5 --pipe-command="lz4 | pv -L 10k | ssh
remote.host "cat > dumpdir/%f""
Please note the use of %f in the above examples. As a user would
almost always want to write the post-processing output to a file (or
perhaps a cloud location), we provide a format specifier %f in the
command. The implementation of pipe-command replaces these format
specifiers with the corresponding file names. These file names are the
same as they would be in the current usage of directory format with
'--file' flag (<dump_id>.dat, toc.dat, blob_NNN.toc,
blob_<blob_id>.dat).
The usage of this flag with pg_restore will also be similar. Here is
an example of restoring from a gzip compressed dump directory.
pg_restore -C -Fd -d postgres --pipe-commnad="cat
dumpdir/%f.gz | gunzip"
The new flag in pg_restore also works with '-l' and '-L' options
pg_restore -C -Fd -d postgres --pipe-commnad="cat dumpdir/%f" -L db.list
Implementation Details : Here are the major changes :
1. We reuse the same variables which store the file name to store
the pipe command. And add a new bool fSpecIsPipe in _archiveHandle
(similar bools in pg_dump.c and pg_restore.c) to specify if it's a
pipe command.
2. In the cases when the above bool is set to true, we use popen
and pclose instead of fopen and fclose.
3. To enable the format specifier %f in the pipe-command, we make
changes to the file name creation logic in a few places. Currently the
file name (corresponding to a table or large object) is appended to
the directory name provided by '--file' command. In case of
'--pipe-command', we use 'replace_percent_placeholders' to replace %f
with the corresponding file name. This change is made for both table
files and LO TOC files.
With these core changes, the rest of the code continues working as-is.
We are attaching 4 patches for this change :
001-pg_dump_pipe has the pg_dump pipe support code.
002-pg_restore_pipe has the pg_restore pipe support.
003-pg_dump_basic_tests has a few basic validation tests for
correctmflag combinations. We need to write more automated tests in
002_pg_dump.pl but have been running into some issues with environment
setup due to which certain pipe commands result in the shell process
becoming defunct. These same commands are working fine in manual
testing. We are still looking into this.
004-pg_dump_documentation has the proposed documentation changes.
We are working on the above test issues and cleanup of the patches.
Open Questions : There are a couple of open questions in the implementation :
1. Currently the LO TOC file (blob_NNN.toc) is opened in the
append mode. This is not possible with popen for the pipe command.
From reading the code, it seems to us that this file doesn't need to
be opened in the append mode. As '_StartLOs' is called once per
archive entry in WriteDataChunksForToCEntry followed by the dumper
function and then '_EndLOs', it should be okay to change this to 'w'
mode. But this code has been there since the start so we haven't made
that change yet. In the patch, we have changed it to 'w' pipe-command
only and added the ideas for potential solutions in the comments.
2. We are also not sure yet on how to handle the environment
issues when trying to add new tests to 002_pg_dump.pl.
Please let us know what you think.
Thanks & Regards,
Nitin Motiani
Google
Attachment | Content-Type | Size |
---|---|---|
002-pg_restore_pipe_v4.patch | application/octet-stream | 7.9 KB |
001-pg_dump_pipe_v4.patch | application/octet-stream | 31.1 KB |
004-pg_dump_documentation_v4.patch | application/octet-stream | 7.1 KB |
003-pg_dump_basic_tests_v4.patch | application/octet-stream | 2.5 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Álvaro Herrera | 2025-04-07 17:20:21 | pgsql: Allow NOT NULL constraints to be added as NOT VALID |
Previous Message | Tomas Vondra | 2025-04-07 17:14:21 | Re: Improve monitoring of shared memory allocations |