From: Jacob Champion <pchampion(at)vmware(dot)com>
To: "rjuju123(at)gmail(dot)com" <rjuju123(at)gmail(dot)com>, "robertmhaas(at)gmail(dot)com" <robertmhaas(at)gmail(dot)com>
Cc: "bossartn(at)amazon(dot)com" <bossartn(at)amazon(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallelizing the archiver
Date: 2021-09-10 17:07:01
Message-ID: 78728f8c5e413c05e00426369f79780a35caef5c.camel@vmware.com
Lists: pgsql-hackers
On Fri, 2021-09-10 at 23:48 +0800, Julien Rouhaud wrote:
> I totally agree that batching as many files as possible in a single
> command is probably what's gonna achieve the best performance. But if
> the archiver only gets an answer from the archive_command once it has
> tried to process all of the files, it also means that postgres won't be
> able to remove any WAL file until all of them could be processed. It
> means that users will likely have to limit the batch size and
> therefore pay more startup overhead than they would like. In case of
> archiving on a server with high latency / connection overhead, it may
> be better to be able to run multiple commands in parallel.
Well, users would also have to limit the parallelism, right? If
connections are high-overhead, I wouldn't imagine that running hundreds
of them simultaneously would work very well in practice. (The proof
would be in an actual benchmark, obviously, but usually I would rather
have one process handling a hundred items than a hundred processes
handling one item each.)
For a batching scheme, would it be that big a deal to wait for all of
them to be archived before removal?
> > That is possibly true. I think it might work to just assume that you
> > have to retry everything if it exits non-zero, but that requires the
> > archive command to be smart enough to do something sensible if an
> > identical file is already present in the archive.
>
> Yes, it could be. I think that we need more feedback for that too.
Seems like this is the sticking point. What would be the smartest thing
for the command to do? If there's a destination file already, checksum
it and make sure it matches the source before continuing?
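For what it's worth, that checksum-on-conflict behavior could look roughly like the sketch below. Everything here is illustrative, not a proposal for core: `archive_one`, the local destination path, and the use of `md5sum` are all assumptions about how someone might write their own idempotent archive_command.

```shell
# archive_one SRC DEST: hypothetical idempotent archive step.
# Succeeds if DEST does not exist (copies SRC there), or if an identical
# copy is already present (i.e. a retried archival of the same segment).
# Fails only when a *different* file already occupies DEST.
archive_one() {
    src=$1
    dest=$2
    if [ -e "$dest" ]; then
        # Checksum both sides; a match means the earlier attempt succeeded.
        srcsum=$(md5sum < "$src") || return 1
        destsum=$(md5sum < "$dest") || return 1
        [ "$srcsum" = "$destsum" ] && return 0
        echo "conflicting file already archived: $dest" >&2
        return 1
    fi
    # Copy via a temporary name and rename, so a crashed or interrupted
    # attempt never leaves a partial file that looks complete on retry.
    cp "$src" "$dest.tmp" && mv "$dest.tmp" "$dest"
}
```

The temp-file-plus-rename step matters for the retry story: without it, a partially written destination file would checksum differently from the source and look like a conflict rather than a failed earlier attempt.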
--Jacob