Re: parallelizing the archiver

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: "Bossart, Nathan" <bossartn(at)amazon(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallelizing the archiver
Date: 2021-09-10 05:52:02
Message-ID: CAOBaU_Ybyu3ror1UhoZP8hemnX-eA-kGtFa_ez+kRm4xdedEzQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Sep 10, 2021 at 1:28 PM Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>
> It's OK if an external tool is responsible for concurrency. Do we want this complexity in core? Many users do not enable archiving at all.
> Maybe just add a parallelism API for external tools?
> It's much easier to control concurrency in an external tool than in PostgreSQL core. Maintaining a parallel worker is tremendously harder than spawning a goroutine, thread, task or whatever.

Yes, but it also means that it's up to every single archiving tool to
implement a somewhat hackish parallel version of an archive_command,
hoping that core won't break it. If this problem is solved in
postgres core without an API change, then all existing tools will
automatically benefit from it (maybe not the ones that already have
hacks to make it parallel, but for them it seems easier to disable
those hacks than it is for everyone else to implement them).
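To make the "without an API change" point concrete, here is a minimal sketch of a core-side dispatcher that runs an existing, unmodified archive_command for several ready segments at once, substituting %p and %f the way the archive_command setting does. The function names, the `max_workers` knob and the `pg_wal` path prefix are hypothetical and not taken from any actual patch; the sketch also ignores the archive_status .ready/.done bookkeeping.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_archive_command(template: str, segment: str, wal_dir: str = "pg_wal") -> bool:
    """Run an unmodified archive_command for one segment; True on exit status 0."""
    # Simplified placeholder expansion: %p -> path of the segment, %f -> file name.
    # (The real server also handles "%%"; that is omitted here.)
    path = os.path.join(wal_dir, segment)
    command = template.replace("%p", path).replace("%f", segment)
    return subprocess.run(command, shell=True).returncode == 0


def archive_ready_segments(segments: list[str], template: str, max_workers: int = 4) -> set[str]:
    """Archive up to max_workers segments concurrently; return the ones that succeeded."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outcomes = pool.map(lambda seg: (seg, run_archive_command(template, seg)), segments)
        # Only these segments would be safe to mark as archived (.done).
        return {seg for seg, ok in outcomes if ok}
```

For example, `archive_ready_segments(ready, 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f')` would keep the usual command as-is. The tool keeps the contract it already has (exit status 0 means the file is safely archived); only the caller decides how many invocations run at the same time.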

> An external tool needs to know when an xlog segment is ready and needs to report when it's done. Postgres should just ensure that the external archiver/restorer is running.
> For example, the external tool could read xlog names from stdin and report finished files on stdout. I can prototype such a tool swiftly :)
> E.g. postgres runs `wal-g wal-archiver` and pushes ready segment filenames on stdin. And no more listing of archive_status and hacky algorithms to predict the next WAL name and completion time!
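A rough sketch of the tool side of that stdin/stdout protocol, assuming one segment file name per line in both directions. `serve` and the `archive_one` callback (which would do the actual upload) are hypothetical names; the protocol itself is only a proposal at this point, and this sketch simply never reports a segment whose upload failed.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, TextIO


def serve(inp: Iterable[str], out: TextIO, archive_one: Callable[[str], bool], max_workers: int = 4) -> None:
    """Read ready segment names from inp, archive them concurrently, report finished names on out."""
    lock = threading.Lock()  # serializes writes so report lines never interleave

    def work(name: str) -> None:
        try:
            ok = archive_one(name)
        except Exception:
            ok = False  # a failed upload is simply not reported
        if ok:
            with lock:
                out.write(name + "\n")
                out.flush()  # the server sees each completion immediately

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for line in inp:
            name = line.strip()
            if name:
                pool.submit(work, name)
    # Leaving the "with" block waits for all in-flight uploads to finish.
```

The tool would be started as `serve(sys.stdin, sys.stdout, upload)`, with `upload` being its own (hypothetical) upload function. Completion lines can come back in a different order than the names went in, so the server side would have to match them by name rather than by position.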

Yes, but that requires fundamental design changes for the archive
commands, right? So while I agree it could be a better approach
overall, it seems like a longer-term option. As far as I understand,
what Nathan suggested seems more likely to be achieved in pg15 and
could benefit a larger set of backup solutions. This would give us
enough time to properly design a new archiving approach.
