From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: "Bossart, Nathan" <bossartn(at)amazon(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallelizing the archiver
Date: 2021-09-10 06:03:46
Message-ID: BC335D75-105B-403F-9473-976C8BBC32E3@yandex-team.ru
Lists: pgsql-hackers
> On 10 Sep 2021, at 10:52, Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
>
> On Fri, Sep 10, 2021 at 1:28 PM Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>>
>> It's OK if an external tool is responsible for concurrency. Do we want this complexity in core? Many users do not enable archiving at all.
>> Maybe just add a parallelism API for external tools?
>> It's much easier to control concurrency in an external tool than in PostgreSQL core. Maintaining a parallel worker is tremendously harder than spawning a goroutine, thread, task or whatever.
>
> Yes, but it also means that it's up to every single archiving tool to
> implement a somewhat hackish parallel version of an archive_command,
> hoping that core won't break it.
I'm not proposing to remove the existing archive_command. Just deprecate its one-WAL-per-call form.
> If this problem is solved in
> postgres core without API change, then all existing tools will
> automatically benefit from it (maybe not the one who used to have
> hacks to make it parallel though, but it seems easier to disable it
> rather than implement it).
True, hacky tools can already coordinate a swarm of their processes and are prepared to be called multiple times concurrently :)
>> The external tool needs to know when an xlog segment is ready and needs to report when it's done. Postgres should just ensure that the external archiver/restorer is running.
>> For example, the external tool could read xlog names from stdin and report finished files on stdout. I can prototype such a tool swiftly :)
>> E.g. postgres runs ```wal-g wal-archiver``` and pushes ready segment filenames to its stdin. And no more listing of archive_status and hacky algorithms to predict the next WAL name and completion time!
>
> Yes, but that requires fundamental design changes for the archive
> commands right? So while I agree it could be a better approach
> overall, it seems like a longer term option. As far as I understand,
> what Nathan suggested seems more likely to be achieved in pg15 and
> could benefit a larger set of backup solutions. It also gives us
> enough time to properly design a new archiving approach.
It's a very simplistic approach. If some GUC is set, the archiver will just feed ready file names to the stdin of the archive command. What fundamental design changes do we need?
Best regards, Andrey Borodin.