Re: Would it be possible to have parallel archiving?

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: David Steele <david(at)pgmasters(dot)net>, hubert depesz lubaczewski <depesz(at)depesz(dot)com>, pgsql-hackers mailing list <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Would it be possible to have parallel archiving?
Date: 2018-08-28 20:34:34
Message-ID: D8EBD385-D0D8-4997-BD2D-DD2B99248B39@yandex-team.ru
Lists: pgsql-hackers

> On 28 Aug 2018, at 17:07, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>
> Greetings,
>
> * Andrey Borodin (x4mmm(at)yandex-team(dot)ru) wrote:
>>> On 28 Aug 2018, at 14:08, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>>> * David Steele (david(at)pgmasters(dot)net) wrote:
>>>> On 8/28/18 8:32 AM, Stephen Frost wrote:
>>>> To be clear, pgBackRest uses the .ready files in archive_status to
>>>> parallelize archiving but still notifies PostgreSQL of completion via
>>>> the archive_command mechanism. We do not modify .ready files to .done
>>>> directly.
>>>
>>> Right, we don't recommend mucking around with that directory of files.
>>> Even if that works today (which you'd need to test extensively...),
>>> there's no guarantee that it'll work and do what you want in the
>>> future...
>> WAL-G modifies archive_status files.
>
> Frankly, I've heard far too many concerns and issues with WAL-G to
> consider anything it does at all sensible.
Hmm, very interesting. What kind of issues? There are a few in the GitHub repo, and all of them will be addressed. Do you have other reports? Can you share them?
I'm open to discussing any concerns.

>
>> This path was chosen to limit state preserved between WAL-G runs (archiving to S3) and further push archiving performance.
>
> I still don't think it's a good idea and I specifically recommend
> against making changes to the archive status files- those are clearly
> owned and managed by PG and should not be whacked around by external
> processes.
If you do not write to archive_status, you basically have two options:
1. On every archive_command call, recheck that the file being archived is identical to the copy that is already archived. This hurts performance.
2. Hope that the files match. This does not add any safety compared to whacking archive_status, and it is just as exposed to core changes as writes are.
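
To illustrate option 1, here is a minimal sketch in Python. It is not WAL-G or pgBackRest code: the function names and the use of a local directory as the archive are assumptions made for illustration only. The point is that every re-archive of an existing segment costs a full read of both copies.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def archive_segment(wal_path: Path, archive_dir: Path) -> bool:
    """Archive one WAL segment; return True if it is safely archived.

    Hypothetical helper: if a file with the same name already exists in
    the archive, accept it only when the contents are identical. That
    comparison is the extra work that hurts performance.
    """
    target = archive_dir / wal_path.name
    if target.exists():
        return file_digest(target) == file_digest(wal_path)
    # Copy to a temporary name first, then rename, so a partial copy
    # is never visible under the final segment name.
    tmp = target.with_name(target.name + ".tmp")
    shutil.copyfile(wal_path, tmp)
    tmp.replace(target)
    return True
```

A wrapper invoked from archive_command would exit with status 0 when this returns True and non-zero otherwise.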

Well, PostgreSQL clearly has a problem that could be solved by a good parallel archiving API. Anything else is whacking around; just reading archive_status is no better than reading and writing it.
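
For reference, the read-only approach amounts to something like the following sketch (Python; the function name is hypothetical and this is not taken from any existing tool). It relies on the same naming convention in archive_status, a `.ready` suffix on each file waiting to be archived, that a writer would rely on.

```python
from pathlib import Path
from typing import List


def ready_segments(archive_status_dir: Path) -> List[str]:
    """Return the names of WAL files marked .ready, sorted by name.

    Only reads the directory listing; nothing is renamed or written.
    """
    suffix = ".ready"
    return sorted(
        p.name[: -len(suffix)]
        for p in archive_status_dir.glob("*" + suffix)
    )
```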

>
>> Indeed, it was very hard to test. Also, this makes it impossible to use two archiving systems simultaneously during a transition period.
>
> The testing in WAL-G seems to be rather lacking from what I've seen.
Indeed, WAL-G still lacks automated integration tests; I hope that some dockerized tests will be added soon.
For now, I'm doing automated QA in Yandex infrastructure.

Best regards, Andrey Borodin.
