Re: refactoring basebackup.c

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeevan Ladhe <jeevan(dot)ladhe(at)enterprisedb(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>
Subject: Re: refactoring basebackup.c
Date: 2021-11-08 16:34:50
Message-ID: CA+TgmoZbNzsWwM4BE5Jb_qHncY817DYZwGf+2-7hkMQ27ZwsMQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 8, 2021 at 10:59 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > It turns out that these commits are causing failures on prairiedog.
> > Per email from Tom off-list, that's apparently because prairiedog has
> > a fussy version of tar that doesn't like it when you omit the trailing
> > NUL blocks that are supposed to be part of a tar file.
>
> FTR, prairiedog is green. It's Noah's AIX menagerie that's complaining.

Woops.

> It's actually a little bit disturbing that we're only seeing a failure
> on that one platform, because that means that nothing else is anchoring
> us to the strict POSIX specification for tarfile format. We knew that
> GNU tar is forgiving about missing trailing zero blocks, but apparently
> so is BSD tar.

Yeah.

> One part of me wants to add some explicit test for the trailing blocks.
> Another says, well, the *de facto* tar standard seems not to require
> the trailing blocks, never mind the letter of POSIX --- so when AIX
> dies, will anyone care anymore? Maybe not.

FWIW, I think both of those are pretty defensible positions. Honestly,
I'm not sure how likely the bug is to recur once we fix it here,
either. The only reason this is a problem is because of the kludge of
having the server generate the entire output file except for the last
1kB. If we eliminate that behavior I don't know that this particular
problem is especially likely to come back. But adding a test isn't
stupid either, just a bit tricky to write. When I was testing locally
this morning I found that there were considerably more than 1024 zero
bytes at the end of the file because the last file it backs up is
pg_control which ends with lots of zero bytes. So it's not sufficient
to just write a test that checks for non-zero bytes in the last 1kB of
the file. What I think you'd need to do is figure out the number of
files in the archive and the sizes of each one, and based on that work
out how big the tar archive should be: 512 bytes per file or directory
or symlink plus enough extra 512 byte chunks to cover the contents of
each file plus an extra 1024 bytes at the end. That doesn't seem
particularly simple to code. We could run 'tar tvf' and parse the
output to get the number of files and their lengths, but that seems
likely to cause more portability headaches than the underlying issue.
Since pg_basebackup now has the logic to do all of this parsing
internally, we could make it complain if it receives from a v15+
server an archive trailer that is not 1024 bytes of zeroes, but that
wouldn't help with this exact problem, because the issue in this case
is when pg_basebackup decides it doesn't need to parse in the first
place. We could add a pg_basebackup option
--force-parsing-and-check-if-the-server-seems-broken, but that seems
like overkill to me. So overall I'm inclined to just do nothing about
this unless someone has a better idea how to write a reasonable test.

Anyway, here's my proposal for fixing the issue immediately before us.
0001 adds logic to pad out the unterminated tar archives, and 0002
makes the server terminate its tar archives while preserving the logic
added by 0001 for cases where we're talking to an older server. I
assume that it's best to get something committed quickly here so will
do that in ~4 hours if there are no major objections, or sooner if I
hear some enthusiastic endorsement.

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachment Content-Type Size
0001-Minimal-fix-for-unterminated-tar-archive-problem.patch application/octet-stream 5.1 KB
0002-Have-the-server-properly-terminate-tar-archives.patch application/octet-stream 6.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-11-08 17:05:24 Re: Missing include <openssl/x509.h> in be-secure-openssl.c?
Previous Message Michel Pelletier 2021-11-08 16:12:52 Re: Proposal for adding an example "expanded" data type to contrib