Re: Misleading/inaccurate error message from pg_basebackup

From: Casey <c(at)osss(dot)net>
To: Christoph Berg <myon(at)debian(dot)org>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Misleading/inaccurate error message from pg_basebackup
Date: 2024-01-29 21:14:05
Message-ID: C0560AD8-B681-46AD-8694-A210F6041A1B@osss.net
Lists: pgsql-bugs

I didn't believe I had to mkdir, those were just test cases to illustrate the problem in isolation. I had been trying to reinitialize a node after replacing disks used for the data volume, using Patroni. When that failed due to a pg_basebackup error, it removed the data directory.

To be clear, I'm mounting separate volumes at:
/var/lib/postgresql
/var/lib/postgresql/wal

The data and wal directories are a couple levels under those:
/var/lib/postgresql/14/main
/var/lib/postgresql/wal/14/main

So when Patroni ran into a pg_basebackup error, it removed /var/lib/postgresql/14/main and /var/lib/postgresql/14. It also does not log the specific pg_basebackup command it generates. As I couldn't tell why it was erroring, I tried to recreate that command myself based on the configuration and defaults. When I did that, I didn't think to invoke it by its full path under /usr/lib/postgresql/14/bin as I ought to have, but just relied on what was in my PATH, which turned out to be the wrapper script.
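
For anyone wanting to check the same thing, here is a minimal Python sketch of how to see which executable a bare command name resolves to and where it ends up after following symlinks. The helper name resolve_executable is hypothetical, and the sketch assumes the wrapper is reachable through a symlink on the PATH, which may not hold on every system:

```python
import os
import shutil


def resolve_executable(name, search_path=None):
    """Return (path_found, real_path) for `name`, or None if it is not found.

    path_found is what a shell lookup would run; real_path is where that
    ends up once all symlinks are followed.
    """
    found = shutil.which(name, path=search_path)
    if found is None:
        return None
    return found, os.path.realpath(found)


if __name__ == "__main__":
    result = resolve_executable("pg_basebackup")
    if result is None:
        print("pg_basebackup is not on PATH")
    else:
        found, real = result
        print(f"PATH resolves to: {found}")
        if real != os.path.abspath(found):
            # A differing real path suggests a wrapper or symlink is involved.
            print(f"which resolves (via symlinks) to: {real}")
```

If the resolved path is not under /usr/lib/postgresql/14/bin, the command being run is probably not the upstream binary.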

The actual problem turned out to be that I thought I'd cleared out all the contents of the wal directory, but I'd inadvertently left a hidden file sitting in there. While debugging this, I didn't have the database running, and the data directory did not exist. I wanted to look at pg_basebackup --help, and that would not work, throwing the error about the data directory not existing. I should have focused on the first part of the error message, that /var/lib/postgresql/14/main was not accessible, but instead I got distracted by the second part, telling me to fix the directory permissions on /var/lib/postgresql/14 by making it world-readable. It didn't actually need to be world-readable, and we don't want it to be world-readable. Regardless, I tried making it world-readable, and was confused as to why pg_basebackup threw the same error message. Once I created the /main subdirectory, ignoring the complaint about world-readability, I was able to get a different error that pointed me to the actual problem:
pg_basebackup: error: directory "/var/lib/postgresql/wal/14/main" exists but is not empty
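
As an illustration of how a leftover hidden entry can be spotted, here is a minimal Python sketch that lists everything in a directory, dotfiles included. The function names are hypothetical and this is not how pg_basebackup itself performs its check; it only shows that a directory which looks empty in a plain listing may still contain hidden entries:

```python
import os


def directory_entries(path):
    """Return the sorted names of everything in `path`, dotfiles included.

    os.scandir() does not skip hidden entries the way a bare `ls` does.
    """
    with os.scandir(path) as it:
        return sorted(entry.name for entry in it)


def is_really_empty(path):
    """True only if `path` contains no entries at all, hidden or not."""
    return not directory_entries(path)


if __name__ == "__main__":
    wal_dir = "/var/lib/postgresql/wal/14/main"  # path taken from this report
    if os.path.isdir(wal_dir):
        for name in directory_entries(wal_dir):
            print(name)
        print("empty" if is_really_empty(wal_dir) else "not empty")
```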

The point is that the error I ran into when the data directory (/main) did not exist under /var/lib/postgresql/14 is incorrect, and it left me confused and wasting time wondering what was wrong rather than getting to the actual problem. "please fix the directory permissions (/var/lib/postgresql/14/ should be world readable)" is misleading, as there was no need to follow that instruction and it distracted from the more relevant and correct message printed just before it ("/var/lib/postgresql/14/main is not accessible"). Furthermore, pg_basebackup --help should ideally work regardless of that, as it does with the upstream binary.

Hope this helps,
--
Casey

> On Jan 29, 2024, at 11:40 AM, Christoph Berg <myon(at)debian(dot)org> wrote:
>
> Re: Casey
>> I thought that I addressed your inquiries as best as I was able. Can you please clarify any remaining questions?
>
> What did you do to make you believe that you had to "mkdir" in the
> first place?
>
> Also, please keep it on the list.
>
>>> On Jan 24, 2024, at 6:48 AM, Christoph Berg <myon(at)debian(dot)org> wrote:
>>>
>>> Re: Casey Shobe
>>>> Below is pasted my initial message, which gives more context and detail. Let me know if anything is still unclear after this. The context is that I use Patroni to run a multi-node cluster, and WAL-G creates a hidden directory within the wal directory which I did not initially notice when I otherwise emptied it before reinitializing a node after replacing the disk for the data volume. This led to a fair bit of time wasted looking for the wrong problem:
>>>
>>> I did reply to your initial message and all the questions are still
>>> open.
>>>
>>> Christoph
>
> Christoph
