From: | Casey <c(at)osss(dot)net> |
---|---|
To: | Christoph Berg <myon(at)debian(dot)org> |
Cc: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: Misleading/inaccurate error message from pg_basebackup |
Date: | 2024-01-29 21:14:05 |
Message-ID: | C0560AD8-B681-46AD-8694-A210F6041A1B@osss.net |
Lists: | pgsql-bugs |
I didn't believe I had to mkdir; those were just test cases to illustrate the problem in isolation. I had been trying to reinitialize a node with Patroni after replacing the disks used for the data volume. When that failed due to a pg_basebackup error, Patroni removed the data directory.
To be clear, I'm mounting separate volumes at:
/var/lib/postgresql
/var/lib/postgresql/wal
The data and wal directories are a couple levels under those:
/var/lib/postgresql/14/main
/var/lib/postgresql/wal/14/main
So when Patroni ran into a pg_basebackup error, it removed /var/lib/postgresql/14/main and /var/lib/postgresql/14. It also does not log the specific pg_basebackup command it generates. As I couldn't tell why it was erroring, I tried to recreate that command myself based on the configuration and defaults. When I did that, I didn't think to invoke the binary by its full path under /usr/lib/postgresql/14/bin as I ought to have, but just relied on what was in my PATH, which turned out to be the wrapper script.
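For anyone retracing this, here is a minimal sketch of how to check which executable a bare command name resolves to. The function name and the `search_path` parameter are made up for illustration; nothing here is part of Patroni or PostgreSQL.

```python
import os
import shutil


def resolve_command(name, search_path=None):
    """Return (path found on PATH, path after following symlinks), or None.

    search_path is a hypothetical override for testing; when it is None,
    shutil.which falls back to the PATH environment variable.
    """
    found = shutil.which(name, path=search_path)
    if found is None:
        return None
    # realpath follows symlinks, so a wrapper behind a symlink shows up here.
    return found, os.path.realpath(found)
```

Calling `resolve_command("pg_basebackup")` returns both the path picked up from PATH and the file it ultimately points to. If the second element is not under /usr/lib/postgresql/14/bin, the command being run is something other than the upstream binary, such as a wrapper.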
The actual problem turned out to be that I thought I'd cleared out all the contents of the wal directory, but I'd inadvertently left a hidden file sitting in there. Anyway, while debugging this, I didn't have the database running, and the data directory did not exist. I wanted to look at pg_basebackup --help, and that would not work, throwing the error about the data directory not existing. I should have focused on the first part of the error message, that /var/lib/postgresql/14/main was not accessible, but instead I got distracted by the second part, telling me to fix the directory permissions on /var/lib/postgresql/14 by making it world-readable. It didn't actually need to be world-readable, and we don't want it to be world-readable. Regardless, I tried making it world-readable, and was confused as to why pg_basebackup threw the same error message. Once I created the /main subdirectory, ignoring the complaint about world-readability, I was able to get a different error that pointed me to the actual problem:
pg_basebackup: error: directory "/var/lib/postgresql/wal/14/main" exists but is not empty
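A rough sketch of the check that would have caught the leftover entry sooner: list everything in the directory, including dot-prefixed names that a plain `ls` hides. The function names are hypothetical, and this only approximates whatever emptiness test pg_basebackup itself applies.

```python
import os


def all_entries(path):
    """Return the sorted names of every entry in path, dot-prefixed ones included."""
    # os.scandir yields hidden entries but never the "." and ".." pseudo-entries.
    with os.scandir(path) as it:
        return sorted(entry.name for entry in it)


def hidden_entries(path):
    """Return only the entries whose names start with a dot."""
    return [name for name in all_entries(path) if name.startswith(".")]
```

Running `hidden_entries("/var/lib/postgresql/wal/14/main")` before reinitializing would show any hidden file or directory still sitting in the WAL directory.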
The point is that the error I ran into when the data directory (/main) did not exist under /var/lib/postgresql/14 is incorrect, and it left me confused and wasting time wondering what was wrong rather than getting to the actual problem. "please fix the directory permissions (/var/lib/postgresql/14/ should be world readable)" is misleading, as there was no need to follow that instruction and it distracted from the more relevant and correct message printed just before it ("/var/lib/postgresql/14/main is not accessible"). Furthermore, pg_basebackup --help should ideally work regardless of that, as it does with the upstream binary.
Hope this helps,
--
Casey
> On Jan 29, 2024, at 11:40 AM, Christoph Berg <myon(at)debian(dot)org> wrote:
>
> Re: Casey
>> I thought that I addressed your inquiries as best as I was able. Can you please clarify any remaining questions?
>
> What did you do to make you believe that you had to "mkdir" in the
> first place?
>
> Also, please keep it on the list.
>
>>> On Jan 24, 2024, at 6:48 AM, Christoph Berg <myon(at)debian(dot)org> wrote:
>>>
>>> Re: Casey Shobe
>>>> Below is pasted my initial message, which gives more context and detail. Let me know if anything is still unclear after this. The context is that I use Patroni to run a multi-node cluster, and WAL-G creates a hidden directory within the wal directory which I did not initially notice when I otherwise emptied it before reinitializing a node after replacing the disk for the data volume. This led to a fair bit of time wasted looking for the wrong problem:
>>>
>>> I did reply to your initial message and all the questions are still
>>> open.
>>>
>>> Christoph
>
> Christoph