From: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
---|---|
To: | dilipbalaut(at)gmail(dot)com |
Cc: | robertmhaas(at)gmail(dot)com, hlinnaka(at)iki(dot)fi, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Race condition in recovery? |
Date: | 2021-06-04 09:24:56 |
Message-ID: | 20210604.182456.2154274967017506645.horikyota.ntt@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
At Fri, 4 Jun 2021 13:21:08 +0530, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote in
> On Fri, Jun 4, 2021 at 2:03 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >
> > On Thu, May 27, 2021 at 2:26 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > > Changed as suggested.
> >
> > I don't think the code as written here is going to work on Windows,
> > because your code doesn't duplicate enable_restoring's call to
> > perl2host or its backslash-escaping logic. It would really be better
> > if we could use enable_restoring directly. Also, I discovered that the
> > 'return' in cp_history_files should really say 'exit', because
> > otherwise it generates a complaint every time it's run. It should also
> > have 'use strict' and 'use warnings' at the top.
>
> Ok
>
> > Here's a version of your test case patch with the 1-line code fix
> > added, the above issues addressed, and a bunch of cosmetic tweaks.
> > Unfortunately, it doesn't pass for me consistently. I'm not sure if
> > that's because I broke something with my changes, or because the test
> > contains an underlying race condition which we need to address.
> > Attached also are the log files from a failed run if you want to look
> > at them. The key lines seem to be:
>
> I could not reproduce this but I think I got the issue, I think I used
> the wrong target LSN in wait_for_catchup, instead of checking the last
> "insert LSN" of the standby I was waiting for last "replay LSN" of
> standby which was wrong. Changed as below in the attached patch.
I think that's right. And the test script detects the issue for me
both on Linux but doesn't work for Windows.
'"C:/../Documents/work/postgresql/src/test/recovery/t/cp_history_files"' is not recognized as an internal command or external command ..
Because Windows' cmd.exe doesn't have the shbang feature. On Windows,
maybe archive_command should be like
'".../perl" "$FindBin../cp_history_files" "%p"...
If I did this I got another error.
"couldn't copy pg_wal\00000002.history to C:/../Documents/work/postgresql/src est^Mecovery/tmp_check/t_000_a_primary_data/archives/00000002.history: at C:/../Documents/work/postgresql/src/test/recovery/t/cp_history_files line 10.^M"
("^M" are the replacement for carrage return)
So.. I'm not sure what is happening but the error messages, or..
Anyway I don't have a time to investigate it.
+ # clean up
+ $node_primary->teardown_node;
+ $node_standby->teardown_node;
+ $node_cascade->teardown_node;
I don't think explicit teardown is useless as the final cleanup.
By the way the attached patch is named as "Fix-corner-case..." but
doesn't contain the fix. Is it intentional?
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
From | Date | Subject | |
---|---|---|---|
Next Message | Pavel Stehule | 2021-06-04 09:45:59 | Re: security_definer_search_path GUC |
Previous Message | Joel Jacobson | 2021-06-04 09:17:20 | Re: security_definer_search_path GUC |