From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Amit Kapila <akapila(at)postgresql(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-committers(at)lists(dot)postgresql(dot)org |
Subject: | Re: pgsql: Add tests for '-f' option in dropdb utility. |
Date: | 2019-11-29 09:54:12 |
Message-ID: | CAA4eK1+GNyjaPK77y+euh5eAgM75pncG1JYZhxYZF+SgS6NpjA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-committers |
On Fri, Nov 29, 2019 at 7:42 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, Nov 29, 2019 at 2:25 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >
>
> > * On all the failing machines, it's very clear from the postmaster
> > log that the backend knows why it's being terminated:
> >
> > 2019-11-28 13:47:56.320 UTC [212024:4] 051_dropdb_force.pl FATAL: terminating connection due to administrator command
> >
>
> Yeah, I can confirm this behavior on the Windows machines. I have
> also seen that we already expose this behavior in more than one way
> and all behave similar to this. If I use pg_terminate_backend(<pid>)
> or pg_ctl kill TERM <pid>, the behavior is exactly the same
> (terminated backend doesn't get the message (FATAL: terminating
> connection due to administrator command), but it is present in
> postmaster log.
>
> > So the question seems to be why libpq isn't reporting that
> > message before it detects connection-closed.
> >
> > This triggered a vague memory, and after a bit of archives-digging,
> > I found this thread from a few months back:
> >
> > https://www.postgresql.org/message-id/flat/CA%2BhUKGJowQypXSKsjws9A%2BnEQDD0-mExHZqFXtJ09N209rCO5A%40mail.gmail.com#0629f079bc59ecdaa0d6ac9f8f2c18ac
> >
> > in which it's alleged that Windows' TCP stack is flat-out
> > broken and will drop not-yet-delivered data when the server
> > closes the connection.
> >
> > If that's true, it's pretty nasty. Windows is about the
> > last platform where I'd want us to have behavior like this,
> > because we *will* get bug reports about it from novices.
> >
> > If there's no other workaround, I'm tempted to propose
> >
> > #ifdef WIN32
> > pg_sleep(1 second);
> > #endif
> >
> > or something close to that, before we close the socket.
> >
>
> I can experiment with this or if something else occurred to me.
>
The above experiment suggested by you works and the test passes after
this in the local Windows setup. One more thing, I think we don't
need to do above for graceful exits and during socket_close we have
access to exit_status_code which can be used in this context. I have
attached the patch for this. I think something on these lines is even
suggested by MSDN [1], but I am not sure.
I have also tried calling closesocket() explicitly in our function
socket_close which has changed the error message to "could not receive
data from server: Software caused connection abort
(0x00002745/10053)". I am not sure if it is any better. Another
thing I have tried to use is SO_LINGER option of socket but that
didn't work out.
> > Or we could revert the whole feature.
> >
>
> Yeah, that is also one possibility, but I think given we already have
> this behavior in existing features, it is better to either come up
> with some solution or maybe mention in docs that in such cases users
> need to check postmaster log to know the actual reason.
>
> I think we can further explore this, but for now, we might want to (a)
> revert this test, or (b) change the expected output to match.
>
I think if we decide to go with the above solution for Windows, then
we can keep the current test as it is or probably remove one test
related to <pid> test, otherwise, it is better to go with (a) for now
and once we decide any solution for this we can again commit these
tests.
[1] - https://docs.microsoft.com/en-us/windows/win32/api/winsock/nf-winsock-closesocket
See below note in the above link:
Using the closesocket or shutdown functions with SD_SEND or SD_BOTH
results in a RELEASE signal being sent out on the control channel. Due
to ATM's use of separate signal and data channels, it is possible that
a RELEASE signal could reach the remote end before the last of the
data reaches its destination, resulting in a loss of that data. One
possible solutions is programming a sufficient delay between the last
data sent and the closesocket or shutdown function calls for an ATM
socket.
Half close is not supported by ATM.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Attachment | Content-Type | Size |
---|---|---|
sleep_before_socket_close_1.patch | application/octet-stream | 729 bytes |
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Eisentraut | 2019-11-29 09:56:07 | pgsql: Small code simplification |
Previous Message | Peter Eisentraut | 2019-11-29 09:24:47 | pgsql: Add a regression test for allow_system_table_mods |