Re: Query cancel seems to be broken in master since Oct 17

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Vladimir Sitnikov <sitnikov(dot)vladimir(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Query cancel seems to be broken in master since Oct 17
Date: 2016-10-18 14:03:39
Message-ID: 21605.1476799419@sss.pgh.pa.us
Lists: pgsql-hackers

Heikki Linnakangas <hlinnaka(at)iki(dot)fi> writes:
> On 10/18/2016 04:13 PM, Tom Lane wrote:
>> There's a smoking gun in the postmaster log:
>> 2016-10-18 09:10:34.547 EDT [18502] LOG: wrong key in cancel request for process 18491

> Ok, I've reverted that commit for now. It clearly needs more thought,
> because of this, and the pademelon failure discussed on the other thread.

I think that was an overreaction. The problem is pretty obvious after
adding some instrumentation:

2016-10-18 09:57:47.508 EDT [21229] LOG: wrong key (0x7B7E4D5E, expected 0xF0F804017B7E4D5E) in cancel request for process 21228

To wit, the various cancel_key backend variables are declared as "long",
and the new code

if (!pg_strong_random(&MyCancelKey, sizeof(MyCancelKey)))

is therefore computing an 8-byte random value on machines where long is
64 bits. But only 4 bytes go to the client and come back.
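
The log line shows this exactly: the key in the request, 0x7B7E4D5E, is the low 32 bits of the expected 0xF0F804017B7E4D5E. Below is a minimal model of that round trip. It assumes an LP64 platform, where "long" is 8 bytes, and uses uint64_t and uint32_t as stand-ins for the backend variable and the 4-byte protocol field. The function names are hypothetical and do not come from the PostgreSQL source.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the mismatch. On LP64 platforms "long" is 8 bytes,
 * so filling sizeof(MyCancelKey) random bytes yields a 64-bit key, while
 * the protocol message carries only a 4-byte key.
 */

/* What reaches the client: the key is narrowed to 32 bits before it is
 * put on the wire, so the high half is silently discarded. */
static uint32_t key_sent_to_client(uint64_t backend_key)
{
    return (uint32_t) backend_key;
}

/* What the check does: the 32-bit key from the cancel request, widened
 * back to 64 bits, is compared against the full stored value. */
static int cancel_request_accepted(uint64_t stored_key, uint32_t request_key)
{
    return stored_key == (uint64_t) request_key;
}

/* The round trip succeeds only if the high 32 bits happen to be zero. */
static int round_trip_ok(uint64_t backend_key)
{
    uint32_t on_wire = key_sent_to_client(backend_key);

    /* The low halves always agree; only the high half is lost. */
    assert((backend_key & 0xFFFFFFFFu) == on_wire);
    return cancel_request_accepted(backend_key, on_wire);
}
```

With the values from the log, round_trip_ok(0xF0F804017B7E4D5E) returns false. A key whose high 32 bits are zero would pass, which is why the failure is random rather than certain in principle, though with 32 random high bits it fails almost every time.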

The cleanest fix might be to change those various "long" variables
to uint32. You'd have to think about how to handle the ntohl/htonl
calls that are used on them, though.
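
One way the uint32 suggestion could look is sketched below. It is a sketch under stated assumptions and is not the fix that was committed. fill_random() is a hypothetical stand-in for pg_strong_random() that reads /dev/urandom. put_cancel_key() and get_cancel_key() are hypothetical helpers showing that htonl()/ntohl() line up exactly with a fixed-width 32-bit key.

```c
#include <arpa/inet.h>  /* htonl, ntohl (POSIX) */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for pg_strong_random(): reads /dev/urandom and
 * returns 1 on success, 0 on failure. */
static int fill_random(void *buf, size_t len)
{
    FILE *f = fopen("/dev/urandom", "rb");
    size_t got;

    if (f == NULL)
        return 0;
    got = fread(buf, 1, len, f);
    fclose(f);
    return got == len;
}

/* With a fixed-width key, sizeof() asks for exactly the 4 bytes the
 * protocol can carry, whatever the platform's "long" width is. */
static int generate_cancel_key(uint32_t *key)
{
    return fill_random(key, sizeof(*key));
}

/* Sender side (hypothetical): convert to network byte order and copy the
 * 4 bytes into the message buffer. */
static void put_cancel_key(unsigned char out[4], uint32_t key)
{
    uint32_t n = htonl(key);

    memcpy(out, &n, sizeof(n));
}

/* Receiver side (hypothetical): copy 4 bytes out of the message and
 * convert back to host byte order. */
static uint32_t get_cancel_key(const unsigned char in[4])
{
    uint32_t n;

    memcpy(&n, in, sizeof(n));
    return ntohl(n);
}
```

Because htonl() and ntohl() take and return uint32_t, a uint32 key removes the implicit conversions between long and 32 bits at both ends. The key generated on the backend is then bit-for-bit the key that comes back in the cancel request.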

regards, tom lane
