Re: LISTEN denial of service with aborted transaction

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Matt Newell <newellm(at)blur(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, it(at)blur(dot)com
Subject: Re: LISTEN denial of service with aborted transaction
Date: 2015-09-30 01:58:35
Message-ID: 15532.1443578315@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Matt Newell <newellm(at)blur(dot)com> writes:
> On Monday, September 28, 2015 07:22:26 PM Tom Lane wrote:
>> Possibly. sinval catchup notification works like that, so you could maybe
>> invent a similar mechanism for notify. Beware that we've had to fix
>> performance issues with sinval catchup; you don't want to release a whole
>> bunch of processes to do work at the same time.

> I'll have to think about this more but i don't envision it causing such as
> scenario.
> The extra processing that will happen at signaling time would have been done
> anyway when the aborted transaction is finally rolled back, the only extra
> work is copying the relevant notifications to local memory.

Hm. An idle-in-transaction listening backend will eventually cause a more
serious form of denial of service: once the pg_notify SLRU area fills up
to its maximum of 8GB, no more new messages can be inserted, and every
transaction that tries to do a NOTIFY will abort. However, that's a
documented hazard and we've not really had complaints about it. In any
case, trying to fix that by having such a backend absorb messages into
local memory doesn't seem like a great idea: if it sits for long enough,
its local memory usage will bloat. Eventually you'd have a failure
anyway.

> Regardless it might not be worthwhile since my patch for #1 seems to address
> the symptom that bit me.

I looked at this patch for a bit and found it kind of ugly, particularly
the business about "we allow one process at a time to advance
firstUncommitted by using advancingFirstUncommitted as a mutex".
That doesn't sound terribly safe or performant.

After further thought, I wonder whether instead of having an incoming
listener initialize its "pos" to QUEUE_TAIL, we couldn't have it
initialize to the maximum "pos" among existing listeners (with a floor of
QUEUE_TAIL of course). If any other listener has advanced over a given
message X, then X was committed (or aborted) earlier and the new listener
is not required to return it; so it should be all right to take that
listener's "pos" as a starting point.

The minimum-code-change way to do that would be to compute the max pos
the hard way while Exec_ListenPreCommit holds the AsyncQueueLock, which
it would now have to do in exclusive mode. That's a little bit annoying
but it's surely not much worse than what SignalBackends has to do after
every notification.

Alternatively, we could try to maintain a shared pointer that is always
equal to the frontmost backend's "pos". But I think that would require
more locking during queue reading than exists now; so it's far from clear
it would be a win, especially in systems where listening sessions last for
a reasonable amount of time (so that Exec_ListenPreCommit is infrequent).

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2015-09-30 02:02:38 Re: [PATCH 5/6] pg_basebackup: allow GetConnection() to make non-replication connections.
Previous Message Kyotaro HORIGUCHI 2015-09-30 01:54:40 Re: PATCH: index-only scans with partial indexes