Re: proposal: make NOTIFY list de-duplication optional

From: Filip Rembiałkowski <filip(dot)rembialkowski(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Brendan Jurd <direvus(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: make NOTIFY list de-duplication optional
Date: 2016-02-06 19:15:51
Message-ID: CAP_rww=n3sPMGeKh4ERb4BpC46uDC-SMkFyMZMUQfb0TTwZwgw@mail.gmail.com
Lists: pgsql-hackers

On Sat, Feb 6, 2016 at 5:52 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Brendan Jurd <direvus(at)gmail(dot)com> writes:
>> On Sat, 6 Feb 2016 at 12:50 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Yeah, I agree that a GUC for this is quite unappetizing.
>
>> How would you feel about a variant for calling NOTIFY?
>
> If we decide that this ought to be user-visible, then an extra NOTIFY
> parameter would be the way to do it. I'd much rather it "just works"
> though. In particular, if we do start advertising user control of
> de-duplication, we are likely to start getting bug reports about every
> case where it's inexact, eg the no-checks-across-subxact-boundaries
> business.

Is it not enough to say "database server can decide to deliver a
single notification only", which the docs already say?

The ALL keyword would be a clearly separated variant that does no
de-duplication at all.

>
>> Optimising the remove-duplicates path is still probably a worthwhile
>> endeavour, but if the user really doesn't care at all about duplication, it
>> seems silly to force them to pay any performance price for a behaviour they
>> didn't want, no?
>
> I would only be impressed with that argument if it could be shown that
> de-duplication was a significant fraction of the total cost of a typical
> NOTIFY cycle.

Even if a typical NOTIFY cycle does not involve processing 10k or 100k
messages, why penalize users who have bigger transactions?

> Obviously, you can make the O(N^2) term dominate if you
> try, but I really doubt that it's significant for reasonable numbers of
> notify events per transaction.

Yes, it is hard to observe with fewer than a few thousand messages in
one transaction.
But big data happens. And then the numbers get really bad.
In my test with 40k messages, it is 400 ms without de-duplication versus
9 seconds with it: about 22 times slower. For 200k messages, it is
2 seconds versus 250 seconds: 125 times slower.
And I tested with very short payload strings, so strcmp() did not have
much to do.
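
To illustrate where the quadratic term comes from, here is a minimal
Python sketch. It is not the PostgreSQL code; it only assumes that each
new notification is compared against every notification already queued
in the transaction. All function names are hypothetical.

```python
def queue_with_dedup(pending, channel, payload):
    """Append (channel, payload) unless an equal entry is already queued.

    Returns the number of comparisons made by the linear scan.
    """
    comparisons = 0
    for queued in pending:
        comparisons += 1
        if queued == (channel, payload):
            return comparisons  # duplicate found, nothing appended
    pending.append((channel, payload))
    return comparisons


def queue_all(pending, channel, payload):
    """Append unconditionally, with no duplicate check at all."""
    pending.append((channel, payload))
    return 0


def total_comparisons(n, enqueue):
    """Queue n distinct payloads on one channel and count comparisons."""
    pending = []
    total = 0
    for i in range(n):
        total += enqueue(pending, "chan", str(i))
    return total, len(pending)


if __name__ == "__main__":
    for n in (1000, 2000, 4000):
        print(n, total_comparisons(n, queue_with_dedup)[0])
```

With n distinct messages the scan does n*(n-1)/2 comparisons, so a 5x
increase in messages costs about 25x more. That is roughly what I see:
going from 40k to 200k messages, 9 seconds becomes 250 seconds (about 28x).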
