From: | Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com> |
---|---|
To: | François Beaulieu <Francois(dot)Beaulieu(at)sbktelecom(dot)com> |
Cc: | "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Potential bug with pg_notify |
Date: | 2017-02-13 18:56:41 |
Message-ID: | b45b8b2c-b15e-6847-8014-613c323d2961@aklaver.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On 02/13/2017 09:04 AM, François Beaulieu wrote:
>
>> On Feb 13, 2017, at 11:45 AM, Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com> wrote:
>>
|
>>>
>>>> 3) Are the first row and the second row in the same partition?
>>>
>>> Doubtful, the problem occurs several times a day and we only have one partition a day. Let me check with the above example. What would be the best way to determine which child a row is in, and the relative position in the child table?
>>
>> As to position, maybe ctid though it has caveats:
>
> The three rows in my example return a ctid of (742,17), (742,18) and (742,19) respectively, in their child table. So, probably not at a partition boundary.
>
>>> Also; my worker in written in perl and uses DBD::Pg. I haven’t been able to 100% eliminate the module itself as the cause of the bug. Any suggestions on how I might go about doing that efficiently?
>>
>> What does the worker do?
>
> Sorry, that's my employer’s classified IP. :-)
> Does it matter?
Only that it makes it harder to give any suggestions on eliminating it
as a source of error if it is a black box. I don't think, at this
point, it is necessary to see the actual source. If it is possible a
high level synopsis of what it does might be sufficient.
>
>> Could it be the module is not dealing with time zones correctly? Though thinking about this that would seem to manifest a problem only around the 7th day boundary. So put this down to thinking aloud.
>
> No, the partitioning scheme seems to be respecting the timezone properly, and my issue is happening every few hours in the middle of the day and we’re in UTC+5, so not near the end of the day in UTC. Besides, I believe timestamp without timezone assumes the local timezone of the server, which is set to UTC anyway.
>
> Has the schema eliminated your original theory regarding the delaying of the generation of the _id? I don’t think that would normally be an issue that occurs sporadically and the _id seems to be part of the INSERT, which would indicate that, as it should, it’s done generating before my trigger is called.
I don't see anything that would explain a delay. Still the fact remains
that in most cases the notify captures the _id, but in some cases it
does not. Going back to your OP I realized I missed that the
NEW.userfield was also not coming through. So that seems to confirm that
pg_notify() is firing before it gets access to NEW.*. Having said that I
have no idea why?
The only thing I can think to do is(untested):
CREATE OR REPLACE FUNCTION notify_trigger() RETURNS trigger AS $$
DECLARE
_newid integer;
BEGIN
SELECT NEW._id INTO _newid;
IF _newid IS NULL OR NOT FOUND THEN
RAISE NOTICE 'NEW._id is NULL/NOT FOUND';
pg_sleep(0.1); --Or whatever interval you want.
END IF;
PERFORM pg_notify('watchers', TG_TABLE_NAME || ',' || NEW._id|| ','
|| NEW.userfield);
RETURN new;
END;
$$ LANGUAGE plpgsql;
Not really a solution but it might help determine whether it is a timing
issue. Also this is probably something that should be done on a test
server to be safe.
>
> Thanks,
> -=François Beaulieu
> SBK Telecom
>
--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com
From | Date | Subject | |
---|---|---|---|
Next Message | Jeff Janes | 2017-02-13 19:26:17 | Re: Bad planning data resulting in OOM killing of postgres |
Previous Message | Melvin Davidson | 2017-02-13 18:26:12 | Re: Auto-Rollback option |