odd intermittent query hanging issue

From: Aaron Burnett <aburnett(at)bzzagent(dot)com>
To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: odd intermittent query hanging issue
Date: 2012-05-18 16:17:34
Message-ID: CBDBD23E.163AB%aburnett@bzzagent.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


Greetings,

I run a handful of queries overnight when traffic is at it's lowest on our
system. One particular query will run perfectly fine (around 5 seconds)
for several weeks, then suddenly decide to hang indefinitely and never
finish. It needs to be killed manually after several hours (I've
intentionally let it run to see if it would ever finish) in order for it
to die.

The fix _seems to be_ to drop and rebuild the index on
xrefchannelmember.member_id. The query then goes back to running in the 5
seconds and has no problem again for weeks.... until it happens again.

Has anyone heard of such a thing? And if anyone can maybe point me in
directions to investigate, it would be much appreciated.

Postgresl v 9.1.3
Ubuntu v 11.10
Linux db 3.0.0-16-generic #29-Ubuntu SMP Tue Feb 14 12:48:51 UTC 2012
x86_64 x86_64 x86_64 GNU/Linux

Query, explain plan (when ot works) and table structure below:

(Thanking you in advance for any help)

select distinct(id) from member left join xrefchannelmember xrcm on
id=xrcm.member_id where id not in (Select memberid from
xrefcampaignmember) and xrcm.channel_id in (1) order by id asc;

explain plan:
------------------------------
Unique (cost=237234.66..239267.33 rows=406533 width=4) (actual
time=4790.823..4922.621 rows=418843 loops=1)
-> Sort (cost=237234.66..238251.00 rows=406533 width=4) (actual
time=4790.818..4826.201 rows=418879 loops=1)
Sort Key: member.id
Sort Method: quicksort Memory: 31923kB
-> Hash Join (cost=167890.31..199359.99 rows=406533 width=4)
(actual time=3357.406..4532.952 rows=418879 loops=1)
Hash Cond: (xrcm.member_id = member.id)
-> Seq Scan on xrefchannelmember xrcm
(cost=0.00..19273.69 rows=813066 width=4) (actual time=0.015..219.259 row
s=814421 loops=1)
Filter: (channel_id = 1)
-> Hash (cost=162586.21..162586.21 rows=424328 width=4)
(actual time=3357.001..3357.001 rows=444626 loops=1)
Buckets: 65536 Batches: 1 Memory Usage: 15632kB
-> Seq Scan on member (cost=66114.02..162586.21
rows=424328 width=4) (actual time=2357.280..3216.076 rows
=444626 loops=1)
Filter: (NOT (hashed SubPlan 1))
SubPlan 1
-> Seq Scan on xrefcampaignmember
(cost=0.00..57931.82 rows=3272882 width=4) (actual time=0.021..
670.086 rows=3272870 loops=1)
Total runtime: 4963.134 ms

Table structure:

Table "public.xrefcampaignmember"
Column | Type | Modifiers

--------------+--------------------------+---------------------------------
-------------------
campaignid | integer | not null
memberid | integer | not null
joined | timestamp with time zone | default
('now'::text)::timestamp without time zone
reservedslot | integer | default 0
Indexes:
"XrefCampaignMember_pkey" PRIMARY KEY, btree (campaignid, memberid)
"xcm_campaignid_idx" btree (campaignid)
"xcm_joined_idx" btree (joined)
"xcm_memberid_idx" btree (memberid)
"xcm_reservedslot_idx" btree (reservedslot)

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2012-05-18 16:29:25 Re: difference in query plan when db is restored
Previous Message Scott Marlowe 2012-05-18 15:41:04 Re: How to check for server availability? [v9.3.1, Linux]