Re: BUG #10888: application is getting hanged in the poll() function of libpq.so.

From: Mitu Verma <mitu(dot)verma(at)ericsson(dot)com>
To: 'Tom Lane' <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "'pgsql-bugs(at)postgresql(dot)org'" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #10888: application is getting hanged in the poll() function of libpq.so.
Date: 2014-07-11 08:03:03
Message-ID: 84BC7AB0D621A74893EC9C9E151293B003F904C2@ESESSMB207.ericsson.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi tom,

I have few more observations regarding the issue mentioned below -

Few points
1. Database dump has been taken from the customer site from where the issue has been reported, we are using postgreSQL 9.1.3

2.Schema of the table is -
Column | Type | Modifiers
-------------+-----------------------+-----------
tableindex | integer |
object | character varying(80) |
method | character varying(80) |
bgwuser | character varying(80) |
time | character(23) |
realuser | character varying(80) |
host | character varying(80) |
application | character varying(80) |
Indexes:
"ind1_eventlogentry" UNIQUE, btree (tableindex), tablespace "mmindex"
Tablespace: "mmdata"

3. On the database which we have taken from customer site when we were running the following query , query is hung.
INSERT INTO eventLogEntry (object, method, bgwUser, time, realUser, host, application, tableIndex ) VALUES (E'Server', E'Start', E'bgw', E'20140512122404', NULL, NULL, NULL, 539 );

4. But when we used index after 586 like, then the insert is happening correctly on the same database.
INSERT INTO eventLogEntry (object, method, bgwUser, time, realUser, host, application, tableIndex ) VALUES (E'Server', E'Start', E'bgw', E'20140512122404', NULL, NULL, NULL, 587 );

5. When we did debugging , then we found that postgreSQL has stored some other transaction id and it asks for the current transaction to wait till the time previous transaction
Either gets committed or rolled back.

6. It seems that somehow the indexes are corrupted and postgreSQL has stored some transaction id somewhere, and so it is not allowing the current transaction to happen
For that particular index range.

7. When the index has been dropped and table has been recreated then everything works fine.

Plz help me with this issue by providing answers to the following queries

A) why postgreSQL is not allowing the insertion to happen for a particular index range and allowing it after that particular range?
B) where does the post greSQL stores the transaction ids and why it is not getting refreshed ?

Regards
Mitu
-----Original Message-----
From: Mitu Verma
Sent: Wednesday, July 09, 2014 12:31 PM
To: 'Tom Lane'
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: RE: [BUGS] BUG #10888: application is getting hanged in the poll() function of libpq.so.

Hi Tom,

I have reproduced the issue on our test bed.

When I run a single query to insert the data into table , then the query is getting hanged.

I debugged this issue by compiling the postgres code and using the database where the customer is facing problem. (see the gdb below) Then the coredump is coming with the assertion getting failed for transaction id as 0.

Following are my observations.

1. Even if I run a single query, (i.e no other transaction is in place) , then also following code is returning valid value of xwait. I checked this using gdb.(see the gdb below)

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
File nbtinsert.c

Bool _bt_doinsert(Relation rel, IndexTuple itup,IndexUniqueCheck checkUnique, Relation heapRel) {
-----------------------------
-----------------------------
xwait = _bt_check_unique(rel, itup, heapRel, buf, offset, itup_scankey checkUnique, &is_unique);

/* As per my understanding if I am running a single transaction then tis value should retun 0, but it seems that this value is being set some where In postgres code.
-----------------------------------
}
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
now because of the valid value of this xwait, it assumes that there is some other transaction which is already going on and so the current transaction has to wait till the old transaction is either committed or rolled backed.
So the current transaction waits in infinite loop for the completion of the other transaction( which is not there actually ).
So the current transaction just keeps on waiting forever.

2. I checked inside bt_check_unique and following is the code which is hit in our case and which suggests that there is an already existing transaction XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXiXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

/* If this tuple is being updated by other transaction then we have to wait for its commit/abort.*/ xwait = (TransactionIdIsValid(SnapshotDirty.xmin)) ?SnapshotDirty.xmin : SnapshotDirty.xmax;

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

3. This issue is coming only with that table where table index is defined.

4. Issue is resolved if the index is dropped and we recreate the table.

It looks to me that postgres stores the transaction (SnapshotDirty ?) somewhere and even if there is no other transaction going on this is Showing some abrupt value which is giving conflicts.

If anyone can help me with this piece of postgresql code then it will be really helpful.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
(gdb) where
#0 0x00000031c8232a45 in raise () from /lib64/libc.so.6
#1 0x00000031c8234225 in abort () from /lib64/libc.so.6
#2 0x000000000072248d in ExceptionalCondition (conditionName=<value optimized out>, errorType=<value optimized out>, fileName=<value optimized out>, lineNumber=<value optimized out>)
at assert.c:57
#3 0x000000000064ee48 in XactLockTableWait (xid=0) at lmgr.c:471
#4 0x00000000004837fb in _bt_doinsert (rel=0x7f616e9e3dc8, itup=0x221ab30, checkUnique=UNIQUE_CHECK_YES, heapRel=0x7f616e9df3a8) at nbtinsert.c:169
#5 0x0000000000487f21 in btinsert (fcinfo=<value optimized out>) at nbtree.c:261
#6 0x0000000000728c14 in FunctionCall6Coll (flinfo=<value optimized out>, collation=<value optimized out>, arg1=<value optimized out>, arg2=<value optimized out>, arg3=<value optimized out>,
arg4=<value optimized out>, arg5=140056444400552, arg6=1) at fmgr.c:1439
#7 0x000000000047f746 in index_insert (indexRelation=0x7f616e9e3dc8, values=0x7fff36fd71f0, isnull=0x7fff36fd72f0 "", heap_t_ctid=0x221a9dc, heapRelation=0x7f616e9df3a8,
checkUnique=UNIQUE_CHECK_YES) at indexam.c:215
#8 0x0000000000591c15 in ExecInsertIndexTuples (slot=0x2217918, tupleid=0x221a9dc, estate=0x2215a18) at execUtils.c:1086
#9 0x000000000059fab5 in ExecInsert (node=0x2219a40) at nodeModifyTable.c:247
#10 ExecModifyTable (node=0x2219a40) at nodeModifyTable.c:847
#11 0x00000000005878d8 in ExecProcNode (node=0x2219a40) at execProcnode.c:371
#12 0x0000000000586526 in ExecutePlan (queryDesc=0x2215608, direction=<value optimized out>, count=0) at execMain.c:1440
#13 standard_ExecutorRun (queryDesc=0x2215608, direction=<value optimized out>, count=0) at execMain.c:314
#14 0x0000000000667abf in ProcessQuery (plan=0x21c5920,
sourceText=0x21c3a08 "INSERT INTO eventLogEntry (object, method, bgwUser, time, realUser, host, application, tableIndex ) VALUES ('Server', 'Start', 'bgw', '20140512122404', NULL, NULL, NULL, 539);", params=0x0, dest=<value optimized out>, completionTag=0x7fff36fd76d0 "") at pquery.c:187
#15 0x0000000000667d65 in PortalRunMulti (portal=0x21272e8, isTopLevel=1 '\001', dest=0x220f580, altdest=0x220f580, completionTag=0x7fff36fd76d0 "") at pquery.c:1276
#16 0x0000000000668592 in PortalRun (portal=0x21272e8, count=9223372036854775807, isTopLevel=1 '\001', dest=0x220f580, altdest=0x220f580, completionTag=0x7fff36fd76d0 "") at pquery.c:813
#17 0x0000000000664949 in exec_simple_query (
query_string=0x21c3a08 "INSERT INTO eventLogEntry (object, method, bgwUser, time, realUser, host, application, tableIndex ) VALUES ('Server', 'Start', 'bgw', '20140512122404', NULL, NULL, NULL, 539);") at postgres.c:1020
#18 0x0000000000665880 in PostgresMain (argc=<value optimized out>, argv=<value optimized out>, username=<value optimized out>) at postgres.c:3968
#19 0x000000000061c691 in BackendRun () at postmaster.c:3617
#20 BackendStartup () at postmaster.c:3302
#21 ServerLoop () at postmaster.c:1466
#22 0x000000000061eff2 in PostmasterMain (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1127
#23 0x00000000005b8bd0 in main (argc=3, argv=0x2118e70) at main.c:199

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
-----Original Message-----
From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
Sent: Tuesday, July 08, 2014 12:05 AM
To: Mitu Verma
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: [BUGS] BUG #10888: application is getting hanged in the poll() function of libpq.so.

Mitu Verma <mitu(dot)verma(at)ericsson(dot)com> writes:
> I am not sure if it is a postgres issue or not, so need your help in that.

Well, it looks like you've got two concurrent transactions trying to insert the exact same data:

> 16384 | fm_db_CollectCCN | 20714 | 16391 | mmsuper | | ::1 | | 19540
> | 2014-07-07 19:10:09.295565+02 | 2014-07-07 19:10:09.311701+02 | 2014-07-07 19:10:09.311701+02 | f | INSERT INTO ev
> entLogEntry (object, method, bgwUser, time, realUser, host, application, tableIndex ) VALUES (E'Server', E'Start', E'bgw'
> , E'20140512122404', NULL, NULL, NULL, 539 )
> 16384 | fm_db_CollectCCN | 20819 | 16391 | mmsuper | | ::1 | | 19602
> | 2014-07-07 19:10:45.18688+02 | 2014-07-07 19:10:45.202903+02 | 2014-07-07 19:10:45.202903+02 | f | INSERT INTO ev
> entLogEntry (object, method, bgwUser, time, realUser, host, application, tableIndex ) VALUES (E'Server', E'Start', E'bgw'
> , E'20140512122404', NULL, NULL, NULL, 539 )

Presumably this is falling foul of a unique index on the table.
The second one is going to block until the first one either commits or rolls back.

regards, tom lane

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Noah Misch 2014-07-11 13:34:39 Re: Missing file versions for a bunch of dll/exe files in Windows builds
Previous Message gerke.ephorus 2014-07-11 07:31:49 BUG #10925: Documentation not clear on copy options containing '_'