Re: User concurrency thresholding: where do I look?

From: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-26 19:44:42
Message-ID: 46A8F9AA.4040507@sun.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

BEAUTIFUL!!!

Using the Patch I can now go upto 1300 users without dropping.. But now
it still repeats at 1300-1350 users..

I corrected the Lock Descriptions based on what I got from lwlock.h and
retried the whole scalability again with profiling again.. This time it
looks like the ProcArrayLock

bash-3.00# echo 600 users
600 users
bash-3.00# ./4_lwlock_waits.d 7056

Lock Id Mode Count
XidGenLock Exclusive 1
CLogControlLock Shared 2
XidGenLock Shared 2
WALWriteLock Exclusive 4
WALInsertLock Exclusive 8
CLogControlLock Exclusive 9
ProcArrayLock Exclusive 9

Lock Id Combined Time (ns)
WALWriteLock 2842300
XidGenLock 4951000
CLogControlLock 11151800
WALInsertLock 13035600
ProcArrayLock 20040000

bash-3.00# echo 700 users
700 users
bash-3.00# ./4_lwlock_waits.d 7056

Lock Id Mode Count
XidGenLock Exclusive 1
WALWriteLock Exclusive 2
XidGenLock Shared 2
CLogControlLock Shared 3
CLogControlLock Exclusive 8
WALInsertLock Exclusive 9
ProcArrayLock Exclusive 22

Lock Id Combined Time (ns)
XidGenLock 4093300
WALWriteLock 4914800
WALInsertLock 7389100
ProcArrayLock 10248200
CLogControlLock 11989400

bash-3.00# echo 800 users
800 users
bash-3.00# ./4_lwlock_waits.d 7056

Lock Id Mode Count
WALWriteLock Exclusive 1
XidGenLock Shared 2
CLogControlLock Shared 3
CLogControlLock Exclusive 7
WALInsertLock Exclusive 7
ProcArrayLock Exclusive 31

Lock Id Combined Time (ns)
WALWriteLock 319100
XidGenLock 5388700
WALInsertLock 9901400
CLogControlLock 13465000
ProcArrayLock 42979700

bash-3.00# echo 900 users
900 users
bash-3.00# ./4_lwlock_waits.d 7056

Lock Id Mode Count
CLogControlLock Shared 1
XidGenLock Exclusive 1
WALWriteLock Exclusive 2
CLogControlLock Exclusive 6
WALInsertLock Exclusive 9
ProcArrayLock Exclusive 25

Lock Id Combined Time (ns)
XidGenLock 3197700
WALWriteLock 3887100
CLogControlLock 15774500
WALInsertLock 38268700
ProcArrayLock 162073100

bash-3.00# ./6_lwlock_stack.d 4 7056

Lock Id Mode Count
ProcArrayLock Shared 1
ProcArrayLock Exclusive 67

Lock Id Combined Time (ns)
ProcArrayLock 216773800

Lock Id Combined Time (ns)

postgres`LWLockAcquire+0x1c8
postgres`GetSnapshotData+0x118
postgres`GetTransactionSnapshot+0x5c
postgres`PortalStart+0x150
postgres`exec_bind_message+0x81c
postgres`PostgresMain+0x12b8
postgres`BackendRun+0x278
postgres`ServerLoop+0x63c
postgres`PostmasterMain+0xc40
postgres`main+0x394
postgres`_start+0x108
2779000

postgres`LWLockAcquire+0x1c8
postgres`CommitTransaction+0xe0
postgres`CommitTransactionCommand+0x90
postgres`finish_xact_command+0x60
postgres`exec_execute_message+0x3d8
postgres`PostgresMain+0x1300
postgres`BackendRun+0x278
postgres`ServerLoop+0x63c
postgres`PostmasterMain+0xc40
postgres`main+0x394
postgres`_start+0x108
213994800

bash-3.00# echo 1000 users - SLIGHT DROP
1000 users
bash-3.00# ./4_lwlock_waits.d 7056

Lock Id Mode Count
WALWriteLock Exclusive 1
CLogControlLock Exclusive 2
XidGenLock Shared 2
CLogControlLock Shared 3
WALInsertLock Exclusive 4
ProcArrayLock Exclusive 26

Lock Id Combined Time (ns)
WALWriteLock 1807400
XidGenLock 2024000
WALInsertLock 2177500
CLogControlLock 9064700
ProcArrayLock 199216000

bash-3.00# ./6_lwlock_stack.d 4 7056

Lock Id Mode Count
ProcArrayLock Shared 3
ProcArrayLock Exclusive 38

Lock Id Combined Time (ns)
ProcArrayLock 858238600

Lock Id Combined Time (ns)

postgres`LWLockAcquire+0x1c8
postgres`TransactionIdIsInProgress+0x50
postgres`HeapTupleSatisfiesVacuum+0x2ec
postgres`_bt_check_unique+0x2a0
postgres`_bt_doinsert+0x98
postgres`btinsert+0x54
postgres`FunctionCall6+0x44
postgres`index_insert+0x90
postgres`ExecInsertIndexTuples+0x1bc
postgres`ExecUpdate+0x500
postgres`ExecutePlan+0x704
postgres`ExecutorRun+0x60
postgres`PortalRunMulti+0x2a0
postgres`PortalRun+0x310
postgres`exec_execute_message+0x3a0
postgres`PostgresMain+0x1300
postgres`BackendRun+0x278
postgres`ServerLoop+0x63c
postgres`PostmasterMain+0xc40
postgres`main+0x394
167100

postgres`LWLockAcquire+0x1c8
postgres`GetSnapshotData+0x118
postgres`GetTransactionSnapshot+0x5c
postgres`PortalRunMulti+0x22c
postgres`PortalRun+0x310
postgres`exec_execute_message+0x3a0
postgres`PostgresMain+0x1300
postgres`BackendRun+0x278
postgres`ServerLoop+0x63c
postgres`PostmasterMain+0xc40
postgres`main+0x394
postgres`_start+0x108
7125900

postgres`LWLockAcquire+0x1c8
postgres`CommitTransaction+0xe0
postgres`CommitTransactionCommand+0x90
postgres`finish_xact_command+0x60
postgres`exec_execute_message+0x3d8
postgres`PostgresMain+0x1300
postgres`BackendRun+0x278
postgres`ServerLoop+0x63c
postgres`PostmasterMain+0xc40
postgres`main+0x394
postgres`_start+0x108
850945600

bash-3.00# echo 1100 users - DROP ....
1100 users
bash-3.00# ./4_lwlock_waits.d 7056

Lock Id Mode Count
CLogControlLock Shared 1
WALWriteLock Exclusive 1
XidGenLock Exclusive 1
ProcArrayLock Shared 2
WALInsertLock Exclusive 2
XidGenLock Shared 2
CLogControlLock Exclusive 4
ProcArrayLock Exclusive 20

Lock Id Combined Time (ns)
WALWriteLock 4179500
XidGenLock 6249400
CLogControlLock 20411100
WALInsertLock 29707600
ProcArrayLock 207923700

bash-3.00# ./6_lwlock_stack.d 4 7056

Lock Id Mode Count
ProcArrayLock Exclusive 40

Lock Id Combined Time (ns)
ProcArrayLock 692242100

Lock Id Combined Time (ns)

postgres`LWLockAcquire+0x1c8
postgres`CommitTransaction+0xe0
postgres`CommitTransactionCommand+0x90
postgres`finish_xact_command+0x60
postgres`exec_execute_message+0x3d8
postgres`PostgresMain+0x1300
postgres`BackendRun+0x278
postgres`ServerLoop+0x63c
postgres`PostmasterMain+0xc40
postgres`main+0x394
postgres`_start+0x108
692242100

bash-3.00#

LockID for ProcArrayLock is 4 or the 5 entry in lwlock.h which seems to
indicate it is lwlock.h
Any tweaks for that?

-Jignesh

Simon Riggs wrote:
> On Thu, 2007-07-26 at 11:27 -0400, Jignesh K. Shah wrote:
>
>
>> However at 900 Users where the big drop in throughput occurs:
>> It gives a different top "consumer" of time:
>>
>
>
> postgres`LWLockAcquire+0x1c8
>
>> postgres`SimpleLruReadPage+0x1ac
>> postgres`TransactionIdGetStatus+0x14
>> postgres`TransactionLogFetch+0x58
>>
>
> TransactionIdGetStatus doesn't directly call SimpleLruReadPage().
> Presumably the compiler has been rearranging things??
>
> Looks like you're out of clog buffers. It seems like the clog buffers
> aren't big enough to hold clog pages for long enough and the SELECT FOR
> SHARE processing is leaving lots of additional read locks that are
> increasing the number of clog requests for older xids.
>
> Try the enclosed patch.
>
>
> ------------------------------------------------------------------------
>
> Index: src/include/access/clog.h
> ===================================================================
> RCS file: /projects/cvsroot/pgsql/src/include/access/clog.h,v
> retrieving revision 1.19
> diff -c -r1.19 clog.h
> *** src/include/access/clog.h 5 Jan 2007 22:19:50 -0000 1.19
> --- src/include/access/clog.h 26 Jul 2007 15:44:58 -0000
> ***************
> *** 29,35 ****
>
>
> /* Number of SLRU buffers to use for clog */
> ! #define NUM_CLOG_BUFFERS 8
>
>
> extern void TransactionIdSetStatus(TransactionId xid, XidStatus status);
> --- 29,35 ----
>
>
> /* Number of SLRU buffers to use for clog */
> ! #define NUM_CLOG_BUFFERS 64
>
>
> extern void TransactionIdSetStatus(TransactionId xid, XidStatus status);
>
> ------------------------------------------------------------------------
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Robert Lor 2007-07-26 19:56:08 Re: User concurrency thresholding: where do I look?
Previous Message Chris Browne 2007-07-26 18:28:12 Re: performance of postgresql in replication using slony