Re: [HACKERS] Postgres Speed or lack thereof

From: Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us (Tom Lane)
Cc: pgsql-hackers(at)postgreSQL(dot)org, jholland(at)isr(dot)umd(dot)edu
Subject: Re: [HACKERS] Postgres Speed or lack thereof
Date: 1999-03-15 14:28:15
Message-ID: 199903151428.JAA12005@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tom, I assume this has all been corrected, right?

> I wrote:
> > As for where the time is going: "top" showed that the time for the
> > INSERT ops was almost all going into backend CPU time. My guess is
> > that most of the cost is in parsing and planning the INSERT
> > statements.
>
> Having an idle hour this evening, I thought it'd be interesting to build
> a backend with profiling enabled, so as to confirm or deny the above
> guess. It seems that indeed a lot of time is being wasted, but where
> it's being wasted might surprise you!
>
> I profiled a backend built from this morning's current CVS sources,
> being driven by a single psql run reading a series of exactly 5000
> INSERT commands. The first few are:
>
> INSERT INTO "fehistory_1" values (64,14,252,'D','-','-','s',1,4027,NULL,'S',1,4088,NULL,32,'Thu Jan 07 15:15:00 1999 EST','Fri Jan 08 08:30:38 1999 EST',4027,32,NULL,NULL,46,NULL);
> INSERT INTO "fehistory_1" values (65,14,252,'P','-','-','S',1,4086,NULL,NULL,NULL,NULL,NULL,32,'Fri Jan 08 16:00:00 1999 EST',NULL,NULL,NULL,NULL,NULL,45,NULL);
> INSERT INTO "fehistory_1" values (66,14,135,'P','-','-','S',1,13619,NULL,'s',1,12967,NULL,100,'Fri Jan 08 02:00:00 1999 EST',NULL,NULL,NULL,NULL,NULL,44,NULL);
>
> and it goes on for a long while in that vein :-). Total runtime was
> 278.09 CPU seconds in the backend, 307 seconds elapsed realtime. The
> table being inserted into has no indexes and was initially empty.
>
> The profiler is GNU gprof, which provides both flat and structured
> profile data. The top few entries in the flat profile are:
>
> % cumulative self self total
> time seconds seconds calls ms/call ms/call name
> 34.96 97.21 97.21 _mcount
> 7.18 117.19 19.98 8047816 0.00 0.00 tree_cut
> 6.56 135.43 18.24 962187 0.02 0.02 recv
> 3.55 145.31 9.88 8234278 0.00 0.00 tree_insert
> 3.49 155.02 9.71 4292947 0.00 0.01 malloc
> 3.31 164.22 9.20 1965056 0.00 0.01 SearchSysCache
> 3.24 173.23 9.01 4291775 0.00 0.01 free
> 2.44 180.01 6.78 6171667 0.00 0.00 tree_concatenate
> 2.17 186.04 6.03 8234069 0.00 0.00 tree_delete
> 1.07 189.02 2.98 10250 0.29 0.29 write
> 1.05 191.94 2.92 _psort_cmp
> 0.95 194.57 2.63 2983301 0.00 0.00 newNode
> 0.92 197.14 2.57 4251762 0.00 0.00 OrderedElemPop
> 0.88 199.60 2.46 2746172 0.00 0.00 comphash
> 0.82 201.88 2.28 4252495 0.00 0.01 AllocSetAlloc
> 0.77 204.02 2.14 1965056 0.00 0.00 SearchSysCacheTuple
> 0.73 206.05 2.03 265000 0.01 0.01 yylex
> 0.69 207.96 1.91 $$remU
> 0.68 209.86 1.90 4252500 0.00 0.00 OrderedElemPushHead
> 0.62 211.58 1.72 4247473 0.00 0.00 palloc
> 0.60 213.26 1.68 25004 0.07 0.07 send
> 0.56 214.81 1.55 1965056 0.00 0.00 CatalogCacheComputeHashIndex
> 0.55 216.33 1.52 5000 0.30 1.07 yyparse
> 0.54 217.82 1.49 4251762 0.00 0.01 AllocSetFree
> 0.53 219.29 1.47 4252494 0.00 0.00 MemoryContextAlloc
> 0.51 220.71 1.42 4252500 0.00 0.00 OrderedElemPushInto
> 0.49 222.07 1.36 3430844 0.00 0.00 strncmp
> 0.48 223.41 1.34 4205327 0.00 0.00 OrderedSetGetHead
> 0.47 224.73 1.32 elog
> 0.45 225.99 1.26 1964976 0.00 0.00 DLRemove
> 0.43 227.19 1.20 2510857 0.00 0.00 strcmp
> 0.43 228.38 1.19 4200327 0.00 0.00 OrderedElemGetBase
> 0.42 229.56 1.18 4245740 0.00 0.01 PortalHeapMemoryAlloc
> 0.38 230.62 1.06 4252500 0.00 0.00 OrderedElemPush
> 0.37 231.66 1.04 4205327 0.00 0.00 AllocSetGetFirst
> 0.36 232.66 1.00 1965145 0.00 0.00 DLAddHead
>
> _mcount is part of the profiling apparatus and can be ignored (although
> the fact that it's so high up implies that we're executing an awful lot
> of short routine calls, because mcount runs once per entry and exit of
> profilable routines). tree_cut, tree_insert, tree_concatenate, and
> tree_delete are subroutines of malloc/free. Altogether, malloc/free
> and friends accounted for 61.39 seconds out of the 180.88 non-overhead
> CPU seconds in this run.
>
> In other words, we're spending a third of our time mallocing and freeing
> memory. A tad high, what?
>
> Actually, it's worse than that, because AllocSetAlloc,
> PortalHeapMemoryAlloc, AllocSetFree, and all of the OrderedElemXXX
> routines represent our own bookkeeping layer atop malloc/free.
> That's another 18.66 seconds spent in these top routines, which means
> that we are real close to expending half the backend's runtime on
> memory bookkeeping. This needs work.
>
> The other thing that jumps out here is the unreasonably high position of
> recv(), which is called 962187 times. The script being read by psql was
> only 957186 characters. Evidently we're invoking a kernel recv() call
> once per character read from the frontend. I suspect this is an
> inefficiency introduced by Magnus Hagander's recent rewrite of backend
> libpq (see, I told you there was a reason for using stdio ;-)). We're
> gonna have to do something about that, though it's not as critical as
> the memory-allocation issue. It also appears that send() is now being
> invoked multiple times per backend reply, which is going to create
> inefficiencies outside the backend (ie, multiple packets per reply).
> On a test case with a lot of SELECTs that would show up more than it
> does here.
>
>
> Moving on to the dynamic profile, the most interesting items are:
>
> index % time self children called name
> -----------------------------------------------
> 0.05 118.06 1/1 DoBackend [7]
> [8] 65.3 0.05 118.06 1 PostgresMain [8]
> 0.04 57.64 5000/5000 pg_exec_query [9]
> 0.01 39.83 5000/5000 CommitTransactionCommand [11]
> 0.02 18.86 5001/5001 ReadCommand [27]
> 0.01 1.18 5000/5000 StartTransactionCommand [118]
> 0.01 0.36 5001/5001 ReadyForQuery [190]
> 0.04 0.00 5001/5001 EnableNotifyInterrupt [385]
> 0.04 0.00 5000/5000 strspn [388]
> 0.00 0.01 1/1 InitPostgres [495]
> 0.00 0.00 5000/548573 strlen [172]
> 0.00 0.00 5001/93349 memset [366]
> 0.00 0.00 1/1 read_pg_options [601]
> 0.00 0.00 2/2 pq_putint [637]
> 0.00 0.00 1/15002 pq_putnchar [123]
> 0.00 0.00 2/10013 getenv [241]
> 0.00 0.00 1/1 SetPgUserName [683]
> 0.00 0.00 1/1 pq_init [687]
> 0.00 0.00 1/1 proc_exit [704]
> 0.00 0.00 1/1 pq_close [721]
> 0.00 0.00 1/5002 getpgusername [521]
> 0.00 0.00 1/2 FindExec [722]
> 0.00 0.00 5/8 getopt [728]
> 0.00 0.00 5001/5001 DisableNotifyInterrupt [761]
> 0.00 0.00 9/34 pqsignal [806]
> 0.00 0.00 2/5 atoi [837]
> 0.00 0.00 1/2 SetProcessingMode [868]
> 0.00 0.00 1/10002 TransactionFlushEnabled [749]
> 0.00 0.00 1/1 sigsetjmp [934]
>
> ("self" is the amount of time (CPU sec) spent directly in the named function,
> while "children" is the amount of time spent in its callees, pro-rated
> by the number of calls. For example, strlen was actually measured to
> use 0.47 sec in the whole run, but since PostgresMain called it less
> than 1% of all the times it was called, PostgresMain gets blamed for
> less than 0.01 sec of that total.)
>
> The interesting thing here is the relative times indicated for
> pg_exec_query, CommitTransactionCommand, and ReadCommand. We already
> found out why ReadCommand is so slow (recv() per character) ... but why
> is CommitTransactionCommand so high, when I already discovered that
> doing only one commit doesn't help much? Patience, we'll get there.
>
> -----------------------------------------------
> 0.03 57.61 5000/5000 pg_exec_query [9]
> [10] 31.9 0.03 57.61 5000 pg_exec_query_dest [10]
> 0.09 39.12 5000/5000 pg_parse_and_plan [13]
> 0.00 17.37 5000/5000 ProcessQuery [36]
> 0.02 0.60 5000/5000 SetQuerySnapshot [159]
> 0.01 0.34 5000/5000 CommandCounterIncrement [196]
> 0.02 0.05 10000/4291775 free [22]
> -----------------------------------------------
> 0.09 39.12 5000/5000 pg_exec_query_dest [10]
> [13] 21.7 0.09 39.12 5000 pg_parse_and_plan [13]
> 0.02 19.28 5000/5000 parser [26]
> 0.03 17.63 5000/5000 planner [34]
> 0.00 1.98 5000/5000 QueryRewrite [90]
> 0.02 0.05 10000/4292947 malloc [21]
> 0.02 0.05 10000/4291775 free [22]
> 0.00 0.02 5000/497779 lappend [91]
> 0.01 0.00 5000/5000 IsAbortedTransactionBlockState [529]
> 0.00 0.00 5000/72779 length [342]
>
> In other words, parsing, planning, and executing an INSERT each take
> about the same amount of time.
>
> -----------------------------------------------
> 0.01 39.83 5000/5000 PostgresMain [8]
> [11] 22.0 0.01 39.83 5000 CommitTransactionCommand [11]
> 0.02 39.81 5000/5000 CommitTransaction [12]
> -----------------------------------------------
> 0.02 39.81 5000/5000 CommitTransactionCommand [11]
> [12] 22.0 0.02 39.81 5000 CommitTransaction [12]
> 0.00 38.12 5000/5000 AtCommit_Memory [16]
> 0.01 1.10 5000/5000 RecordTransactionCommit [122]
> 0.00 0.30 5000/5000 AtCommit_Locks [211]
> 0.02 0.10 5000/5000 AtEOXact_portals [286]
> 0.03 0.07 5000/5000 DestroyTempRels [305]
> 0.03 0.00 5000/5000 _lo_commit [425]
> 0.01 0.00 5000/5000 AtCommit_Notify [522]
> 0.01 0.00 5000/5000 CloseSequences [523]
> 0.01 0.00 5000/5000 RelationPurgeLocalRelation [531]
> 0.00 0.00 5000/10000 AtCommit_Cache [750]
> -----------------------------------------------
> 0.00 38.12 5000/5000 CommitTransaction [12]
> [16] 21.1 0.00 38.12 5000 AtCommit_Memory [16]
> 0.01 38.11 5000/5000 EndPortalAllocMode [17]
> 0.00 0.00 5000/10552 MemoryContextSwitchTo [747]
> -----------------------------------------------
> 0.01 38.11 5000/5000 AtCommit_Memory [16]
> [17] 21.1 0.01 38.11 5000 EndPortalAllocMode [17]
> 0.54 37.55 5000/5000 AllocSetReset [18]
> 0.01 0.00 5000/10000 PortalHeapMemoryGetVariableMemory [449]
> 0.01 0.00 5000/5000 FixedStackPop [528]
> 0.00 0.00 5000/51434 MemoryContextFree [367]
> -----------------------------------------------
> 0.54 37.55 5000/5000 EndPortalAllocMode [17]
> [18] 21.1 0.54 37.55 5000 AllocSetReset [18]
> 1.47 32.51 4200327/4251762 AllocSetFree [20]
> 1.04 2.53 4205327/4205327 AllocSetGetFirst [64]
>
> In other words, essentially *all* of the CPU time spent in
> CommitTransaction is spent freeing memory. That's probably why
> ganging the transactions doesn't help --- it's the same number of
> memory blocks getting allocated and freed.
>
> 0.02 19.28 5000/5000 pg_parse_and_plan [13]
> [26] 10.7 0.02 19.28 5000 parser [26]
> 0.02 13.88 5000/5000 parse_analyze [41]
> 1.52 3.81 5000/5000 yyparse [54]
> 0.01 0.01 5000/5000 init_io [474]
> 0.00 0.01 5000/420496 pstrdup [124]
> 0.01 0.00 5000/5000 parser_init [533]
>
> Thomas might know why parse_analyze is taking so much time compared to
> the rest of the parsing machinery...
>
>
> I won't bother the list with the rest of the profile, although I'll be
> happy to send it to anyone who wants to see all of it. Our immediate
> performance problems seem pretty clear, however: time to rethink memory
> management.
>
> regards, tom lane
>
>

--
Bruce Momjian | http://www.op.net/~candle
maillist(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 1999-03-15 14:31:26 Re: [HACKERS] SPI_prepare() doesn't work well ?
Previous Message Bruce Momjian 1999-03-15 14:25:27 Re: [HACKERS] pg_dump again