Re: [HACKERS] Postgres Speed or lack thereof

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Cc: John Holland <jholland(at)isr(dot)umd(dot)edu>
Subject: Re: [HACKERS] Postgres Speed or lack thereof
Date: 1999-01-18 05:31:43
Message-ID: 27367.916637503@sss.pgh.pa.us

I wrote:
> As for where the time is going: "top" showed that the time for the
> INSERT ops was almost all going into backend CPU time. My guess is
> that most of the cost is in parsing and planning the INSERT
> statements.

Having an idle hour this evening, I thought it'd be interesting to build
a backend with profiling enabled, so as to confirm or deny the above
guess. It seems that indeed a lot of time is being wasted, but where
it's being wasted might surprise you!

I profiled a backend built from this morning's current CVS sources,
being driven by a single psql run reading a series of exactly 5000
INSERT commands. The first few are:

INSERT INTO "fehistory_1" values (64,14,252,'D','-','-','s',1,4027,NULL,'S',1,4088,NULL,32,'Thu Jan 07 15:15:00 1999 EST','Fri Jan 08 08:30:38 1999 EST',4027,32,NULL,NULL,46,NULL);
INSERT INTO "fehistory_1" values (65,14,252,'P','-','-','S',1,4086,NULL,NULL,NULL,NULL,NULL,32,'Fri Jan 08 16:00:00 1999 EST',NULL,NULL,NULL,NULL,NULL,45,NULL);
INSERT INTO "fehistory_1" values (66,14,135,'P','-','-','S',1,13619,NULL,'s',1,12967,NULL,100,'Fri Jan 08 02:00:00 1999 EST',NULL,NULL,NULL,NULL,NULL,44,NULL);

and it goes on for a long while in that vein :-). Total runtime was
278.09 CPU seconds in the backend, 307 seconds elapsed realtime. The
table being inserted into has no indexes and was initially empty.

The profiler is GNU gprof, which provides both flat and structured
profile data. The top few entries in the flat profile are:

   %  cumulative     self                 self     total
 time    seconds  seconds       calls  ms/call  ms/call  name
34.96      97.21    97.21                                _mcount
 7.18     117.19    19.98     8047816     0.00     0.00  tree_cut
 6.56     135.43    18.24      962187     0.02     0.02  recv
 3.55     145.31     9.88     8234278     0.00     0.00  tree_insert
 3.49     155.02     9.71     4292947     0.00     0.01  malloc
 3.31     164.22     9.20     1965056     0.00     0.01  SearchSysCache
 3.24     173.23     9.01     4291775     0.00     0.01  free
 2.44     180.01     6.78     6171667     0.00     0.00  tree_concatenate
 2.17     186.04     6.03     8234069     0.00     0.00  tree_delete
 1.07     189.02     2.98       10250     0.29     0.29  write
 1.05     191.94     2.92                                _psort_cmp
 0.95     194.57     2.63     2983301     0.00     0.00  newNode
 0.92     197.14     2.57     4251762     0.00     0.00  OrderedElemPop
 0.88     199.60     2.46     2746172     0.00     0.00  comphash
 0.82     201.88     2.28     4252495     0.00     0.01  AllocSetAlloc
 0.77     204.02     2.14     1965056     0.00     0.00  SearchSysCacheTuple
 0.73     206.05     2.03      265000     0.01     0.01  yylex
 0.69     207.96     1.91                                $$remU
 0.68     209.86     1.90     4252500     0.00     0.00  OrderedElemPushHead
 0.62     211.58     1.72     4247473     0.00     0.00  palloc
 0.60     213.26     1.68       25004     0.07     0.07  send
 0.56     214.81     1.55     1965056     0.00     0.00  CatalogCacheComputeHashIndex
 0.55     216.33     1.52        5000     0.30     1.07  yyparse
 0.54     217.82     1.49     4251762     0.00     0.01  AllocSetFree
 0.53     219.29     1.47     4252494     0.00     0.00  MemoryContextAlloc
 0.51     220.71     1.42     4252500     0.00     0.00  OrderedElemPushInto
 0.49     222.07     1.36     3430844     0.00     0.00  strncmp
 0.48     223.41     1.34     4205327     0.00     0.00  OrderedSetGetHead
 0.47     224.73     1.32                                elog
 0.45     225.99     1.26     1964976     0.00     0.00  DLRemove
 0.43     227.19     1.20     2510857     0.00     0.00  strcmp
 0.43     228.38     1.19     4200327     0.00     0.00  OrderedElemGetBase
 0.42     229.56     1.18     4245740     0.00     0.01  PortalHeapMemoryAlloc
 0.38     230.62     1.06     4252500     0.00     0.00  OrderedElemPush
 0.37     231.66     1.04     4205327     0.00     0.00  AllocSetGetFirst
 0.36     232.66     1.00     1965145     0.00     0.00  DLAddHead

_mcount is part of the profiling apparatus and can be ignored (although
the fact that it's so high up implies that we're executing an awful lot
of short routine calls, because mcount runs once per entry and exit of
profilable routines). tree_cut, tree_insert, tree_concatenate, and
tree_delete are subroutines of malloc/free. Altogether, malloc/free
and friends accounted for 61.39 seconds out of the 180.88 non-overhead
CPU seconds in this run.

In other words, we're spending a third of our time mallocing and freeing
memory. A tad high, what?

Actually, it's worse than that, because AllocSetAlloc,
PortalHeapMemoryAlloc, AllocSetFree, and all of the OrderedElemXXX
routines represent our own bookkeeping layer atop malloc/free.
That's another 18.66 seconds spent in these top routines, which means
that we are real close to expending half the backend's runtime on
memory bookkeeping. This needs work.
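
Just to put a number on what the raw malloc/free traffic costs by itself,
here's a throwaway micro-benchmark (my own illustration, nothing from our
tree) that performs about the same number of malloc/free pairs as the
profiled backend did:

/*
 * Hypothetical micro-benchmark, for illustration only: measure the bare
 * cost of ~4.25 million malloc/free pairs, roughly the count seen in the
 * profiled run.  Results will vary with the platform's libc allocator.
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int
main()
{
    long        i;
    long        npairs = 4250000;   /* about what the 5000-INSERT run did */
    clock_t     start = clock();

    for (i = 0; i < npairs; i++)
    {
        void       *p = malloc(64); /* a typical small parse-tree node */

        if (p == NULL)
            return 1;
        free(p);
    }
    printf("%ld malloc/free pairs: %.2f CPU sec\n",
           npairs, (double) (clock() - start) / CLOCKS_PER_SEC);
    return 0;
}

Mind you, a tight loop like this is malloc's best case --- each free()
hands back exactly the chunk the next malloc() will want.  The backend's
interleaved allocation lifetimes are what drive the allocator into the
tree-balancing routines you see at the top of the profile.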

The other thing that jumps out here is the unreasonably high position of
recv(), which is called 962187 times. The script being read by psql was
only 957186 characters. Evidently we're invoking a kernel recv() call
once per character read from the frontend. I suspect this is an
inefficiency introduced by Magnus Hagander's recent rewrite of backend
libpq (see, I told you there was a reason for using stdio ;-)). We're
gonna have to do something about that, though it's not as critical as
the memory-allocation issue. It also appears that send() is now being
invoked multiple times per backend reply, which is going to create
inefficiencies outside the backend (ie, multiple packets per reply).
On a test case with a lot of SELECTs that would show up more than it
does here.
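
For the record, the fix I'd envision for the recv() problem is plain
old input buffering on our side of the socket, something along these
lines (untested sketch, and the names are invented for the example):

/*
 * Untested sketch, illustration only: buffer frontend input so that we
 * pay for one kernel recv() per bufferload rather than one per character.
 */
#include <sys/types.h>
#include <sys/socket.h>

#define RECV_BUF_SIZE 8192

static char recv_buf[RECV_BUF_SIZE];
static int  recv_pos = 0;       /* next byte to hand out */
static int  recv_len = 0;       /* number of valid bytes in recv_buf */

/*
 * Return the next byte from the frontend, refilling the buffer only
 * when it runs dry.  Returns -1 on error or connection closure.
 */
static int
buffered_getbyte(int sock)
{
    while (recv_pos >= recv_len)
    {
        int         n = recv(sock, recv_buf, RECV_BUF_SIZE, 0);

        if (n <= 0)
            return -1;          /* EOF or error from frontend */
        recv_pos = 0;
        recv_len = n;
    }
    return (unsigned char) recv_buf[recv_pos++];
}

One recv() per bufferload instead of one per character ought to make
ReadCommand drop well down the profile.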

Moving on to the dynamic profile, the most interesting items are:

index  % time    self  children    called     name
-----------------------------------------------
                 0.05  118.06         1/1            DoBackend [7]
[8]      65.3    0.05  118.06         1          PostgresMain [8]
                 0.04   57.64      5000/5000         pg_exec_query [9]
                 0.01   39.83      5000/5000         CommitTransactionCommand [11]
                 0.02   18.86      5001/5001         ReadCommand [27]
                 0.01    1.18      5000/5000         StartTransactionCommand [118]
                 0.01    0.36      5001/5001         ReadyForQuery [190]
                 0.04    0.00      5001/5001         EnableNotifyInterrupt [385]
                 0.04    0.00      5000/5000         strspn [388]
                 0.00    0.01         1/1            InitPostgres [495]
                 0.00    0.00      5000/548573       strlen [172]
                 0.00    0.00      5001/93349        memset [366]
                 0.00    0.00         1/1            read_pg_options [601]
                 0.00    0.00         2/2            pq_putint [637]
                 0.00    0.00         1/15002        pq_putnchar [123]
                 0.00    0.00         2/10013        getenv [241]
                 0.00    0.00         1/1            SetPgUserName [683]
                 0.00    0.00         1/1            pq_init [687]
                 0.00    0.00         1/1            proc_exit [704]
                 0.00    0.00         1/1            pq_close [721]
                 0.00    0.00         1/5002         getpgusername [521]
                 0.00    0.00         1/2            FindExec [722]
                 0.00    0.00         5/8            getopt [728]
                 0.00    0.00      5001/5001         DisableNotifyInterrupt [761]
                 0.00    0.00         9/34           pqsignal [806]
                 0.00    0.00         2/5            atoi [837]
                 0.00    0.00         1/2            SetProcessingMode [868]
                 0.00    0.00         1/10002        TransactionFlushEnabled [749]
                 0.00    0.00         1/1            sigsetjmp [934]

("self" is the amount of time (CPU sec) spent directly in the named function,
while "children" is the amount of time spent in its callees, pro-rated
by the number of calls. For example, strlen was actually measured to
use 0.47 sec in the whole run, but since PostgresMain called it less
than 1% of all the times it was called, PostgresMain gets blamed for
less than 0.01 sec of that total.)

The interesting thing here is the relative times indicated for
pg_exec_query, CommitTransactionCommand, and ReadCommand. We already
found out why ReadCommand is so slow (recv() per character) ... but why
is CommitTransactionCommand so high, when I already discovered that
doing only one commit doesn't help much? Patience, we'll get there.

-----------------------------------------------
                 0.03   57.61      5000/5000         pg_exec_query [9]
[10]     31.9    0.03   57.61      5000          pg_exec_query_dest [10]
                 0.09   39.12      5000/5000         pg_parse_and_plan [13]
                 0.00   17.37      5000/5000         ProcessQuery [36]
                 0.02    0.60      5000/5000         SetQuerySnapshot [159]
                 0.01    0.34      5000/5000         CommandCounterIncrement [196]
                 0.02    0.05     10000/4291775      free [22]
-----------------------------------------------
                 0.09   39.12      5000/5000         pg_exec_query_dest [10]
[13]     21.7    0.09   39.12      5000          pg_parse_and_plan [13]
                 0.02   19.28      5000/5000         parser [26]
                 0.03   17.63      5000/5000         planner [34]
                 0.00    1.98      5000/5000         QueryRewrite [90]
                 0.02    0.05     10000/4292947      malloc [21]
                 0.02    0.05     10000/4291775      free [22]
                 0.00    0.02      5000/497779       lappend [91]
                 0.01    0.00      5000/5000         IsAbortedTransactionBlockState [529]
                 0.00    0.00      5000/72779        length [342]

In other words, parsing, planning, and executing an INSERT each take
about the same amount of time.

-----------------------------------------------
                 0.01   39.83      5000/5000         PostgresMain [8]
[11]     22.0    0.01   39.83      5000          CommitTransactionCommand [11]
                 0.02   39.81      5000/5000         CommitTransaction [12]
-----------------------------------------------
                 0.02   39.81      5000/5000         CommitTransactionCommand [11]
[12]     22.0    0.02   39.81      5000          CommitTransaction [12]
                 0.00   38.12      5000/5000         AtCommit_Memory [16]
                 0.01    1.10      5000/5000         RecordTransactionCommit [122]
                 0.00    0.30      5000/5000         AtCommit_Locks [211]
                 0.02    0.10      5000/5000         AtEOXact_portals [286]
                 0.03    0.07      5000/5000         DestroyTempRels [305]
                 0.03    0.00      5000/5000         _lo_commit [425]
                 0.01    0.00      5000/5000         AtCommit_Notify [522]
                 0.01    0.00      5000/5000         CloseSequences [523]
                 0.01    0.00      5000/5000         RelationPurgeLocalRelation [531]
                 0.00    0.00      5000/10000        AtCommit_Cache [750]
-----------------------------------------------
                 0.00   38.12      5000/5000         CommitTransaction [12]
[16]     21.1    0.00   38.12      5000          AtCommit_Memory [16]
                 0.01   38.11      5000/5000         EndPortalAllocMode [17]
                 0.00    0.00      5000/10552        MemoryContextSwitchTo [747]
-----------------------------------------------
                 0.01   38.11      5000/5000         AtCommit_Memory [16]
[17]     21.1    0.01   38.11      5000          EndPortalAllocMode [17]
                 0.54   37.55      5000/5000         AllocSetReset [18]
                 0.01    0.00      5000/10000        PortalHeapMemoryGetVariableMemory [449]
                 0.01    0.00      5000/5000         FixedStackPop [528]
                 0.00    0.00      5000/51434        MemoryContextFree [367]
-----------------------------------------------
                 0.54   37.55      5000/5000         EndPortalAllocMode [17]
[18]     21.1    0.54   37.55      5000          AllocSetReset [18]
                 1.47   32.51   4200327/4251762      AllocSetFree [20]
                 1.04    2.53   4205327/4205327      AllocSetGetFirst [64]

In other words, essentially *all* of the CPU time spent in
CommitTransaction is spent freeing memory. That's probably why
ganging the transactions doesn't help --- it's the same number of
memory blocks getting allocated and freed.
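
The way out, I suspect, is to stop tracking every chunk individually:
carve small allocations out of big malloc'd blocks, and let
end-of-transaction reset free just the blocks.  An untested sketch of
the idea (illustration only, not a patch against our allocator):

/*
 * Untested sketch: block-oriented allocation, so that resetting the
 * context at commit means a handful of free() calls on whole blocks
 * rather than one per palloc'd chunk.
 */
#include <stdlib.h>

#define ARENA_BLOCK_SIZE 8192

typedef struct ArenaBlock
{
    struct ArenaBlock *next;
    size_t      used;
    char        data[ARENA_BLOCK_SIZE];
} ArenaBlock;

typedef struct Arena
{
    ArenaBlock *blocks;
} Arena;

static void *
arena_alloc(Arena *arena, size_t size)
{
    ArenaBlock *block = arena->blocks;

    size = (size + 7) & ~(size_t) 7;    /* keep 8-byte alignment */
    if (size > ARENA_BLOCK_SIZE)
        return NULL;                    /* big chunks need a separate path */

    if (block == NULL || block->used + size > ARENA_BLOCK_SIZE)
    {
        block = malloc(sizeof(ArenaBlock));
        if (block == NULL)
            return NULL;
        block->next = arena->blocks;
        block->used = 0;
        arena->blocks = block;
    }
    block->used += size;
    return block->data + block->used - size;
}

/* At commit: release everything in one pass over the block list. */
static void
arena_reset(Arena *arena)
{
    ArenaBlock *block = arena->blocks;

    while (block != NULL)
    {
        ArenaBlock *next = block->next;

        free(block);
        block = next;
    }
    arena->blocks = NULL;
}

With something along these lines, AllocSetReset's work would be
proportional to the number of blocks, not the number of pallocs ---
a few free() calls per transaction instead of the 800-odd chunk
releases per INSERT that the profile shows.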

-----------------------------------------------
                 0.02   19.28      5000/5000         pg_parse_and_plan [13]
[26]     10.7    0.02   19.28      5000          parser [26]
                 0.02   13.88      5000/5000         parse_analyze [41]
                 1.52    3.81      5000/5000         yyparse [54]
                 0.01    0.01      5000/5000         init_io [474]
                 0.00    0.01      5000/420496       pstrdup [124]
                 0.01    0.00      5000/5000         parser_init [533]

Thomas might know why parse_analyze is taking so much time compared to
the rest of the parsing machinery...

I won't bother the list with the rest of the profile, although I'll be
happy to send it to anyone who wants to see all of it. Our immediate
performance problems seem pretty clear, however: time to rethink memory
management.

regards, tom lane
