Re: can we optimize STACK_DEPTH_SLOP

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: can we optimize STACK_DEPTH_SLOP
Date: 2016-07-10 23:34:15
Message-ID: 6102.1468193655@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> So, agreed, let's commit some temporary debug code and see what the
> buildfarm can teach us. Will go work on that in a bit.

After reviewing the buildfarm results, I'm feeling nervous about this
whole idea again. For the most part, the unaccounted-for daylight between
the maximum stack depth measured by check_stack_depth and the actually
dirtied stack space reported by pmap is under 100K. But there are a
pretty fair number of exceptions. The worst cases I found were on
"dunlin", which approached 200K extra space in a couple of places:

dunlin | 2016-07-09 22:05:09 | check.log | 00007ffff2667000 268 208 208 rw--- [ stack ]
dunlin | 2016-07-09 22:05:09 | check.log | max measured stack depth 14kB
dunlin | 2016-07-09 22:05:09 | install-check-C.log | 00007fffee650000 268 200 200 rw--- [ stack ]
dunlin | 2016-07-09 22:05:09 | install-check-C.log | max measured stack depth 14kB

This appears to be happening in the tsdicts test script. Other machines
also show a significant discrepancy between pmap and check_stack_depth
results for that test, which suggests that maybe the tsearch code is being
overly reliant on large local variables. But I haven't dug through it.
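
To make the "daylight" arithmetic concrete: it is the dirty-page column of the pmap line minus the depth reported by the debug code. The Python sketch below is only an illustration. It assumes the pmap fields are address, Kbytes, RSS, Dirty, mode, mapping (the "pmap -x" layout), and that the buildfarm prefix has already been stripped from each line. The function names are hypothetical and not part of any PostgreSQL tooling.

```python
import re

# Two of the dunlin report lines quoted above, buildfarm prefix removed.
PMAP_LINE = "00007ffff2667000 268 208 208 rw--- [ stack ]"
DEPTH_LINE = "max measured stack depth 14kB"


def dirty_kb(pmap_line: str) -> int:
    """Return the Dirty column in kB.

    Assumption: fields are address, Kbytes, RSS, Dirty, mode, mapping.
    """
    fields = pmap_line.split()
    return int(fields[3])


def measured_kb(depth_line: str) -> int:
    """Return the depth in kB from a 'max measured stack depth NkB' line."""
    match = re.search(r"max measured stack depth (\d+)kB", depth_line)
    if match is None:
        raise ValueError(f"not a stack depth line: {depth_line!r}")
    return int(match.group(1))


def unaccounted_kb(pmap_line: str, depth_line: str) -> int:
    """Dirtied stack space that the depth measurement did not account for."""
    return dirty_kb(pmap_line) - measured_kb(depth_line)


if __name__ == "__main__":
    print(unaccounted_kb(PMAP_LINE, DEPTH_LINE))  # 208 - 14 = 194
```

For the second dunlin pair (200 dirty, 14 measured) the same calculation gives 186, which is where "approached 200K" comes from.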

Another area of concern is PLs. For instance, on capybara, a machine
otherwise pretty unexceptional in stack-space appetite, quite a few of the
PL tests ate ~100K of unaccounted-for space:

capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | 00007ffc61bbe000 132 104 104 rw--- [ stack ]
capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | 00007ffc61bbe000 132 0 0 rw--- [ stack ]
capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | max measured stack depth 8kB
capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | 00007ffc61bbd000 136 136 136 rw--- [ stack ]
capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | 00007ffc61bbd000 136 0 0 rw--- [ stack ]
capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | max measured stack depth 0kB
capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | 00007ffc61bbe000 132 104 104 rw--- [ stack ]
capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | 00007ffc61bbe000 132 0 0 rw--- [ stack ]
capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | max measured stack depth 5kB
capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | 00007ffc61bbe000 132 116 116 rw--- [ stack ]
capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | 00007ffc61bbe000 132 0 0 rw--- [ stack ]
capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | max measured stack depth 7kB

Presumably that reflects some oddity of the local version of perl or
python, but I have no idea what.

So while we could possibly get away with reducing STACK_DEPTH_SLOP
to 256K, there is good reason to think that doing so would leave
little or no safety margin.
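
A rough way to see how thin that margin would be is to subtract the worst unaccounted-for space seen on dunlin from each candidate slop value. The sketch below is illustrative only. The 512K figure for the current STACK_DEPTH_SLOP is an assumption taken from the PostgreSQL source of that era, not from this message, and the helper name is hypothetical.

```python
# Candidate slop values in kB.
CURRENT_SLOP_KB = 512   # assumption: STACK_DEPTH_SLOP in the source at the time
PROPOSED_SLOP_KB = 256  # the reduced value discussed above

# (dirty kB per pmap, max measured kB) for the two dunlin reports above.
DUNLIN_REPORTS = [(208, 14), (200, 14)]


def remaining_margin_kb(slop_kb: int, dirty_kb: int, measured_kb: int) -> int:
    """Slop left over after covering the space the depth check did not see."""
    unaccounted = dirty_kb - measured_kb
    return slop_kb - unaccounted


if __name__ == "__main__":
    for dirty, measured in DUNLIN_REPORTS:
        now = remaining_margin_kb(CURRENT_SLOP_KB, dirty, measured)
        proposed = remaining_margin_kb(PROPOSED_SLOP_KB, dirty, measured)
        print(f"dirty={dirty}kB measured={measured}kB: "
              f"margin {now}kB at 512K, {proposed}kB at 256K")
```

This is optimistic, since the slop also has to cover whatever stack is consumed between one check_stack_depth call and the next, so the real headroom would be smaller than these numbers suggest.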

At this point I'm inclined to think we should leave well enough alone.
At the very least, if we were to try to reduce that number, I'd want
to have some plan for tracking our stack space consumption better than
we have done in the past.

regards, tom lane

PS: for amusement's sake, here are some numbers I extracted concerning
the relative stack-hungriness of different buildfarm members. First,
the number of recursion levels each machine could accomplish before
hitting "stack too deep" in the errors.sql regression test (measured by
counting the number of CONTEXT lines in the relevant error message):

sysname | snapshot | count
---------------+---------------------+-------
protosciurus | 2016-07-10 12:03:06 | 731
chub | 2016-07-10 15:10:01 | 1033
quokka | 2016-07-10 02:17:31 | 1033
hornet | 2016-07-09 23:42:32 | 1156
clam | 2016-07-09 22:00:01 | 1265
anole | 2016-07-09 22:41:40 | 1413
spoonbill | 2016-07-09 23:00:05 | 1535
sungazer | 2016-07-09 23:51:33 | 1618
gaur | 2016-07-09 04:53:13 | 1634
kouprey | 2016-07-10 04:58:00 | 1653
nudibranch | 2016-07-10 09:18:10 | 1664
grouse | 2016-07-10 08:43:02 | 1708
sprat | 2016-07-10 08:43:55 | 1717
pademelon | 2016-07-09 06:12:10 | 1814
mandrill | 2016-07-10 00:10:02 | 2093
gharial | 2016-07-10 01:15:50 | 2248
francolin | 2016-07-10 13:00:01 | 2379
piculet | 2016-07-10 13:00:01 | 2379
lorikeet | 2016-07-10 08:04:19 | 2422
caecilian | 2016-07-09 19:31:50 | 2423
jacana | 2016-07-09 22:36:38 | 2515
bowerbird | 2016-07-10 02:13:47 | 2617
locust | 2016-07-09 21:50:26 | 2838
prairiedog | 2016-07-09 22:44:58 | 2838
dromedary | 2016-07-09 20:48:06 | 2840
damselfly | 2016-07-10 10:27:09 | 2880
curculio | 2016-07-09 21:30:01 | 2905
mylodon | 2016-07-09 20:50:01 | 2974
tern | 2016-07-09 23:51:23 | 3015
burbot | 2016-07-10 03:30:45 | 3042
magpie | 2016-07-09 21:38:02 | 3043
reindeer | 2016-07-10 04:00:05 | 3043
friarbird | 2016-07-10 04:20:01 | 3187
nightjar | 2016-07-09 21:17:52 | 3187
sittella | 2016-07-09 21:46:29 | 3188
crake | 2016-07-09 22:06:09 | 3267
guaibasaurus | 2016-07-10 00:17:01 | 3267
ibex | 2016-07-09 20:59:06 | 3267
mule | 2016-07-09 23:30:02 | 3267
spurfowl | 2016-07-09 21:06:39 | 3267
anchovy | 2016-07-09 21:41:04 | 3268
blesbok | 2016-07-09 21:17:46 | 3268
capybara | 2016-07-09 21:15:56 | 3268
conchuela | 2016-07-09 21:00:01 | 3268
handfish | 2016-07-09 04:37:57 | 3268
macaque | 2016-07-08 21:25:06 | 3268
minisauripus | 2016-07-10 03:19:42 | 3268
rhinoceros | 2016-07-09 21:45:01 | 3268
sidewinder | 2016-07-09 21:45:00 | 3272
jaguarundi | 2016-07-10 06:52:05 | 3355
loach | 2016-07-09 21:15:00 | 3355
okapi | 2016-07-10 06:15:02 | 3425
fulmar | 2016-07-09 23:47:57 | 3436
longfin | 2016-07-09 21:10:17 | 3444
brolga | 2016-07-10 09:40:46 | 3537
dunlin | 2016-07-09 22:05:09 | 3616
coypu | 2016-07-09 22:20:46 | 3626
hyrax | 2016-07-09 19:52:03 | 3635
treepie | 2016-07-09 22:41:37 | 3635
frogmouth | 2016-07-10 02:00:09 | 3636
narwhal | 2016-07-10 10:00:05 | 3966
rover_firefly | 2016-07-10 15:01:45 | 4084
lapwing | 2016-07-09 21:15:01 | 4085
cockatiel | 2016-07-10 13:40:47 | 4362
currawong | 2016-07-10 05:16:03 | 5136
mastodon | 2016-07-10 11:00:01 | 5136
termite | 2016-07-09 21:01:30 | 5452
hamster | 2016-07-09 16:00:06 | 5685
dangomushi | 2016-07-09 18:00:27 | 5692
gull | 2016-07-10 04:48:28 | 5692
mereswine | 2016-07-10 10:40:57 | 5810
axolotl | 2016-07-09 22:12:12 | 5811
chipmunk | 2016-07-10 08:18:07 | 5949
grison | 2016-07-09 21:00:02 | 5949
(74 rows)

(coypu gets a gold star for this one, since it makes a good showing
despite having max_stack_depth set to 1536kB --- everyone else seems
to be using 2MB.)
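
One way to compare machines with different limits is to turn the recursion counts into an approximate stack cost per recursion level, dividing max_stack_depth by the count. The sketch below does that for a few rows of the table. It assumes the whole limit is consumed by the recursion (ignoring the baseline stack in use before the test starts) and that every machine other than coypu really is at 2MB, as guessed above. The function name is hypothetical.

```python
KB = 1024

# (max_stack_depth in kB, recursion levels reached) for a few table rows.
# Assumption: 2048kB everywhere except coypu, which is stated to use 1536kB.
SAMPLES = {
    "protosciurus": (2048, 731),
    "capybara": (2048, 3268),
    "coypu": (1536, 3626),
    "grison": (2048, 5949),
}


def bytes_per_level(max_stack_depth_kb: int, levels: int) -> float:
    """Approximate stack bytes consumed per recursion level."""
    return max_stack_depth_kb * KB / levels


if __name__ == "__main__":
    for name, (limit_kb, levels) in SAMPLES.items():
        cost = bytes_per_level(limit_kb, levels)
        print(f"{name}: about {cost:.0f} bytes per level")
```

By this estimate coypu spends roughly 430 bytes per level against roughly 640 for capybara, which is why its count looks good despite the smaller limit.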

Second, the stack space consumed for the regex regression test --- here,
smaller is better:

currawong | 2016-07-10 05:16:03 | max measured stack depth 213kB
mastodon | 2016-07-10 11:00:01 | max measured stack depth 213kB
axolotl | 2016-07-09 22:12:12 | max measured stack depth 240kB
hamster | 2016-07-09 16:00:06 | max measured stack depth 240kB
mereswine | 2016-07-10 10:40:57 | max measured stack depth 240kB
brolga | 2016-07-10 09:40:46 | max measured stack depth 284kB
narwhal | 2016-07-10 10:00:05 | max measured stack depth 284kB
cockatiel | 2016-07-10 13:40:47 | max measured stack depth 285kB
francolin | 2016-07-10 13:00:01 | max measured stack depth 285kB
hyrax | 2016-07-09 19:52:03 | max measured stack depth 285kB
magpie | 2016-07-09 21:38:02 | max measured stack depth 285kB
piculet | 2016-07-10 13:00:01 | max measured stack depth 285kB
reindeer | 2016-07-10 04:00:05 | max measured stack depth 285kB
treepie | 2016-07-09 22:41:37 | max measured stack depth 285kB
lapwing | 2016-07-09 21:15:01 | max measured stack depth 287kB
rover_firefly | 2016-07-10 15:01:45 | max measured stack depth 287kB
coypu | 2016-07-09 22:20:46 | max measured stack depth 288kB
friarbird | 2016-07-10 04:20:01 | max measured stack depth 289kB
nightjar | 2016-07-09 21:17:52 | max measured stack depth 289kB
gharial | 2016-07-10 01:15:50 | max measured stack depths 290kB, 384kB
bowerbird | 2016-07-10 02:13:47 | max measured stack depth 378kB
caecilian | 2016-07-09 19:31:50 | max measured stack depth 378kB
frogmouth | 2016-07-10 02:00:09 | max measured stack depth 378kB
mylodon | 2016-07-09 20:50:01 | max measured stack depth 378kB
jaguarundi | 2016-07-10 06:52:05 | max measured stack depth 379kB
loach | 2016-07-09 21:15:00 | max measured stack depth 379kB
longfin | 2016-07-09 21:10:17 | max measured stack depth 379kB
sidewinder | 2016-07-09 21:45:00 | max measured stack depth 379kB
anchovy | 2016-07-09 21:41:04 | max measured stack depth 381kB
blesbok | 2016-07-09 21:17:46 | max measured stack depth 381kB
capybara | 2016-07-09 21:15:56 | max measured stack depth 381kB
conchuela | 2016-07-09 21:00:01 | max measured stack depth 381kB
crake | 2016-07-09 22:06:09 | max measured stack depth 381kB
curculio | 2016-07-09 21:30:01 | max measured stack depth 381kB
guaibasaurus | 2016-07-10 00:17:01 | max measured stack depth 381kB
handfish | 2016-07-09 04:37:57 | max measured stack depth 381kB
ibex | 2016-07-09 20:59:06 | max measured stack depth 381kB
macaque | 2016-07-08 21:25:06 | max measured stack depth 381kB
minisauripus | 2016-07-10 03:19:42 | max measured stack depth 381kB
mule | 2016-07-09 23:30:02 | max measured stack depth 381kB
rhinoceros | 2016-07-09 21:45:01 | max measured stack depth 381kB
sittella | 2016-07-09 21:46:29 | max measured stack depth 381kB
spurfowl | 2016-07-09 21:06:39 | max measured stack depth 381kB
dromedary | 2016-07-09 20:48:06 | max measured stack depth 382kB
pademelon | 2016-07-09 06:12:10 | max measured stack depth 382kB
fulmar | 2016-07-09 23:47:57 | max measured stack depth 383kB
dunlin | 2016-07-09 22:05:09 | max measured stack depth 388kB
okapi | 2016-07-10 06:15:02 | max measured stack depth 389kB
mandrill | 2016-07-10 00:10:02 | max measured stack depth 489kB
tern | 2016-07-09 23:51:23 | max measured stack depth 491kB
damselfly | 2016-07-10 10:27:09 | max measured stack depth 492kB
burbot | 2016-07-10 03:30:45 | max measured stack depth 567kB
locust | 2016-07-09 21:50:26 | max measured stack depth 571kB
prairiedog | 2016-07-09 22:44:58 | max measured stack depth 571kB
clam | 2016-07-09 22:00:01 | max measured stack depth 573kB
jacana | 2016-07-09 22:36:38 | max measured stack depth 661kB
lorikeet | 2016-07-10 08:04:19 | max measured stack depth 662kB
gaur | 2016-07-09 04:53:13 | max measured stack depth 756kB
chub | 2016-07-10 15:10:01 | max measured stack depth 856kB
quokka | 2016-07-10 02:17:31 | max measured stack depth 856kB
hornet | 2016-07-09 23:42:32 | max measured stack depth 868kB
grouse | 2016-07-10 08:43:02 | max measured stack depth 944kB
kouprey | 2016-07-10 04:58:00 | max measured stack depth 944kB
nudibranch | 2016-07-10 09:18:10 | max measured stack depth 945kB
sprat | 2016-07-10 08:43:55 | max measured stack depth 946kB
sungazer | 2016-07-09 23:51:33 | max measured stack depth 963kB
protosciurus | 2016-07-10 12:03:06 | max measured stack depth 1432kB

The second list omits a couple of machines whose reports got garbled
by concurrent insertions into the log file.
