Re: can we optimize STACK_DEPTH_SLOP

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: can we optimize STACK_DEPTH_SLOP
Date: 2016-07-07 16:55:11
Message-ID: 20233.1467910511@sss.pgh.pa.us
Lists: pgsql-hackers

I found out that pmap can give much more fine-grained results than I was
getting before, if you give it the -x flag and then pay attention to the
"dirty" column rather than the "nominal size" column. That gives a
reliable indication of how much stack space the process ever actually
touched, with resolution apparently 4KB on my machine.
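
As a rough illustration of reading that column, here is a minimal Python sketch that pulls the dirty figure for the stack mapping out of Linux "pmap -x" text. The function name and the regular expression are my own, not part of any tool. The header line is the usual "pmap -x" header as I understand it, and the stack line is one of the lines reported in this message.

```python
import re

# Matches the stack line of Linux `pmap -x` output, for example:
#   00007fff0f615000      84      36      36 rw---   [ stack ]
# Columns: address, mapping size (KB), RSS (KB), dirty (KB), mode, mapping.
STACK_LINE = re.compile(
    r"^\s*(?P<addr>[0-9a-f]+)\s+(?P<size>\d+)\s+(?P<rss>\d+)\s+(?P<dirty>\d+)"
    r"\s+(?P<mode>\S+)\s+\[\s*stack\s*\]\s*$"
)


def stack_dirty_kb(pmap_output):
    """Return the dirty KB of the [ stack ] mapping, or None if not found.

    Hypothetical helper: it only parses text, it does not run pmap itself.
    """
    for line in pmap_output.splitlines():
        match = STACK_LINE.match(line)
        if match:
            return int(match.group("dirty"))
    return None


sample = """\
Address           Kbytes     RSS   Dirty Mode  Mapping
00007fff0f615000      84      36      36 rw---   [ stack ]
"""

print(stack_dirty_kb(sample))  # prints 36
```

With a 4KB page size the result should always be a multiple of 4, which matches the resolution noted above.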

I redid my measurements with commit 62c8421e8 applied, and now get results
like this for one run of the standard regression tests:

$ grep '\[ stack \]' postmaster.log | sort -k 4n | uniq -c
    137 00007fff0f615000      84      36      36 rw---   [ stack ]
     21 00007fff0f615000      84      40      40 rw---   [ stack ]
      4 00007fff0f615000      84      44      44 rw---   [ stack ]
     20 00007fff0f615000      84      48      48 rw---   [ stack ]
      8 00007fff0f615000      84      52      52 rw---   [ stack ]
      2 00007fff0f615000      84      56      56 rw---   [ stack ]
     10 00007fff0f615000      84      60      60 rw---   [ stack ]
      3 00007fff0f615000      84      64      64 rw---   [ stack ]
      3 00007fff0f615000      84      68      68 rw---   [ stack ]
      2 00007fff0f615000      84      72      72 rw---   [ stack ]
      1 00007fff0f612000      96      76      76 rw---   [ stack ]
      2 00007fff0f60e000     112     112     112 rw---   [ stack ]
      1 00007fff0f5e0000     296     296     296 rw---   [ stack ]
      1 00007fff0f427000    2060    2060    2060 rw---   [ stack ]

The rightmost numeric column is the "dirty KB in region" column, and 36KB
is the floor established by the postmaster. (It looks like selecting the
timezone is still the largest stack-space hog within that floor, but it's no
longer enough to make me want to do something about it.) So now we're seeing
some cases that exceed that floor, which is good. The regex and errors tests
are still the outliers, as expected.
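
To make the shape of that distribution explicit, here is a small Python sketch that tallies the listing above. It assumes each line is "uniq -c" output in front of a "pmap -x" stack line, so field 0 is the process count and field 4 is the dirty KB. The function name is illustrative only.

```python
# The `uniq -c` lines from the listing above, copied verbatim.
LISTING = """\
137 00007fff0f615000 84 36 36 rw--- [ stack ]
21 00007fff0f615000 84 40 40 rw--- [ stack ]
4 00007fff0f615000 84 44 44 rw--- [ stack ]
20 00007fff0f615000 84 48 48 rw--- [ stack ]
8 00007fff0f615000 84 52 52 rw--- [ stack ]
2 00007fff0f615000 84 56 56 rw--- [ stack ]
10 00007fff0f615000 84 60 60 rw--- [ stack ]
3 00007fff0f615000 84 64 64 rw--- [ stack ]
3 00007fff0f615000 84 68 68 rw--- [ stack ]
2 00007fff0f615000 84 72 72 rw--- [ stack ]
1 00007fff0f612000 96 76 76 rw--- [ stack ]
2 00007fff0f60e000 112 112 112 rw--- [ stack ]
1 00007fff0f5e0000 296 296 296 rw--- [ stack ]
1 00007fff0f427000 2060 2060 2060 rw--- [ stack ]
"""


def parse_histogram(text):
    """Map dirty-stack KB -> number of processes (hypothetical helper)."""
    hist = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        count, dirty_kb = int(fields[0]), int(fields[4])
        hist[dirty_kb] = hist.get(dirty_kb, 0) + count
    return hist


hist = parse_histogram(LISTING)
total = sum(hist.values())
floor_kb = min(hist)
at_floor = hist[floor_kb]

print(total)                             # 215 processes in all
print(floor_kb, at_floor)                # 36 137
print(total - at_floor)                  # 78 processes exceeded the floor
print(sorted(hist, reverse=True)[:3])    # [2060, 296, 112]
```

So 137 of the 215 processes never went past the 36KB floor, and only four processes dirtied more than 76KB.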

Also, I found that on OS X "vmmap -dirty" could produce results comparable
to pmap, so here are the numbers for the same test case on current OS X:

    154 Stack    8192K      36K       2
      5 Stack    8192K      40K       2
     11 Stack    8192K      44K       2
      6 Stack    8192K      48K       2
     11 Stack    8192K      52K       2
      7 Stack    8192K      56K       2
      8 Stack    8192K      60K       2
      2 Stack    8192K      64K       2
      2 Stack    8192K      68K       2
      4 Stack    8192K      72K       2
      1 Stack    8192K      76K       2
      2 Stack    8192K     108K       2
      1 Stack    8192K     384K       2
      1 Stack    8192K    2056K       2

(The "virtual" stack size seems to always be the same as ulimit -s,
i.e., 8MB by default, on this platform.) This is good confirmation
that the actual stack consumption is pretty stable across different
compilers, though it looks like OS X's version of clang is a bit
more stack-wasteful for the regex recursion.
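
A quick way to put the two platforms side by side is the Python sketch below. The counts are copied from the two listings above. For the OS X lines it assumes the second size column (36K, 40K, ...) is the dirty figure, which the listing itself doesn't label. The helper names are illustrative only.

```python
# Dirty stack KB -> number of processes, copied from the two listings above.
linux = {36: 137, 40: 21, 44: 4, 48: 20, 52: 8, 56: 2, 60: 10,
         64: 3, 68: 3, 72: 2, 76: 1, 112: 2, 296: 1, 2060: 1}
# Assumption: the second size column of the OS X listing is the dirty size.
osx = {36: 154, 40: 5, 44: 11, 48: 6, 52: 11, 56: 7, 60: 8,
       64: 2, 68: 2, 72: 4, 76: 1, 108: 2, 384: 1, 2056: 1}


def count_at_most(hist, limit_kb):
    """Number of processes whose dirty stack is <= limit_kb."""
    return sum(n for kb, n in hist.items() if kb <= limit_kb)


def largest(hist, k=3):
    """The k largest dirty-stack sizes seen, biggest first."""
    return sorted(hist, reverse=True)[:k]


for name, hist in (("Linux", linux), ("OS X", osx)):
    total = sum(hist.values())
    print(name, total, hist[36], count_at_most(hist, 76), largest(hist))

# Linux 215 137 211 [2060, 296, 112]
# OS X 215 154 211 [2056, 384, 108]
```

Both runs cover 215 processes, and on both platforms 211 of them stay at or below 76KB. Of the three largest entries, the only one where OS X is clearly bigger is 384KB versus 296KB, which is presumably the regex case.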

Based on these numbers, I'd have no fear of reducing STACK_DEPTH_SLOP
to 256KB on x86_64. It would sure be good to check things on some
other architectures, though ...

regards, tom lane
