From: | Joe Conway <mail(at)joeconway(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | josh(at)agliodbs(dot)com, "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, lutzeb(at)aeccom(dot)com, pgsql-performance(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com> |
Subject: | Re: Wierd context-switching issue on Xeon |
Date: | 2004-04-20 03:00:05 |
Message-ID: | 40849235.2070808@joeconway.com |
Lists: | pgsql-performance |
Tom Lane wrote:
> Here is a test case. To set up, run the "test_setup.sql" script once;
> then launch two copies of the "test_run.sql" script. (For those of
> you with more than two CPUs, see whether you need one per CPU to make
> trouble, or whether two test_runs are enough.) Check that you get a
> nestloops-with-index-scans plan shown by the EXPLAIN in test_run.
Check.
> In isolation, test_run.sql should do essentially no syscalls at all once
> it's past the initial ramp-up. On a machine that's functioning per
> expectations, multiple copies of test_run show a relatively low rate of
> semop() calls --- a few per second, at most --- and maybe a delaying
> select() here and there.
>
> What I actually see on Josh's client's machine is a context swap storm:
> "vmstat 1" shows CS rates around 170K/sec. strace'ing the backends
> shows a corresponding rate of semop() syscalls, with a few delaying
> select()s sprinkled in. top(1) shows system CPU percent of 25-30
> and idle CPU percent of 16-20.
Your test case works perfectly. I ran 4 concurrent psql sessions on a
hyperthreaded quad Xeon (IBM x445, 2.8GHz, 4GB RAM). Here's what 'top'
looks like:
177 processes: 173 sleeping, 3 running, 1 zombie, 0 stopped
CPU states: cpu user nice system irq softirq iowait idle
total 35.9% 0.0% 7.2% 0.0% 0.0% 0.0% 56.8%
cpu00 19.6% 0.0% 4.9% 0.0% 0.0% 0.0% 75.4%
cpu01 44.1% 0.0% 7.8% 0.0% 0.0% 0.0% 48.0%
cpu02 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 100.0%
cpu03 32.3% 0.0% 13.7% 0.0% 0.0% 0.0% 53.9%
cpu04 21.5% 0.0% 10.7% 0.0% 0.0% 0.0% 67.6%
cpu05 42.1% 0.0% 9.8% 0.0% 0.0% 0.0% 48.0%
cpu06 100.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0%
cpu07 27.4% 0.0% 10.7% 0.0% 0.0% 0.0% 61.7%
Mem: 4123700k av, 3933896k used, 189804k free, 0k shrd, 221948k buff
2492124k actv, 760612k in_d, 41416k in_c
Swap: 2040244k av, 5632k used, 2034612k free 3113272k cached
Note that the load on cpu06 is not from a postgres process. The output
of vmstat looks like this:
# vmstat 1
procs memory swap io system cpu
r b swpd free buff cache si so bi bo in cs us sy id wa
4 0 5632 184264 221948 3113308 0 0 0 0 0 0 0 0 0 0
3 0 5632 184264 221948 3113308 0 0 0 0 112 211894 36 9 55 0
5 0 5632 184264 221948 3113308 0 0 0 0 125 222071 39 8 53 0
4 0 5632 184264 221948 3113308 0 0 0 0 110 215097 39 10 52 0
1 0 5632 184588 221948 3113308 0 0 0 96 139 187561 35 10 55 0
3 0 5632 184588 221948 3113308 0 0 0 0 114 241731 38 10 52 0
3 0 5632 184920 221948 3113308 0 0 0 0 132 257168 40 9 51 0
1 0 5632 184912 221948 3113308 0 0 0 0 114 251802 38 9 54 0
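For anyone who wants to compare numbers across machines, here is a minimal Python sketch that pulls the cs column out of `vmstat 1` output and averages it. The function names (`parse_vmstat_cs`, `mean_cs`) are made up for illustration and are not part of the test case. It assumes the column header line contains both `in` and `cs`, as in the output above, and it drops the first sample by default on the assumption that vmstat's first report is not a true one-second interval.

```python
def parse_vmstat_cs(text):
    """Return the cs (context switches per second) values from vmstat output."""
    cs_index = None
    values = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The column header row names the columns; remember where "cs" is.
        if "cs" in fields and "in" in fields:
            cs_index = fields.index("cs")
            continue
        # Skip the prompt line, the group header row, and anything non-numeric.
        if cs_index is None or not fields[0].isdigit():
            continue
        if len(fields) > cs_index:
            values.append(int(fields[cs_index]))
    return values


def mean_cs(values, skip_first=True):
    """Average the samples, optionally dropping the first report."""
    samples = values[1:] if skip_first else values
    return sum(samples) / len(samples) if samples else 0.0


# A few rows copied from the vmstat output above.
SAMPLE = """\
r b swpd free buff cache si so bi bo in cs us sy id wa
4 0 5632 184264 221948 3113308 0 0 0 0 0 0 0 0 0 0
3 0 5632 184264 221948 3113308 0 0 0 0 112 211894 36 9 55 0
5 0 5632 184264 221948 3113308 0 0 0 0 125 222071 39 8 53 0
"""

if __name__ == "__main__":
    print(mean_cs(parse_vmstat_cs(SAMPLE)))
```

Saving the vmstat output to a file and passing its contents to `parse_vmstat_cs` should work the same way, as long as the header row is included.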
> Note the test case assumes you've got shared_buffers set to at least
> 1000; with smaller values, you may get some I/O syscalls, which will
> probably skew the results.
shared_buffers
----------------
16384
(1 row)
I found that killing three of the four concurrent queries dropped
context switches to about 70,000 to 100,000 per second. Two or more
sessions bring it up to 200K+.
Joe