Re: OOM-killer issue with a specific query 9 of 20)

From: nabble(dot)30(dot)miller_2555(at)spamgourmet(dot)com
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: OOM-killer issue with a specific query 9 of 20)
Date: 2011-12-20 16:46:02
Message-ID: CABLpH8zUZroESqpe+fiJ4pvtEgm5sHehUHRJkDGbRRp9Fk9hTw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Tue, Dec 20, 2011 at 8:24 AM, Scott Marlowe -
scott(dot)marlowe(at)gmail(dot)com
<+nabble+miller_2555+3b65e832a3(dot)scott(dot)marlowe#gmail(dot)com(at)spamgourmet(dot)com>
wrote:
>
> On Mon, Dec 19, 2011 at 8:52 AM,  <nabble(dot)30(dot)miller_2555(at)spamgourmet(dot)com> wrote:
> > I can probably fix by making the following sysctl adjustments:
> > vm.overcommit_memory = 2
> > vm.overcommit_ratio = 0

FYI - for the sake of others visiting this post, disabling the OS
memory overcommit does not appear an easy solution in my case as the
box fails to bootstrap due to insufficient memory.

> > Under steady-state conditions, the following shows the virtual memory size
> > for postgres backend processes:
> >      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> >     8506 postgres  20   0 2327m 3084 1792 S  0.0  0.0   0:00.33 postgres
> >     8504 postgres  20   0 2326m  14m  13m S  0.0  0.2   0:01.32 postgres
> >     8505 postgres  20   0 2326m  728  452 S  0.0  0.0   0:00.91 postgres
> >     3582 postgres  20   0 2325m  54m  53m S  0.0  0.7   0:02.03 postgres
>
> FYI, this is not swap usage.  VIRT is all the memory a process has
> handles open to everywhere, including libs that it's linked against
> that might not even be loaded.  Generally speaking, VIRT is close to
> worthless for troubleshooting.
>
Thanks - I misunderstood the meaning of VIRT (its been ahwile since
I've consulted the man page).

> > My current relevant postgresql.conf settings are the following:
> >     shared_buffers = 2100MB
> >     temp_buffers = 8MB
> >     work_mem = 32MB
> >     maintenance_work_mem = 16MB
> >     max_stack_depth = 2MB
> >     constraint_exclusion = partition
>
> What's max_connections?
>
max_connections=20. As a sidenote, this is a development box and there
are no other active connections to the database while this test case
was run.

> > When executing the query, I've been watching the "top" activity, sorted by
> > resident memory. Upon execution, no other processes appear to take
> > additional resident memory, except a postgres backend process servicing the
> > query, which goes to +6Gb (triggering the OOM-killer). Given the settings in
> > postgresql.conf, and my anecdotal understanding of Postgres memory
> > management functions, I am uncertain why Postgres exhausts physical memory
> > instead of swapping to temporary files.
>
> > EXPLAIN ANALYZE output:
> >     Note: could not produce output for exact query due to OOM-killer, but
> > ran query by limiting the subquery to the first 50 results. The planner
> > iterates over all partitions, but only the first two partitions are noted
> > for brevity.
>
> This may be one instance where the regular explain will be more
> useful.  it's quite likely that the query changes when there is no
> limit.  If you compare what explain for the full query says, and what
> explain (analyze) for the abridged one says, the part that's causing
> you to run out of memory may be more obvious.
>
I've run EXPLAIN on the query, but AFAICS the query plan does not
appear significantly different than the abridged version for this
particular query (output attached below). In an effort to analyze the
base case, I re-ran the query (without LIMIT) for a selected partition
of tableA and tableB (both tables are partitioned by "Date" and the
"Date" column on each partition of tableB references the "Date" column
of the corresponding partition of tableA as a foreign key constraint).
The tableA partition holds 82,939 records (record width is 108 bytes,
per EXPLAIN) and the tableB partition holds 13,718 records (record
width is 312 bytes, per EXPLAIN) For a single table partition, `top`
shows the following resource usage of running postmaster processes:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1818 postgres 20 0 2325m 56m 55m S 0.0 0.7 0:32.10 postmaster
2810 postgres 20 0 156m 1044 420 S 0.0 0.0 0:01.50 postmaster
2813 postgres 20 0 2326m 256m 255m S 0.0 3.2 0:09.61 postmaster
2814 postgres 20 0 2326m 2220 1592 S 0.0 0.0 0:04.30 postmaster
2815 postgres 20 0 2327m 3996 2148 S 0.0 0.0 0:00.66 postmaster
2816 postgres 20 0 156m 1272 504 S 0.0 0.0 0:09.14 postmaster
29661 postgres 20 0 2335m 49m 40m S 0.0 0.6 0:00.24 postmaster

While I could run the query partition-by-partition, I'd still like to
be able to run a full query across all partitions.

EXPLAIN output excerpt:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Sort (cost=3125664.13..3130782.83 rows=2047480 width=55)
Sort Key: A."Date"
-> Result (cost=11553.15..2856046.51 rows=2047480 width=55)
-> Append (cost=11553.15..2856046.51 rows=2047480 width=55)
-> Seq Scan on tableA A (cost=11553.15..11571.02
rows=1 width=44)
Filter: ("Boolean" AND (NOT (hashed SubPlan 1))
AND (("text" = 'abc'::text) OR ("text" = 'xyz'::text)))
SubPlan 1
-> Result (cost=0.00..10357.52 rows=478252 width=8)
-> Append (cost=0.00..10357.52
rows=478252 width=8)
-> Seq Scan on tableB
(cost=0.00..15.30 rows=530 width=8)
-> Seq Scan on tableB_201201
tableB (cost=0.00..15.30 rows=530 width=8)
-> Seq Scan on tableB_201112
tableB (cost=0.00..251.25 rows=12125 width=8)
-> Seq Scan on tableB_201111
tableB (cost=0.00..604.89 rows=29189 width=8)
-> Seq Scan on tableB_201110
tableB (cost=0.00..490.30 rows=23630 width=8)
-> ...
-> Seq Scan on tableA_201201 A
(cost=11553.15..11571.02 rows=1 width=44)
Filter: ("Boolean" AND (NOT (hashed SubPlan 1))
AND (("text" = 'abc'::text) OR ("text" = 'xyz'::text)))
SubPlan 1
-> Result (cost=0.00..10357.52 rows=478252 width=8)
-> Append (cost=0.00..10357.52
rows=478252 width=8)
-> Seq Scan on tableB
(cost=0.00..15.30 rows=530 width=8)
-> Seq Scan on tableB_201201
tableB (cost=0.00..15.30 rows=530 width=8)
-> Seq Scan on tableB_201112
tableB (cost=0.00..251.25 rows=12125 width=8)
-> Seq Scan on tableB_201111
tableB (cost=0.00..604.89 rows=29189 width=8)
-> Seq Scan on tableB_201110
tableB (cost=0.00..490.30 rows=23630 width=8)
-> ...
-> Seq Scan on tableA_201112 A
(cost=11553.15..12346.63 rows=5980 width=55)
Filter: ("Boolean" AND (NOT (hashed SubPlan 1))
AND (("text" = 'abc'::text) OR ("text" = 'xyz'::text)))
SubPlan 1
-> Result (cost=0.00..10357.52 rows=478252 width=8)
-> Append (cost=0.00..10357.52
rows=478252 width=8)
-> Seq Scan on tableB
(cost=0.00..15.30 rows=530 width=8)
-> Seq Scan on tableB_201201
tableB (cost=0.00..15.30 rows=530 width=8)
-> Seq Scan on tableB_201112
tableB (cost=0.00..251.25 rows=12125 width=8)
-> Seq Scan on tableB_201111
tableB (cost=0.00..604.89 rows=29189 width=8)
-> Seq Scan on tableB_201110
tableB (cost=0.00..490.30 rows=23630 width=8)
-> ...
-> ...
(23112 rows)

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message nabble.30.miller_2555 2011-12-20 18:33:10 Re: OOM-killer issue with a specific query 11 of 20)
Previous Message Scott Marlowe 2011-12-20 13:24:12 Re: OOM-killer issue with a specific query