Re: Bad planning data resulting in OOM killing of postgres

From: David Hinkle <hinkle(at)cipafilter(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Bad planning data resulting in OOM killing of postgres
Date: 2017-02-16 01:55:32
Message-ID: CACw4T0oGLnAvLuxNed48SFBxaK7CV2BNjcBHyxL5aEGzR+B8sw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Thanks for your help!

Karsten: The system does fill up swap before it blows up. This
particular model has 8G of ram and 4G of swap and runs kernel
4.4.0-53-generic #74~14.04.1-Ubuntu.

Tom, there are three columns in this table that exhibit the problem,
here is the statistics data after an analyze, and the real data to
compare it to.

cipafilter=# select attname, n_distinct, most_common_freqs from
pg_stats where tablename = 'log_raw' and (attname = 'urlid' or attname
= 'titleid' or attname = 'hostid');
attname | n_distinct |

most_common_freqs
---------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
urlid | 38963 |
{0.0188,0.009,0.00853333,0.0079,0.00686667,0.0068,0.00616667,0.0061,0.00606667,0.005,0.0049,0.00483333,0.00463333,0.00456667,0.0044,0.0044,0.0039,0.0032,0.00263333,0.00263333,0.00256667,0.00256667,0.00246667,0.0023,0.0023,0.00223333,0.00203333,0.002,0.002,0.002,0.0019,0.00186667,0.00183333,0.0018,0.0018,0.00176667,0.00176667,0.00176667,0.00176667,0.00173333,0.00173333,0.00173333,0.0017,0.0017,0.00166667,0.0016,0.00156667,0.00153333,0.0015,0.00146667,0.00143333,0.0014,0.0014,0.0014,0.0014,0.0014,0.0014,0.00136667,0.00133333,0.0013,0.0013,0.0013,0.0013,0.00126667,0.00126667,0.00123333,0.00123333,0.00123333,0.00123333,0.0012,0.0012,0.00113333,0.0011,0.00106667,0.00106667,0.001,0.001,0.001,0.001,0.000966667,0.000966667,0.000966667,0.000966667,0.000966667,0.000933333,0.000933333,0.000933333,0.0009,0.0009,0.0009,0.0009,0.0009,0.0009,0.0009,0.0009,0.0009,0.000866667,0.000866667,0.000866667,0.000866667}
hostid | 2478 |
{0.2109,0.0330667,0.0321667,0.0245,0.0223,0.0177667,0.0165,0.0152,0.0151333,0.014,0.0132,0.0115667,0.00993333,0.00963333,0.00873333,0.00853333,0.00853333,0.00746667,0.0074,0.00653333,0.0065,0.0065,0.00646667,0.00513333,0.00506667,0.00503333,0.00496667,0.00493333,0.0049,0.00486667,0.00483333,0.00453333,0.0044,0.00433333,0.00426667,0.00413333,0.00396667,0.00386667,0.00383333,0.00363333,0.00363333,0.00363333,0.00346667,0.00323333,0.00323333,0.0032,0.00316667,0.00303333,0.00303333,0.0029,0.00286667,0.00273333,0.00273333,0.00266667,0.00263333,0.00263333,0.0026,0.0026,0.0025,0.00243333,0.00243333,0.0024,0.00236667,0.00226667,0.00223333,0.00216667,0.00213333,0.0021,0.0021,0.00206667,0.00206667,0.00203333,0.002,0.002,0.00193333,0.0019,0.0019,0.00186667,0.00186667,0.00186667,0.0018,0.0018,0.0018,0.00176667,0.00173333,0.0017,0.0017,0.00166667,0.00166667,0.00166667,0.00163333,0.00163333,0.00153333,0.0015,0.0015,0.0015,0.00146667,0.00146667,0.00146667,0.00146667}
titleid | 292 | {0.767167}
(3 rows)

I have to patch the pg_stats table to get postgres to run the
following queries without crashing:

cipafilter=# UPDATE pg_statistic AS s
cipafilter-# SET stadistinct = (select reltuples from pg_class
where relname = 'titles')
cipafilter-# FROM pg_class c, pg_attribute a where c.oid =
s.starelid and c.oid = a.attrelid and a.attnum = s.staattnum and
cipafilter-# relname = 'log_raw' and attname = 'titleid';
UPDATE 1
cipafilter=# UPDATE pg_statistic AS s
cipafilter-# SET stadistinct = (select reltuples from pg_class
where relname = 'urls')
cipafilter-# FROM pg_class c, pg_attribute a where c.oid =
s.starelid and c.oid = a.attrelid and a.attnum = s.staattnum and
cipafilter-# relname = 'log_raw' and attname = 'urlid';
UPDATE 1
cipafilter=# UPDATE pg_statistic AS s
cipafilter-# SET stadistinct = (select reltuples from pg_class
where relname = 'hosts')
cipafilter-# FROM pg_class c, pg_attribute a where c.oid =
s.starelid and c.oid = a.attrelid and a.attnum = s.staattnum and
cipafilter-# relname = 'log_raw' and attname = 'hostid';
UPDATE 1

cipafilter=# select attname, n_distinct from pg_stats where tablename
= 'log_raw' and (attname = 'urlid' or attname = 'titleid' or attname =
'hostid');
attname | n_distinct
---------+-------------
urlid | 1.51452e+08
hostid | 303756
titleid | 879485

cipafilter=# select titleid, count(titleid) from log_raw group by
titleid order by count(titleid) desc limit 10;
titleid | count
---------+-----------
1 | 423848049
49547 | 403432
238 | 188151
12 | 151640
5 | 149524
6196 | 139445
32014 | 123828
200 | 88682
58921 | 86451
10 | 84264
(10 rows)

cipafilter=# select urlid, count(urlid) from log_raw group by urlid
order by count(urlid) desc limit 10;
urlid | count
--------+----------
129991 | 10843088
1 | 4953757
21 | 4345503
2765 | 4266981
12 | 3654127
920 | 3609054
1135 | 3562185
20 | 3495023
283567 | 3019675
2494 | 2655301
(10 rows)

cipafilter=# select hostid, count(hostid) from log_raw group by hostid
order by count(hostid) desc limit 10;
hostid | count
--------+-----------
7 | 117342686
5 | 18016481
53 | 17408992
57 | 12947564
543 | 12698269
1 | 10068246
127544 | 8746204
27 | 8618595
40 | 8507278
36 | 7424412
(10 rows)

Fun fact, hostid 7 is 'google.com'.

On Wed, Feb 15, 2017 at 6:11 AM, Karsten Hilbert
<Karsten(dot)Hilbert(at)gmx(dot)net> wrote:
> On Wed, Feb 15, 2017 at 01:04:51PM +0100, Karsten Hilbert wrote:
>
>> > Nope, that pops too. The query runs for a long time at a somewhat
>> > normal rate of ram consumption, using ~1G of RSS then suddenly spikes
>> > to about 6G, at which point the OOM killer pops it. Box has 8G of ram
>> > and 4G of swap.
>>
>> By any chance:
>>
>> - when it happens has the kernel considered using swap ?
>>
>> - which kernel are you running ?
>>
>> There's been (for some workloads) massive problems with RAM
>> exhaustion / swapping / OOM killer going wild with
>> 4.7/4.8/some 4.9 kernels.
>
> I guess what I'm trying to say is that it may actually not be
> PostgreSQL's fault but rather the kernel invoking the OOM
> killer way prematurely.
>
> Karsten
> --
> GPG key ID E4071346 @ eu.pool.sks-keyservers.net
> E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

--
David Hinkle

Senior Software Developer

Phone: 800.243.3729x3000

Email: hinkle(at)cipafilter(dot)com

Hours: Mon-Fri 8:00AM-5:00PM (CT)

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message David Hinkle 2017-02-16 01:57:12 Re: Bad planning data resulting in OOM killing of postgres
Previous Message Tom Lane 2017-02-16 01:08:37 Re: How to evaluate "explain analyze" correctly after "explain" for the same statement ?