Re: Bad planning data resulting in OOM killing of postgres

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: David Hinkle <hinkle(at)cipafilter(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org >> PG-General Mailing List" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Bad planning data resulting in OOM killing of postgres
Date: 2017-02-13 17:54:10
Message-ID: CAFj8pRCQ+kiqyUU9Zh8UnAoSt9n0dLY0OgrszyBk9sj3ByiKVw@mail.gmail.com
Lists: pgsql-general

Hi

2017-02-13 18:40 GMT+01:00 David Hinkle <hinkle(at)cipafilter(dot)com>:

> I'm having trouble with purges related to a large table. The delete
> query consumes ram until postgres crashes due to OOM. I have a very
> large table called log_raw. There are half a dozen related tables,
> such as 'urls' and 'titles'. log_raw.urlid = urls.urlid and urls
> contains the text of the various urls, for example.
>
> Each time I try to purge these side tables, the machine OOMs.
>
> psql:postgres(at)cipafilter = explain DELETE FROM titles WHERE NOT EXISTS
> ( SELECT 1 FROM log_raw WHERE log_raw.titleid = titles.titleid );
>                                QUERY PLAN
> ─────────────────────────────────────────────────────────────────────────
>  Delete on titles  (cost=22166473.44..24850954.67 rows=870382 width=12)
>    ->  Hash Anti Join  (cost=22166473.44..24850954.67 rows=870382 width=12)
>          Hash Cond: (titles.titleid = log_raw.titleid)
>          ->  Seq Scan on titles  (cost=0.00..17871.64 rows=870664 width=10)
>          ->  Hash  (cost=12744792.64..12744792.64 rows=542011264 width=10)
>                ->  Seq Scan on log_raw  (cost=0.00..12744792.64 rows=542011264 width=10)
> (6 rows)
>
>
What is your work_mem setting?
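(For reference, work_mem caps the memory the planner expects each hash or sort node to use, so a large value can let the hash build on 542M rows try to fit in RAM. A sketch of checking and lowering it per session, with an illustrative value:)

```sql
-- Show the current per-operation memory budget for this session.
SHOW work_mem;

-- Lower it for this session only before re-running the DELETE,
-- so the hash build is sized (and spilled) more conservatively.
-- '64MB' here is only an example value, not a recommendation.
SET work_mem = '64MB';
```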

Regards

Pavel

> psql:postgres(at)cipafilter = select count(*) from (select titleid from
> log_raw group by titleid) as a;
> count
> ────────
> 872210
> (1 row)
>
> cipafilter=# select n_distinct from pg_stats where tablename =
> 'log_raw' and attname = 'titleid';
> n_distinct
> ------------
> 282
> (1 row)
>
> The planning data is wildly low for each of these fields, and I wonder
> if, because of that error, the planner thinks it can keep all these IDs
> in RAM while it works. ANALYZE doesn't fix it. Increasing the
> statistics target improves the n_distinct estimate, but not by much:
> raising it 3- or 4-fold still leaves it wildly off. ALTER TABLE SET
> n_distinct doesn't seem to be used by the planner, as it doesn't change
> any of the plans I've generated or appear to be taken into account in
> the row estimates. I'm out of ideas. Anybody have any ideas?
>
>
> --
> David Hinkle
>
> Senior Software Developer
>
> Phone: 800.243.3729x3000
>
> Email: hinkle(at)cipafilter(dot)com
>
> Hours: Mon-Fri 8:00AM-5:00PM (CT)
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>
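One note on the ALTER TABLE n_distinct attempt described above: the per-column override is stored as an attribute option and is only copied into the statistics the planner reads when the table is next analyzed, so it must be followed by an ANALYZE to have any effect on plans. A sketch, using the column from this thread and the distinct count the GROUP BY query reported:

```sql
-- Override the planner's distinct-value estimate for this column.
-- (A negative value would instead mean a fraction of the row count.)
ALTER TABLE log_raw ALTER COLUMN titleid SET (n_distinct = 872210);

-- The override takes effect only after the next ANALYZE of the table.
ANALYZE log_raw;
```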
