Re: Bad planning data resulting in OOM killing of postgres

From: David Hinkle <hinkle(at)cipafilter(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Bad planning data resulting in OOM killing of postgres
Date: 2017-02-16 23:27:50
Message-ID: CACw4T0ocdjHM4TCxzFQR6cyo6YK4rQADAuR=Ep+odBBUswAwaw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I guess this doesn't work, latest test run crashed. It still uses the
bad plan for the hostid column even after n_distinct is updated.

cipafilter=# select attname, n_distinct from pg_stats where tablename
cipafilter-# = 'log_raw' and (attname = 'urlid' or attname = 'titleid'
or attname =
cipafilter(# 'hostid');
attname | n_distinct
---------+-------------
urlid | 1.51625e+08
hostid | 304710
titleid | 886499
(3 rows)

cipafilter=# explain DELETE FROM hosts WHERE NOT EXISTS ( SELECT
log_raw.hostid FROM log_raw WHERE log_raw.hostid = hosts.hostid );
QUERY PLAN
-----------------------------------------------------------------------------------------
Delete on hosts (cost=22249475.67..74844813.47 rows=1 width=12)
-> Hash Anti Join (cost=22249475.67..74844813.47 rows=1 width=12)
Hash Cond: (hosts.hostid = log_raw.hostid)
-> Seq Scan on hosts (cost=0.00..5017.10 rows=304710 width=10)
-> Hash (cost=12799395.52..12799395.52 rows=543645052 width=10)
-> Seq Scan on log_raw (cost=0.00..12799395.52
rows=543645052 width=10)
(6 rows)

I guess I will also try throwing in 'set enable_hashjoin = false;' and
see if that gets these purges to go.

On Thu, Feb 16, 2017 at 2:22 PM, David Hinkle <hinkle(at)cipafilter(dot)com> wrote:
> Yep, 420ish million records out of 540 million records have a titleid
> of 1. There are about 880,000 other unique values, but most of the
> records are 1. Of course, n_distinct is only 292. I'm surprised
> it's not eliminating the duplicates while it builds that hash table.
>
> This is what I'm doing for a work around right now. Getting
> n_distinct right seems to be preventing the system from breaking.
> It's going to be executed once a week during the weekly maintenance.
> It's setting the n_distinct of each column to the number of rows in
> the associated table.
>
> CREATE OR REPLACE FUNCTION patch_ndistinct(_table varchar, _column
> varchar, _string_table varchar)
> RETURNS real AS
> $$
> DECLARE _cnt REAL;
> BEGIN
> SELECT reltuples INTO _cnt from pg_class where relname = _string_table;
> EXECUTE 'ALTER TABLE ' || _table || ' ALTER COLUMN ' || _column
> || ' SET (n_distinct=' || _cnt || ')';
> RETURN _cnt;
> END
> $$ LANGUAGE plpgsql;
> select patch_ndistinct('log_raw', 'titleid', 'titles');
> select patch_ndistinct('log_raw', 'urlid', 'urls');
> select patch_ndistinct('log_raw', 'hostid', 'hosts');
> ANALYZE log_raw;
>
> On Thu, Feb 16, 2017 at 10:54 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> David Hinkle <hinkle(at)cipafilter(dot)com> writes:
>>> Tom, there are three columns in this table that exhibit the problem,
>>> here is the statistics data after an analyze, and the real data to
>>> compare it to.
>>
>>> attname | n_distinct | most_common_freqs
>>
>>> titleid | 292 | {0.767167}
>>
>> Ouch. That's saying there's some single value of titleid that accounts
>> for more than three-quarters of the entries ... does that square with
>> reality? That'd certainly explain why a hash join goes nuts.
>>
>> regards, tom lane
>
>
>
> --
> David Hinkle
>
> Senior Software Developer
>
> Phone: 800.243.3729x3000
>
> Email: hinkle(at)cipafilter(dot)com
>
> Hours: Mon-Fri 8:00AM-5:00PM (CT)

--
David Hinkle

Senior Software Developer

Phone: 800.243.3729x3000

Email: hinkle(at)cipafilter(dot)com

Hours: Mon-Fri 8:00AM-5:00PM (CT)

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Richard Brosnahan 2017-02-17 00:39:14 PostgreSQL mirroring from RPM install to RPM install-revisited
Previous Message Alvaro Herrera 2017-02-16 22:49:32 Re: Autovacuum stuck for hours, blocking queries