Query memory usage greatly in excess of work_mem * query plan steps

From: Timothy Garnett <tgarnett(at)panjiva(dot)com>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Query memory usage greatly in excess of work_mem * query plan steps
Date: 2014-06-11 22:02:55
Message-ID: CAPcyiQ1ZMjMCqCBrbbNRpNW9nbCsneAPJsyWwr+bxi25cGA0tg@mail.gmail.com
Lists: pgsql-performance

Hi,

I have a query that's pulling data for another system using COPY (query) TO
STDOUT CSV on a 9.2.4 db (we're in the process of upgrading to 9.3). The
final CSV file is large (~75GB, 86 million rows). The query is also large:
one table (86 million rows) left joined to a total of 30 other tables (of
mixed size), 3 of which are CTEs supplied by a WITH clause and consist of 3
joins each, for a total of 39 joins in the plan. work_mem on the system is
set to 256MB.
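For concreteness, here's a minimal sketch of the query's shape (all table
and column names below are made up; the real query has 30 joined tables and
3 such CTEs):

    COPY (
        WITH agg1 AS (
            SELECT d.main_id, count(*) AS n    -- each CTE is ~3 joins
            FROM detail d
            JOIN detail_attr da ON da.detail_id = d.id
            JOIN attr_type t ON t.id = da.type_id
            GROUP BY d.main_id
        )
        SELECT m.id, l1.val, agg1.n    -- plus columns from ~30 more joins
        FROM main m                    -- ~86 million rows
        LEFT JOIN lookup1 l1 ON l1.id = m.lookup1_id
        LEFT JOIN agg1 ON agg1.main_id = m.id
    ) TO STDOUT CSV;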

We're running into problems with the machine running out of memory: this
single query process consumes over 100GB of resident memory before the
machine exhausts swap and the Linux OOM killer eventually kills it. The
query plan from EXPLAIN comes to 186 rows; even assuming that each of those
rows requires a full work_mem allocation (which should be a significant
overestimate of the number and size of the operations), that's under 50GB,
and we're observing substantially more than that. Is it reasonable to expect
that a query will use roughly at most work_mem * (number of operations), or
are there other factors in play?
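The arithmetic behind that bound, for reference (a deliberately pessimistic
ceiling, since most EXPLAIN rows aren't memory-consuming nodes):

    -- 186 plan rows * 256MB work_mem each, expressed in GB
    SELECT 186 * 256 / 1024.0 AS worst_case_gb;  -- => 46.5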

The plan looks reasonable (though there are some odd hash right join uses,
see below) and the row estimates look pretty accurate, with the exception
that one of the CTE queries is underestimated on row count by a little over
2 orders of magnitude (260k estimated vs. 86 million actual rows). That CTE
does a GROUP BY (planned as a sort followed by a group aggregate; there are
no hash aggregates anywhere in the plan, which is something that could
otherwise increase memory), and it's the grouping step that misestimates the
CTE's final row count. Unlike the other CTEs, when it's merge joined into
the main query there's no Materialize line in the plan (no idea if that's
relevant).
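(To double-check the misestimate I can run the CTE body on its own; the
names here are the same placeholders as in the sketch above:

    EXPLAIN ANALYZE
    SELECT main_id, count(*) AS n
    FROM detail
    GROUP BY main_id;

The GroupAggregate line shows the 260k estimated vs. 86 million actual rows.)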

As to the right joins (used for a few of the joins; most are left or merge
joins):
    ->  Hash Right Join  (cost=225541299.19..237399743.38 rows=86681834 width=1108)
          Hash Cond: (xxx.xxx = yyy.yyy)
          ->  Seq Scan on xxx  (cost=0.00..6188.18 rows=9941 width=20)
                Filter: (mode = 'live'::text)
          ->  Hash  (cost=212606744.27..212606744.27 rows=86681834 width=1096)
          ....
I'm not sure if I'm reading it right, but it looks like it's hashing the 86
million row set and probing it with the 10k row set, which seems to me like
the opposite of what you'd want to do. I haven't seen a lot of hash right
joins in plans, though, so I'm not sure if that's how they work.
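For reference, here's a toy version (made-up names) of the choice I mean;
semantically the join is the same either way, the question is which side
gets hashed:

    -- Hash Left Join:  scan big_table, hash small_table  (what I'd expect)
    -- Hash Right Join: scan small_table, hash big_table  (what the plan
    --   above does; "right" because the row-preserving side is the hashed one)
    EXPLAIN
    SELECT b.*, s.name
    FROM big_table b            -- ~86 million rows, all preserved
    LEFT JOIN small_table s     -- ~10k rows
           ON s.id = b.small_id
          AND s.mode = 'live';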

Tim
