Re: Query memory usage greatly in excess of work_mem * query plan steps

From: "Franklin, Dan" <dan(dot)franklin(at)pearson(dot)com>
To: Timothy Garnett <tgarnett(at)panjiva(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Query memory usage greatly in excess of work_mem * query plan steps
Date: 2014-06-14 02:18:45
Message-ID: CAKWqM3XOhav1KrBif5G9bW-UbLFpYp7DstCA=GJ+dt25H_cFJg@mail.gmail.com
Lists: pgsql-performance

We had a problem in the 8.X series with COPY IN - it did not respect any
configured maximums and just kept allocating memory until the entire COPY
contents, everything up to the terminating \., fit in RAM. Could there be
a similar issue with COPY OUT?
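
If it's useful to check that in isolation, something like the sketch below
(generate_series standing in for the real query) should show whether the
server side buffers the whole COPY OUT result: run it via psql with the
output sent to /dev/null and watch the backend's resident memory while it
runs.

    -- Hypothetical isolation test, e.g.:
    --   psql -c "COPY (...) TO STDOUT WITH CSV" dbname > /dev/null
    -- while watching the backend's RSS (pid from pg_stat_activity) in top.
    COPY (
        SELECT g AS id, md5(g::text) AS payload
        FROM generate_series(1, 50000000) AS g
    ) TO STDOUT WITH CSV;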

-----
Dan

On Wed, Jun 11, 2014 at 6:02 PM, Timothy Garnett <tgarnett(at)panjiva(dot)com>
wrote:

> Hi,
>
> I have a query that's pulling data for another system using COPY (query)
> to STDOUT CSV on a 9.2.4 db (we're in the process of upgrading to 9.3).
> The final csv file is large (~75GB, 86 million rows). The query is also
> large, consisting of one table (86 million rows) left joined to a total of
> 30 other tables (of mixed size), 3 of which are CTEs supplied by a WITH
> clause and consist of 3 joins each, for a total of 39 joins in the plan.
> work_mem on the system is set to 256MB.
>
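> Schematically (with stand-in names, and all but one of the joins and one
> of the CTEs elided), the statement has this shape:
>
>     COPY (
>         WITH cte_1 AS (
>             -- stand-in; each real CTE is itself 3 joins
>             SELECT g AS id, count(*) AS cnt
>             FROM generate_series(1, 1000) AS g
>             GROUP BY g
>         )
>         -- stand-in; the real query left joins ~30 tables plus the 3 CTEs
>         SELECT t.id, c.cnt
>         FROM generate_series(1, 1000) AS t(id)
>         LEFT JOIN cte_1 AS c ON c.id = t.id
>     ) TO STDOUT WITH CSV;
>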
> We're running into problems with the machine running out of memory with
> this single query process consuming over 100GB resident memory before the
> machine exhausts swap and the Linux OOM handling eventually kills it. The
> query plan from explain comes to 186 lines, which, assuming each line
> requires a full work_mem (which should be a significant overestimate of
> both the number of operations and their size), is < 50GB, and we're
> observing substantially more than that. Is it reasonable to expect that a
> query will take at most roughly work_mem * # of operations, or are there
> other factors in play?
>
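> (For concreteness: 186 plan lines * 256MB work_mem = 47,616MB, or roughly
> 46.5GB, which is where the < 50GB figure comes from.)
>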
> The plan looks reasonable (though there are some odd right join uses, see
> below) and the row estimates look pretty accurate with the exception that
> one of the CTE queries has its row count under-estimated by a little over
> 2 orders of magnitude (260k vs. 86 million rows). That query does a GROUP
> BY (it plans as a sort followed by a group aggregate; there are no hash
> aggregates in the plan, which is something that could otherwise have
> increased memory use), and the grouping step is what misestimates the
> final number of rows for that CTE. Unlike the other CTEs, when it's merge
> joined into the main query there's no materialize line in the plan (no
> idea if that's relevant).
>
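> (If it's useful, EXPLAIN ANALYZE on the CTE body by itself will print each
> node's estimated vs. actual row counts; a runnable stand-in of the same
> flavor, with generate_series in place of the real tables:)
>
>     EXPLAIN ANALYZE
>     SELECT g % 1000 AS grp, count(*) AS cnt
>     FROM generate_series(1, 100000) AS g
>     GROUP BY 1;
>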
> As to the right join (used for a few of the joins, most are left join or
> merge):
> ->  Hash Right Join  (cost=225541299.19..237399743.38 rows=86681834 width=1108)
>       Hash Cond: (xxx.xxx = yyy.yyy)
>       ->  Seq Scan on xxx  (cost=0.00..6188.18 rows=9941 width=20)
>             Filter: (mode = 'live'::text)
>       ->  Hash  (cost=212606744.27..212606744.27 rows=86681834 width=1096)
>             ....
> I'm not sure if I'm reading it right, but it looks like it's hashing the
> 86 million row set and scanning over the 10k row set, which seems to me
> like the opposite of what you'd want to do. But I haven't seen a lot of
> hash right joins in plans, so I'm not sure if that's how it works.
>
> Tim
>
