Re: Allow sorts to use more available memory

From: Robert Schnabel <schnabelr(at)missouri(dot)edu>
To: Andy Colson <andy(at)squeakycode(dot)net>
Cc: "sthomas(at)peak6(dot)com" <sthomas(at)peak6(dot)com>, pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Allow sorts to use more available memory
Date: 2011-09-12 22:21:58
Message-ID: 4E6E8606.9070201@missouri.edu
Lists: pgsql-performance


On 9/12/2011 1:57 PM, Andy Colson wrote:
> On 9/12/2011 1:22 PM, Robert Schnabel wrote:
>> On 9/12/2011 12:57 PM, Shaun Thomas wrote:
>>> On 09/12/2011 12:47 PM, Andy Colson wrote:
>>>
>>>> work_mem is not the total a query can use. I believe each step can
>>>> use that much, and each backend can use it for multiple bits. So if
>>>> you had two backends, each doing 2 sorts, you'd use 2*2 = 4 * 2GB =
>>>> 8GB.
>>> Exactly. Find a big query somewhere in your system. Use EXPLAIN to
>>> examine it. Chances are, that one query has one or more sorts. Each one
>>> of those gets its own work_mem. Each sort. Does the query have four sorts? It
>>> may use 4*work_mem. On a whim a while back, I doubled our 8MB setting to
>>> 16MB on a test system. During a load test, the machine ran out of
>>> memory, swapped out, and finally crashed after the OOM killer went nuts.
>>>
>>> Set this value *at your own risk* and only after *significant* testing.
>>> Having it too high can have rather unexpected consequences. Setting it
>>> to 1 or 2GB, unless you have VERY few threads, or a TON of memory, is a
>>> very, very bad idea.
>>>
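(A minimal sketch of keeping a large work_mem confined to a single session
rather than setting it server-wide in postgresql.conf -- the value below is
only an example and the effective ceiling depends on the platform/build:)

    -- raise work_mem for this session only; each sort/hash node in this
    -- session's queries may then use up to this much, while other backends
    -- keep the postgresql.conf default
    SET work_mem = '2GB';
    -- ... run the big sort-heavy query here ...
    RESET work_mem;
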
>> Yep, I know. But in the context of the data warehouse where *I'm the
>> only user* and I have a query that does, say 4 large sorts like
>> http://explain.depesz.com/s/BrAO and I have 32GB RAM I'm not worried
>> about using 8GB or 16GB in the case of work_mem = 4GB. I realize the
>> query above only used 1.9GB for the largest sort but I know I have other
>> queries with 1 or 2 sorts that I've watched go to disk.
>>
>> Bob
>>
>>
>>
>>
> Huge guess here, can't see the select or ddl, but it looks like all the tables
> are sequential scans. It might help to add an index or two, then the
> table joins could be done much more efficiently with a lot less
> memory.
>
> -Andy
In this case I doubt it. Basically what these queries are doing is
taking table1 (~30M rows) and finding all the rows with a certain
condition. This produces ~15M rows. Then I have to find all of those
15M rows that are present in table2. In the case of the query above
this results in 1.79M rows. Basically, the 15M rows that meet the
condition for table1 have matching rows spread out over 10 different
tables (table2's).
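
(Roughly the shape of what I mean -- the table and column names here are
made up:)

    -- keep the ~15M table1 rows that satisfy the condition, then only
    -- those that also have a matching row in table2
    SELECT t1.*
    FROM   table1 t1
    WHERE  t1.some_condition
      AND  EXISTS (SELECT 1 FROM table2 t2 WHERE t2.id = t1.id);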

Actually, you just gave me an idea. When I generate "table1" I can
probably add a field that tells me which "table2" it came from for each
row that satisfies my criteria. Sometimes just having someone else make
you think is very productive. :-)
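
(Something along these lines, again with made-up names:)

    -- when building table1, record which table2 each qualifying row matched
    ALTER TABLE table1 ADD COLUMN source_table text;

    -- later queries can then restrict to a single table2 up front
    SELECT *
    FROM   table1
    WHERE  some_condition
      AND  source_table = 'table2_a';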

Thanks
Bob
