Re: swap storm created by 8.2.3

From: Richard Huxton <dev(at)archonet(dot)com>
To: Joseph Shraibman <jks(at)selectacast(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: swap storm created by 8.2.3
Date: 2007-05-25 19:04:07
Message-ID: 46573327.3070803@archonet.com
Lists: pgsql-general

Joseph Shraibman wrote:
>
>
> Richard Huxton wrote:
>> Joseph Shraibman wrote:
>>>>> I ran a query that was "SELECT field, count(*) INTO TEMP temptable"
>>>>> and it grew to be 10gig (as reported by top)
>>>>
>>>> What was the real query?
>>>
>>> First I selected 90634 rows (3 ints) into the first temp table, then
>>> I did "select intfield1, count(intfield2) FROM realtable rt WHERE
>>> rt.id = temptable.id and other conditions on rt here GROUP BY
>>> intfield1". The size of the second temp table should have been no
>>> more than 60000 rows.
>>
> <SNIP>
>>
>> If your memory settings in postgresql.conf are reasonable (and they
>> look fine), this shouldn't happen. Let's see if an EXPLAIN sheds any
>> light.
>>
> => explain SELECT ml.uid, count(ml.jid) AS cnt INTO TEMP tempml FROM ml
> WHERE ml.jid = tempjr1180108653561.id AND ml.status IN(2,5,20) GROUP BY
> ml.uid;

> NOTICE: adding missing FROM-clause entry for table "tempjr1180108653561"

I'm guessing this is just a typo from your test and that you'd normally
list the temp table in the FROM clause.
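
For reference, I'd have expected the explicit form to look something
like this (my guess at the join you intend):

  SELECT ml.uid, count(ml.jid) AS cnt
  INTO TEMP tempml
  FROM ml, tempjr1180108653561
  WHERE ml.jid = tempjr1180108653561.id
    AND ml.status IN (2, 5, 20)
  GROUP BY ml.uid;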

> LINE 2: ...INTO TEMP tempml FROM ml WHERE ml.jid = tempjr1180...
> ^
> QUERY PLAN
> ------------------------------------------------------------------------------------------
>
> HashAggregate (cost=11960837.72..11967601.06 rows=541067 width=8)
> -> Hash Join (cost=9675074.94..11849780.55 rows=22211434 width=8)

Here you seem to have 22 million rows estimated for your join.

> Hash Cond: (tempjr1180108653561.id = ml.jid)
> -> Seq Scan on tempjr1180108653561 (cost=0.00..31.40
> rows=2140 width=4)

Is the 2140 rows here a good estimate? You said earlier you selected
90634 rows into the temp table, so this looks like a default guess for
a table the planner has no statistics on.
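
If that temp table has never been analysed, it's worth doing so before
the big query (a sketch, using the table name from your EXPLAIN):

  ANALYSE tempjr1180108653561;
  -- then re-run the EXPLAIN and see if the row estimate changes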

> -> Hash (cost=6511767.18..6511767.18 rows=181979021 width=8)
> -> Seq Scan on ml (cost=0.00..6511767.18 rows=181979021
> width=8)

OK, so the estimate of 22 million matches is because "ml" has 181
million rows. Is that right too?
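
If you're not sure, a quick sanity check (these will take a while on a
table that size):

  SELECT count(*) FROM ml;
  SELECT count(*) FROM ml WHERE status IN (2, 5, 20);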

> Filter: (status = ANY ('{2,5,20}'::integer[]))

Overall it's estimating about nine times the number of rows you were
expecting (541,067 vs 60,000). That alone isn't enough to account for
your extreme memory usage.

Let's see if that hash-join is really the culprit. Can you run EXPLAIN
and then EXPLAIN ANALYSE on the query, but first issue:

SET enable_hashjoin=off;

If that makes little difference, try the same with enable_hashagg.
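
In full, something like this (a sketch using the query above; note
that EXPLAIN ANALYSE really executes the statement, so it will create
tempml):

  SET enable_hashjoin = off;
  EXPLAIN SELECT ml.uid, count(ml.jid) AS cnt INTO TEMP tempml
    FROM ml, tempjr1180108653561
    WHERE ml.jid = tempjr1180108653561.id AND ml.status IN (2, 5, 20)
    GROUP BY ml.uid;
  -- then the same statement again with EXPLAIN ANALYSE
  SET enable_hashjoin = on;  -- restore the default afterwards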

--
Richard Huxton
Archonet Ltd
