Re: Out of swap space & memory

From: "Kevin Bartz" <kbartz(at)loyaltymatrix(dot)com>
To: "'Lincoln Yeoh'" <lyeoh(at)pop(dot)jaring(dot)my>, "'Manfred Koizar'" <mkoi-pg(at)aon(dot)at>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: Out of swap space & memory
Date: 2004-08-09 18:33:04
Message-ID: 20040809183859.774213FCA5@omta16.mta.everyone.net
Lists: pgsql-general

Hi Lincoln! Thanks for your reply. On the problematic SELECT INTO, EXPLAIN
says:

test=#
                            QUERY PLAN
------------------------------------------------------------------
 HashAggregate  (cost=42.50..42.50 rows=1000 width=356)
   ->  Seq Scan on hp_raw  (cost=0.00..20.00 rows=1000 width=356)
(2 rows)

Correct me if I'm wrong, but I take this to mean that Postgres is indeed using
hash aggregation. I just tried set enable_hashagg = off, per your suggestion,
and this time the SELECT INTO is grouping away without blowing up.
Unfortunately, there's no end in sight: I've let it crank for an hour now, and
it's still going. MS SQL Server, with as much RAM and less clock speed, took
six minutes for this de-duping! For comparison, I also tried some other
operations, like "select count(*) from octanenights" and creating indexes, and
in every case Postgres finished in about three-fourths of the time SQL Server
took. So why is this SELECT INTO taking so much longer? Is there any other
option I can set to make it move a little faster? Thanks for all your help.
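
Incidentally, the rows=1000 and tiny costs in the EXPLAIN above look like the
planner's defaults for a table that has never been ANALYZEd, which might
explain why Postgres thought the hash table would fit in memory. A sketch of
what might help, untested here (the sort_mem value is illustrative, not a
recommendation):

-- Collect real statistics so the planner sees the table's true size;
-- with accurate row counts, 7.4 should avoid HashAggregate on its own
-- whenever the estimated hash table would not fit in sort_mem.
ANALYZE octanenights_raw;

-- Give the sort-based GroupAggregate more room to work in. In 7.4 the
-- per-sort memory knob is sort_mem, measured in KB (64 MB here).
SET sort_mem = 65536;

Re-running the SELECT INTO with these in place should show whether the
remaining slowness is just the sort spilling to disk.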

Kevin

-----Original Message-----
From: pgsql-general-owner(at)postgresql(dot)org
[mailto:pgsql-general-owner(at)postgresql(dot)org] On Behalf Of Lincoln Yeoh
Sent: Friday, August 06, 2004 9:25 PM
To: Kevin Bartz; 'Manfred Koizar'; pgsql-general(at)postgresql(dot)org
Subject: Re: [GENERAL] Out of swap space & memory

What does EXPLAIN <your query here> say?

I had an issue[1] with 7.4.1 where it was using "HashAggregate" because it
thought everything could fit in memory, but it couldn't.

So I downgraded to 7.3.4 and it worked.

In my case Tom Lane suggested a workaround for the 7.4.1 planner
erroneously using HashAggregate:

set enable_hashagg = off

But without an EXPLAIN it's harder for people here to help you.

I wonder if 7.4.x could throw a warning and switch to a less memory-intensive
query plan when it discovers it is running out of memory.

Hope this helps,
Link.

[1] see thread: "postmaster growing to consume all memory"

At 07:32 PM 8/6/2004 -0700, Kevin Bartz wrote:

>Mike, thanks so much for your reply. I'm sorry for not showing you my SQL. I
>didn't show it because I couldn't manage to boil it down to something
>reproducible that everyone could try. But here's what it was:
>
>drop table octanenights;
>CREATE TABLE octanenights (member_id varchar(100), campaign_id varchar(100),
>catalog_type varchar(100), pushed int, delivered int, clicks int, opened
>int, month varchar(100), type1 int, type2 int, type3 int, type4 int, type5
>int);
>
>copy octanenights from
>'/home/kevin/octanenights/proc/uberfile/uberfile1.txt' null as '';
>copy octanenights from
>'/home/kevin/octanenights/proc/uberfile/uberfile2.txt' null as '';
>copy octanenights from
>'/home/kevin/octanenights/proc/uberfile/uberfile3.txt' null as '';
>
>select * from octanenights limit 10;
>alter table octanenights rename to octanenights_raw;
>
>-- de-dup the table
>select member_id, campaign_id, catalog_type, pushed, delivered, clicks,
>opened, month, type1, type2, type3, type4, type5
>into octanenights
>from octanenights_raw
>group by member_id, campaign_id, catalog_type, pushed, delivered, clicks,
> opened, month, type1, type2, type3, type4, type5;
>
>Let me tell you a little about octanenights. It's a file of about 36,000,000
>rows, each describing an e-mail sent. Unfortunately, there are duplicate
>records scattered throughout the table, which I do not care about. One might
>suggest that I could've used uniq from the command line for this, but the
>data were not sorted originally and the duplicate records may be scattered
>anywhere in the table. The objective in the final line is to de-dup the
>table and place it into octanenights, leaving the original in
>octanenights_raw in case I ever need to refer back to it.
>
>MS SQL Server, with as much RAM and less clock speed, de-dups the table in
>about six minutes. The de-duped version has about 26,000,000 rows. The final
>line is where Postgres gobbles up all my swap and RAM and then conks out
>completely.
>
>Am I doing something wrong? Maybe there was a better way to approach this
>problem? I'd be open to suggestions of any kind, since I'm still very, very
>new to the world of optimizing Postgres.
>
>Kevin
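
For reference, the de-duplication above could also be written with SELECT
DISTINCT, which (if I recall correctly) 7.4 always plans as a sort followed
by a Unique step, so it sidesteps HashAggregate entirely. A sketch using the
column list from the quoted SQL:

SELECT DISTINCT member_id, campaign_id, catalog_type, pushed, delivered,
       clicks, opened, month, type1, type2, type3, type4, type5
INTO octanenights
FROM octanenights_raw;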

