Re: Searching for Duplicates and Hosed the System

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bill Thoen <bthoen(at)gisnet(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Searching for Duplicates and Hosed the System
Date: 2007-08-21 05:04:55
Message-ID: 21792.1187672695@sss.pgh.pa.us
Lists: pgsql-general

Bill Thoen <bthoen(at)gisnet(dot)com> writes:
> I ran VACUUM ANALYZE just before I launched this and there were no other
> postgres jobs running. I'm the only user as well. I also ran EXPLAIN
> prior to the run and got this:

> Nested Loop  (cost=11.71..28800.34 rows=7219 width=584)
>   ->  Seq Scan on ers_regions e  (cost=0.00..71.90 rows=16 width=28)
>         Filter: (region = 1)
>   ->  Bitmap Heap Scan on compliance_2006 a  (cost=11.71..1788.76 rows=451 width=584)
>         Recheck Cond: ((a.fips_st_cd = "outer".fips_st) AND (a.fips_cnty_cd = "outer".fips_cou))
>         ->  Bitmap Index Scan on key_tract  (cost=0.00..11.71 rows=451 width=0)
>               Index Cond: ((a.fips_st_cd = "outer".fips_st) AND (a.fips_cnty_cd = "outer".fips_cou))
> (7 rows)

Do those estimated row counts look sane to you --- in particular the
estimate of 7219 rows out of the whole query?

AFAICS a plan of this shape simply cannot consume a huge amount of
memory on the server, no matter how badly off the rowcount estimates
are. However, if it sends enough rows to the client, the *client* side
could be having a problem absorbing the data. libpq is designed to
try to absorb the full result rowset --- the API it provides is not
amenable to partial result sets.
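
To illustrate (this is a sketch, not your actual code --- the SELECT text
stands in for whatever query you ran, and it assumes an already-open
PGconn), a plain libpq client does roughly this, and PQexec does not
return until every row of the result has been buffered on the client:

    #include <libpq-fe.h>

    /* PQexec buffers the entire result set in client memory before it
     * returns, so a join that produces millions of wide rows can exhaust
     * the client's RAM even though the server side is fine. */
    static void
    run_big_query(PGconn *conn)
    {
        PGresult *res = PQexec(conn, "SELECT ...");   /* the big join */

        if (PQresultStatus(res) == PGRES_TUPLES_OK)
        {
            int ntuples = PQntuples(res);  /* all rows are already in memory here */

            /* ... walk the rows with PQgetvalue(res, i, j) ... */
        }
        PQclear(res);
    }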

> I looked in the log and saw this:
> LOG: transaction ID wrap limit is 1073746500, limited by database "postgres"
> LOG: transaction ID wrap limit is 1073746500, limited by database "postgres"
> LOG: could not send data to client: Broken pipe

Those first two messages are unrelated --- they are just routine
autovacuum output. The third one says that a client process crashed.
So now I'm thinking that the memory-overrun problem was on the client
side.

If you need to deal with very large result sets, the standard advice
is to use a cursor so you can pull a few hundred or thousand rows
at a time via FETCH.
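
A minimal sketch of that pattern in libpq terms (the cursor name and the
batch size of 1000 are arbitrary, and the SELECT is again a stand-in for
your real query):

    #include <stdio.h>
    #include <libpq-fe.h>

    /* Assumes conn is an already-open PGconn.  Instead of pulling the
     * whole result at once, declare a cursor and FETCH one batch at a
     * time, so the client never holds more than one batch in memory. */
    static void
    fetch_in_batches(PGconn *conn)
    {
        PGresult *res;

        PQclear(PQexec(conn, "BEGIN"));
        PQclear(PQexec(conn, "DECLARE bigcur CURSOR FOR SELECT ..."));

        for (;;)
        {
            res = PQexec(conn, "FETCH 1000 FROM bigcur");
            if (PQresultStatus(res) != PGRES_TUPLES_OK)
            {
                fprintf(stderr, "FETCH failed: %s", PQerrorMessage(conn));
                PQclear(res);
                break;
            }
            if (PQntuples(res) == 0)       /* no rows left */
            {
                PQclear(res);
                break;
            }
            /* ... process this batch with PQgetvalue(res, i, j) ... */
            PQclear(res);
        }

        PQclear(PQexec(conn, "CLOSE bigcur"));
        PQclear(PQexec(conn, "COMMIT"));
    }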

regards, tom lane
