Re: Searching for Duplicates and Hosed the System

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bill Thoen <bthoen(at)gisnet(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Searching for Duplicates and Hosed the System
Date: 2007-08-20 03:30:24
Message-ID: 2663.1187580624@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Bill Thoen <bthoen(at)gisnet(dot)com> writes:
> Tom, here's the "explain" results: Does this help explain what went wrong?
> (And yes, I think there will be a *lot* of groups.)

> explain select count(*) as count,fips_st_cd, fips_cnty_cd, farm_nbr,
> tract_nbr, field_nbr,crop_cd,crop_status_cd,practice_cd,seq_nbr from
> compliance_2006 group by fips_st_cd, fips_cnty_cd, farm_nbr, tract_nbr,
> field_nbr,crop_cd,crop_status_cd,practice_cd,seq_nbr order by 1 desc;

> QUERY PLAN
> --------------------------------------------------------
> Sort (cost=15119390.46..15123902.54 rows=1804832 width=160)
> Sort Key: count(*)
> -> GroupAggregate (cost=13782933.29..14301822.43 rows=1804832
> width=160)
> -> Sort (cost=13782933.29..13828054.08 rows=18048318 width=160)
> Sort Key: fips_st_cd, fips_cnty_cd, farm_nbr, tract_nbr,
> field_nbr, crop_cd, crop_status_cd, practice_cd, seq_nbr
> -> Seq Scan on compliance_2006 (cost=0.00..1039927.18
> rows=18048318 width=160)
> (6 rows)

Hmm ... no, actually, that shows the planner doing the right thing for
lotsa groups: picking GroupAggregate instead of HashAggregate. The
estimated number of groups is 1804832, which might or might not have
much to do with reality but in any case seems enough to keep it away
from HashAggregate.

Do you have autovacuum running, or a scheduled cronjob that runs ANALYZE
or VACUUM ANALYZE? The only theory I can think of at this point is that
your database statistics are more correct now than they were when you
had the problem.

If you try the query again, does it behave more sanely?

regards, tom lane

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Robin Helgelin 2007-08-20 06:25:37 Re: entry log
Previous Message Mike Rylander 2007-08-20 01:05:17 Re: tsearch2: plainto_tsquery() with OR?