Re: very, very slow performance

From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: "Uwe C(dot) Schroeder" <uwe(at)oss4u(dot)com>
Cc: pgsql-admin(at)postgresql(dot)org, Tena Sakai <tsakai(at)gallo(dot)ucsf(dot)edu>
Subject: Re: very, very slow performance
Date: 2009-02-21 08:57:56
Message-ID: dcc563d10902210057k329f93bat76d07ba0016a24fe@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

On Fri, Feb 20, 2009 at 10:45 PM, Uwe C. Schroeder <uwe(at)oss4u(dot)com> wrote:
>
> On Friday 20 February 2009, Tena Sakai wrote:
>> Hi Scott,
>>
>> > What does explain and (it'll take a while to get
>> > it) explain analyze select ... have to say?
>>
>> ---------------------------------------------------------------------------
>>---------- Hash Join (cost=165264.65..55486119.31 rows=601095277 width=32)
>> Hash Cond: (genotype.allele1id = a1.alleleid)
>> -> Hash Join (cost=82632.33..34731274.54 rows=601095277 width=34)
>> Hash Cond: (genotype.allele2id = a2.alleleid)
>> -> Seq Scan on genotype (cost=0.00..13976429.77 rows=601095277
>> width=36) -> Hash (cost=42474.59..42474.59 rows=2447659 width=6) -> Seq
>> Scan on allele a2 (cost=0.00..42474.59 rows=2447659 width=6) -> Hash
>> (cost=42474.59..42474.59 rows=2447659 width=6)
>> -> Seq Scan on allele a1 (cost=0.00..42474.59 rows=2447659
>> width=6) (9 rows)

I was wrong about this query, it is constrained by the where clause.
I much prefer join on syntax as it's more obvious what's joining to
what. Pgsql is smart enough to reorder join clauses as long as it's
not contrained in by say, a left join, and even then there's some
wiggle room I think. Anyway...

> The above tells you that you don't have indices in place. Postgres chooses a
> seq scan - which as the name implies scans all the rows in sequencial order.

Well, he's grabbing everything, so I'm betting an index won't buy you
anything unless everything fits in memory and you set random_page_cost
low enough and shared_buffers and effective_cache high enough, then an
index will lose.

However, if you always access the tables in a given order, you can
cluster tables and get really fast results. I'd try clustering on an
index for each sub table, clustering on that, and adding order bys to
put the result sets into matching clustered index fields for each
joined table.

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Scott Marlowe 2009-02-21 09:40:43 Re: 8.3.5 broken after power fail SOLVED
Previous Message Michael Monnerie 2009-02-21 08:43:35 Question on rule