Re: extremly bad select performance on huge table

From: Björn Wittich <Bjoern_Wittich(at)gmx(dot)de>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: extremly bad select performance on huge table
Date: 2014-10-21 19:32:03
Message-ID: 5446B4B3.4030802@gmx.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi Tom and Igor,

thanks for your help. With the reindex the select query running time was
reduced from 5200 sec to 130 sec. Impressive!

Even a join on this table is now fast.

Unfortunately, there is now another problem: The table in my example has
500 columns which I want to retrieve with my join command.

Example which is fast "select value from smallertable inner join
myhugetable ON smallertable.mycolumn = myhugetable.mycolumn"

Example which is slow "select value,c1,c2,c3,...,c10 from smallertable
inner join myhugetable ON smallertable.mycolumn = myhugetable.mycolumn"

Which is the number of columns to fetch so bad ? Which action is done in
the db system when querying this via pgadmin? I think that there is no
real retrieval included, why is the number of additional columns so bad
for the join performance?

> =?ISO-8859-15?Q?Bj=F6rn_Wittich?= <Bjoern_Wittich(at)gmx(dot)de> writes:
>> Here is the explain (analyze,buffers) select mycolumn from myhugetable
>> "Index Only Scan using myprimkey on myhugetable (cost=0.00..8224444.82
>> rows=71768080 width=33) (actual time=16.722..2456300.778 rows=71825999
>> loops=1)"
>> " Heap Fetches: 356861"
>> " Buffers: shared hit=71799472 read=613813"
>> "Total runtime: 2503009.611 ms"
> So that works out to about 4 msec per page fetched considering only I/O
> costs, which is about as good as you're likely to get if the data is
> sitting on spinning rust.
>
> You could potentially make it faster with a VACUUM (to mark all pages
> all-visible and eliminate the "heap fetches" costs), or a REINDEX
> (so that the index scan becomes more nearly sequential instead of random
> access). However, unless the data is nearly static those will just be
> temporary fixes: the time will degrade again as you update the table.
>
>> Note: This select is just for testing. My final statement will be a join
>> on this table via the "mycolumn" column.
> In that case it's probably a waste of time to worry about the performance
> of this query as such. In the first place, a join is not likely to use
> the index at all unless it's fetching a relatively small number of rows,
> and in the second place it seems unlikely that the join query can use
> an IndexOnlyScan on this index --- I imagine that the purpose of the join
> will require fetching additional columns.
>
> regards, tom lane
>
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Igor Neyman 2014-10-21 19:53:48 Re: extremly bad select performance on huge table
Previous Message Tom Lane 2014-10-21 19:07:42 Re: extremly bad select performance on huge table