Re: Performance problems with DISTINCT ON

From: "Sgarbossa Domenico" <domenico(dot)sgarbossa(at)eniac(dot)it>
To: "Richard Huxton" <dev(at)archonet(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Performance problems with DISTINCT ON
Date: 2009-09-29 12:44:49
Message-ID: 0D9FA0704ED847C089451F8B79D1EE4C@laneniac.it
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Subject: Re: [PERFORM] Performance problems with DISTINCT ON

> Sgarbossa Domenico wrote:
>> I need to retrieve the most recent prices per products from a price list
>> table:
>
>> select distinct on (articolo) articolo,data_ent,prezzo from
>> listini_anagrafici order by articolo, data_ent desc
>>
>> but it seems that this query runs slowly... about 5/6 seconds.
>> the table contains more or less 500K records, PostgreSQL version is
>> 8.1.11 and the server has 4gb of RAM entirely dedicate to the db.
>
>> 'Unique (cost=73893.89..76551.25 rows=88312 width=24) (actual
>> time=4022.578..5076.206 rows=193820 loops=1)'
>> ' -> Sort (cost=73893.89..75222.57 rows=531472 width=24) (actual
>> time=4022.574..4505.538 rows=531472 loops=1)'
>> ' Sort Key: articolo, data_ent'
>> ' -> Seq Scan on listini_anagrafici (cost=0.00..16603.72
>> rows=531472 width=24) (actual time=0.009..671.797 rows=531472 loops=1)'
>> 'Total runtime: 5217.452 ms'
>
> You've got 531472 rows in the table and the query is going to output
> 193820 of them. Scanning the whole table is almost certainly the way to
> go.
>
> If the table doesn't change much, you could try running a CLUSTER on the
> index you've created. That will lock the table while it re-orders the
> physical layout of the rows based on your index though, so it's no good
> if the table is updated much.
>
> Failing that, you could try issuing "set work_mem = ..." before the
> query with increasing sizes for work_mem. That might make the sort
> faster too.
>

Thank you for the answer,
I've tried as you suggest but the only things that seems make some
differences is the work_mem parameter
This helps to reduce the amount of time about for the half (3 seconds) but
unfortunately this ain't enough.
If there are a lot of concurrent request I think it could made the data
swap to the disk.
Should I try a different approach to solve this issue?

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message hubert depesz lubaczewski 2009-09-29 12:53:13 Re: Performance problems with DISTINCT ON
Previous Message Reydan Cankur 2009-09-29 11:07:44 Using OProfile