Re: Query tuning: partitioning, DISTINCT ON, and indexing

From: Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com>
To: bricklen <bricklen(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Query tuning: partitioning, DISTINCT ON, and indexing
Date: 2013-06-21 05:14:40
Message-ID: CAOtHd0DnNT=eHyR0VS6eHP7zthTgEUxN6j35nrh5z5THze7PkA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Thu, Jun 20, 2013 at 9:13 PM, bricklen <bricklen(at)gmail(dot)com> wrote:

>
> On Thu, Jun 20, 2013 at 6:24 PM, Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com>wrote:
>
>> SELECT
>> DISTINCT ON (type) ts, type, details
>> FROM
>> observations
>> WHERE
>> subject = '...'
>> ORDER BY
>> type, ts DESC;
>>
>
> First thing: What is your "work_mem" set to, and how much RAM is in the
> machine? If you look at the plan, you'll immediately notice the "external
> merge Disk" line where it spills to disk on the sort. Try setting your
> work_mem to 120MB or so (depending on how much RAM you have, # concurrent
> sessions, complexity of queries etc)
>

Good call, thanks, although the in-mem quicksort is not much faster:

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=471248.30..489392.67 rows=3 width=47) (actual
time=32002.133..32817.474 rows=3 loops=1)
Buffers: shared read=30264
-> Sort (cost=471248.30..480320.48 rows=3628873 width=47) (actual
time=32002.128..32455.950 rows=3628803 loops=1)
Sort Key: public.observations.type, public.observations.ts
Sort Method: quicksort Memory: 381805kB
Buffers: shared read=30264
-> Result (cost=0.00..75862.81 rows=3628873 width=47) (actual
time=0.026..1323.317 rows=3628803 loops=1)
Buffers: shared read=30264
-> Append (cost=0.00..75862.81 rows=3628873 width=47)
(actual time=0.026..978.477 rows=3628803 loops=1)
Buffers: shared read=30264
...

the machine is not nailed down, but I think I'd need to find a way to
drastically improve the plan to keep this in Postgres. The alternative is
probably caching the results somewhere else: for any given subject, I only
need the latest observation of each type 99.9+% of the time.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message bricklen 2013-06-21 16:08:38 Re: Query tuning: partitioning, DISTINCT ON, and indexing
Previous Message bricklen 2013-06-21 04:13:11 Re: Query tuning: partitioning, DISTINCT ON, and indexing