Re: slow query - will CLUSTER help?

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Sev Zaslavsky <sevzas(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: slow query - will CLUSTER help?
Date: 2013-12-19 20:33:13
Message-ID: CAMkU=1wWnXaeodK5dBiFT=sG=Ff7Zro_=_05+FuNn-yWLc+BCg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Thu, Dec 12, 2013 at 9:30 AM, Sev Zaslavsky <sevzas(at)gmail(dot)com> wrote:

> Hello,
>
> I've got a very simple table with a very simple SELECT query, but it takes
> longer on the initial run than I'd like, so I want to see if there is a
> strategy to optimize this.
>
> Table rt_h_nbbo contains several hundred million rows. All rows for a
> given entry_date are appended to this table in an overnight process every
> night - on the order of several million rows per day.
>
> The objective is to select all of the rows for a given product_id on a
> given entry_date.
>
> There is a b-tree index on (product_id, entry_date). The index appears to
> be used correctly. I'm seeing that if the data pages are not in memory,
> nearly all of the time is spent on disk I/O. The first time, the query
> takes 21 sec. If I run this query a second time, it completes in approx
> 1-2 ms.
>
> I perceive an inefficiency here and I'd like your input as to how to deal
> with it: The end result of the query is 1631 rows which is on the order of
> about a couple hundred Kb of data. Compare that to the amount of I/O that
> was done: 1634 buffers were loaded, 16Mb per page - that's about 24 Gb of
> data!
>

A page is usually 8KB, not 16MB (nor 16Mb).

> Query completed in 21 sec. I'd like to be able to physically
> re-organize the data on disk so that the data for a given product_id on a
> entry_date is concentrated on a few pages instead of being scattered like I
> see here.
>

If you load the data in daily batches, it is probably already fairly well
clustered by entry_date. If you sort the batch by product_id before bulk
loading it, then it should stay pretty well clustered on (entry_date,
product_id).

Cheers,

Jeff

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Sergey Konoplev 2013-12-19 20:34:11 Re: slow query - will CLUSTER help?
Previous Message Kevin Grittner 2013-12-19 20:14:16 Re: slow query - will CLUSTER help?