Re: The need for clustered indexes to boost TPC-V performance

From: Daniel Farina <daniel(at)heroku(dot)com>
To: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
Cc: Reza Taheri <rtaheri(at)vmware(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: The need for clustered indexes to boost TPC-V performance
Date: 2012-07-04 13:39:38
Message-ID: CAAZKuFaC4CosZ4m0uVH37LATcrHRcgZLav-sN=TuM44zH1qXMQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-performance

On Tue, Jul 3, 2012 at 10:43 PM, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au> wrote:
> On 07/04/2012 07:13 AM, Reza Taheri wrote:
>
> Following the earlier email introducing the TPC-V benchmark, and that we are
> developing an industry standard benchmarking kit for TPC-V using PostgreSQL,
> here is a specific performance issue we have run into.
>
>
> Which version of PostgreSQL are you using?
>
> How has it been tuned beyond the defaults - autovacuum settings,
> shared_buffers, effective_cache_size, WAL settings, etc?
>
> How much RAM is on the blade? What OS and version are on the blade?
>
>
> Comparing the table sizes, we are close to 2X larger (more on this in a
> later note). But the index size is what stands out. Our overall index usage
> (again, after accounting for different numbers of rows) is 4.8X times
> larger. 35% of our I/Os are to the index space. I am guessing that the 4.8X
> ballooning has something to do with this, and that in itself explains a lot
> about our high I/O rate, as well as higher CPU/tran cycles compared to MS
> SQL (we are 2.5-3 times slower).
>
> This is making me wonder about bloat issues and whether proper vacuuming is
> being done. If the visibility map and free space map aren't maintained by
> proper vaccum operation everything gets messy, fast.
>
> Well, MS SQL used a “clustered index” for CT, i.e., the data is held in the
> leaf pages of the index B-Tree. The data and index are in one data
> structure. Once you lookup the index, you also have the data at zero
> additional cost.
>
> [snip]
>
>
>
> Is the PGSQL community willing to invest in a feature that a) has been
> requested by many others already; and b) can make a huge difference in a
> benchmark that can lend substantial credibility to PGSQL performance?
>
>
> while PostgreSQL doesn't support covering indexes or clustered indexes at
> this point, 9.2 has added support for index-only scans, which are a half-way
> point of sorts. See:
>
> http://rhaas.blogspot.com.au/2011/10/index-only-scans-weve-got-em.html
> http://rhaas.blogspot.com.au/2010/11/index-only-scans.html
>
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a2822fb9337a21f98ac4ce850bb4145acf47ca27
>
> If at all possible please see how your test is affected by this PostgreSQL
> 9.2 enhancement. It should make a big difference, and if it doesn't it's
> important to know why.
>
> (CC'd Robert Haas)
>
> I'm not sure what the best option for getting a 9.2 beta build for Windows
> is.
>
>
> As for the "invest" side - that's really a matter for EnterpriseDB, Command
> Prompt, Red Hat, and the other backers who're employing people to work on
> the DB. Consider asking on pgsql-hackers, too; if nothing else you'll get a
> good explanation of the current state and progress toward clustered indexes.
>
> Some links that may be useful to you are:
>
> http://wiki.postgresql.org/wiki/Todo
> Things that it'd be good to support/implement at some point. Surprisingly,
> covering/clustered indexes aren't on there or at least aren't easily found.
> It's certainly a much-desired feature despite its apparent absence from the
> TODO.

I think there is, deservingly, a lot of hesitation to implement a
strictly ordered table construct. A similar feature that didn't quite
get finished -- but maybe can be beaten into shape -- is the
grouped-index-tuple implementation:

http://community.enterprisedb.com/git/

It is mentioned on the TODO page. It's under the category that is
perhaps poorly syntactically overloaded in the world "cluster".

--
fdr

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Bartosz Dmytrak 2012-07-04 14:20:01 Re: View parsing
Previous Message Venkat Balaji 2012-07-04 10:29:39 Re: : Postgresql Error after recovery

Browse pgsql-performance by date

  From Date Subject
Next Message Claudio Freire 2012-07-04 14:26:36 Re: The need for clustered indexes to boost TPC-V performance
Previous Message Hermann Matthes 2012-07-04 12:25:28 Paged Query