From: | Craig Ringer <ringerc(at)ringerc(dot)id(dot)au> |
---|---|
To: | Reza Taheri <rtaheri(at)vmware(dot)com> |
Cc: | "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com> |
Subject: | Re: The need for clustered indexes to boost TPC-V performance |
Date: | 2012-07-04 05:43:43 |
Message-ID: | 4FF3D80F.9040408@ringerc.id.au |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general pgsql-performance |
On 07/04/2012 07:13 AM, Reza Taheri wrote:
>
> Following the earlier email introducing the TPC-V benchmark, and that
> we are developing an industry standard benchmarking kit for TPC-V
> using PostgreSQL, here is a specific performance issue we have run into.
>
Which version of PostgreSQL are you using?
How has it been tuned beyond the defaults - autovacuum settings,
shared_buffers, effective_cache_size, WAL settings, etc?
How much RAM is on the blade? What OS and version are on the blade?
> Comparing the table sizes, we are close to 2X larger (more on this in
> a later note). But the index size is what stands out. Our overall
> index usage (again, after accounting for different numbers of rows) is
> 4.8X times larger. 35% of our I/Os are to the index space. I am
> guessing that the 4.8X ballooning has something to do with this, and
> that in itself explains a lot about our high I/O rate, as well as
> higher CPU/tran cycles compared to MS SQL (we are 2.5-3 times slower).
>
This is making me wonder about bloat issues and whether proper vacuuming
is being done. If the visibility map and free space map aren't
maintained by proper vaccum operation everything gets messy, fast.
> Well, MS SQL used a "clustered index" for CT, i.e., the data is held
> in the leaf pages of the index B-Tree. The data and index are in one
> data structure. Once you lookup the index, you also have the data at
> zero additional cost.
>
> [snip]
>
> Is the PGSQL community willing to invest in a feature that a) has been
> requested by many others already; and b) can make a huge difference in
> a benchmark that can lend substantial credibility to PGSQL performance?
>
while PostgreSQL doesn't support covering indexes or clustered indexes
at this point, 9.2 has added support for index-only scans, which are a
half-way point of sorts. See:
http://rhaas.blogspot.com.au/2011/10/index-only-scans-weve-got-em.html
http://rhaas.blogspot.com.au/2010/11/index-only-scans.html
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a2822fb9337a21f98ac4ce850bb4145acf47ca27
If at all possible please see how your test is affected by this
PostgreSQL 9.2 enhancement. It should make a big difference, and if it
doesn't it's important to know why.
(CC'd Robert Haas)
I'm not sure what the best option for getting a 9.2 beta build for
Windows is.
As for the "invest" side - that's really a matter for EnterpriseDB,
Command Prompt, Red Hat, and the other backers who're employing people
to work on the DB. Consider asking on pgsql-hackers, too; if nothing
else you'll get a good explanation of the current state and progress
toward clustered indexes.
Some links that may be useful to you are:
http://wiki.postgresql.org/wiki/Todo
Things that it'd be good to support/implement at some point.
Surprisingly, covering/clustered indexes aren't on there or at least
aren't easily found. It's certainly a much-desired feature despite its
apparent absence from the TODO.
http://wiki.postgresql.org/wiki/PostgreSQL_9.2_Development_Plan
http://wiki.postgresql.org/wiki/PostgreSQL_9.2_Open_Items
--
Craig Ringer
From | Date | Subject | |
---|---|---|---|
Next Message | Thomas Kellerer | 2012-07-04 07:57:30 | Re: The need for clustered indexes to boost TPC-V performance |
Previous Message | Andy Chambers | 2012-07-04 00:38:10 | Cancel a pg_ctl stop |
From | Date | Subject | |
---|---|---|---|
Next Message | Thomas Kellerer | 2012-07-04 07:57:30 | Re: The need for clustered indexes to boost TPC-V performance |
Previous Message | Craig Ringer | 2012-07-04 05:19:14 | Re: Introducing the TPC-V benchmark, and its relationship to PostgreSQL |