From: | Claudio Freire <klaussfreire(at)gmail(dot)com> |
---|---|
To: | Reza Taheri <rtaheri(at)vmware(dot)com> |
Cc: | "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>, "Andy Bond (abond(at)redhat(dot)com)" <abond(at)redhat(dot)com>, Greg Kopczynski <gregwk(at)vmware(dot)com>, Jignesh Shah <jshah(at)vmware(dot)com> |
Subject: | Re: The need for clustered indexes to boost TPC-V performance |
Date: | 2012-07-04 14:26:36 |
Message-ID: | CAGTBQpY4J2on5=gN5-ykSpZkxVeGHAfiZwu0mXpqsSXwoxxnPg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general pgsql-performance |
On Tue, Jul 3, 2012 at 8:13 PM, Reza Taheri <rtaheri(at)vmware(dot)com> wrote:
> So I looked more closely at the indexes. I chose the CASH_TRANSACTION
> table since it has a single index, and we can compare it more directly to the
> Dell data. If you look at page 34 of http://bit.ly/QeWXhE, the index size of CT
> is 1,278,720KB for 6,120,529,488 rows. That’s less than one byte of index
> per data row! How could that be? Well, MS SQL used a “clustered index”
> for CT, i.e., the data is held in the leaf pages of the index B-Tree.
> The data and index are in one data structure. Once you lookup the index,
> you also have the data at zero additional cost. For PGSQL, we had to create
> a regular index, which took up 55GB. Once you do the math, this works out
> to around 30 bytes per row. I imagine we have the 15-byte key along with a
> couple of 4-byte or 8-byte pointers.
...
> So MS SQL beats PGSQL by a) having a lower I/O rate due to no competition
> for the buffer pool from indexes (except for secondary indexes); and b) by
> getting the data with a free lookup, whereas we have to work our way down
> both the index and the data trees.
15-byte key?
What about not storing the keys, but a hash, for leaf nodes?
Assuming it can be made to work for both "range" and "equality" scans,
holding only hashes on leaf nodes would reduce index size, but how
much?
I think it's doable, and I could come up with a spec if it's worth it.
It would have to scan the heap for only two extra index pages (the
extremes that cannot be ruled out) and hash collisions, which doesn't
seem like a big loss versus the reduced index.
From | Date | Subject | |
---|---|---|---|
Next Message | Josh Kupershmidt | 2012-07-04 18:33:04 | Re: TG_COLUMNS_UPDATED |
Previous Message | Bartosz Dmytrak | 2012-07-04 14:20:01 | Re: View parsing |
From | Date | Subject | |
---|---|---|---|
Next Message | Reza Taheri | 2012-07-04 18:24:08 | Re: Introducing the TPC-V benchmark, and its relationship to PostgreSQL |
Previous Message | Daniel Farina | 2012-07-04 13:39:38 | Re: The need for clustered indexes to boost TPC-V performance |