From: | Hannu Krosing <hannu(at)skype(dot)net> |
---|---|
To: | Luke Lonergan <llonergan(at)greenplum(dot)com> |
Cc: | Gokulakannan Somasundaram <gokul007(at)gmail(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andreas Joseph Krogh <andreak(at)officenet(dot)no>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Including Snapshot Info with Indexes |
Date: | 2007-10-23 07:20:48 |
Message-ID: | 1193124048.28269.19.camel@hannu-laptop |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers pgsql-patches |
Ühel kenal päeval, L, 2007-10-20 kell 10:19, kirjutas Luke Lonergan:
> Hi Hannu,
>
> On 10/14/07 12:58 AM, "Hannu Krosing" <hannu(at)skype(dot)net> wrote:
>
> > What has happened in reality, is that the speed difference between CPU,
> > RAM and disk speeds has _increased_ tremendously
>
> Yes.
>
> > which makes it even
> > more important to _decrease_ the size of stored data if you want good
> > performance
>
> Or bring the cpu processing closer to the data it's using (or both).
>
> By default, the trend you mention first will continue in an unending way -
> the consequence is that the "distance" between a processor and it's target
> data will continue to increase ad-infinitum.
the emergence of solid-state (flash) disks may help a little here, but
in general it is true.
> By contrast, you can only decrease the data volume so much - so in the end
> you'll be left with the same problem - the data needs to be closer to the
> processing. This is the essence of parallel / shared nothing architecture.
>
> Note that we've done this at Greenplum. We're also implementing a DSM-like
> capability and are investigating a couple of different hybrid row / column
> store approaches.
Have you tried moving the whole visibility part of tuples out to a
separate heap ?
Especially in OLAP/ETL scenarios the distribution of tuples loaded in
one transaction should be very good for visibility-info compression.
I'd suspect that you could crush hundreds of pages worth of visibility
into single RLE encoding unit (xmin=N, xmax=no_yet, start_ctid = X,
end_ctid=Y), and it will stay in L1 cache most of the time you process
the corresponding relation. and the relation itself will be smaller, and
index-only (actually index-only + lookup inside L1 cache) access can
happen, and so on .
OTOH, if you load it in millions of small transactions, you can run
VACUUM FREEZE _on_ the visibility heap only, which will make all
visibility infoe look similar and thus RLE-compressable and again make
it fit in L1 cache, if you dont have lots of failed loads interleaved
with successful ones.
> Bitmap index with index-only access does provide nearly all of the
> advantages of a column store from a speed standpoint BTW. Even though
> Vertica is touting speed advantages - our parallel engine plus bitmap index
> will crush them in benchmarks when they show up with real code.
>
> Meanwhile they're moving on to new ideas - I kid you not "Horizontica" is
> Dr. Stonebraker's new idea :-)
Sounds like a result of a marketroid brainstorming session :P
> So - bottom line - some ideas from column store make sense, but it's not a
> cure-all.
>
> > There is also a MonetDB/X100 project, which tries to make MonetOD
> > order(s) of magnitude faster by doing in-page compression in order to
> > get even more performance, see:
>
> Actually, the majority of the points made by the MonetDB team involve
> decreasing the abstractions in the processing path to improve the IPC
> (instructions per clock) efficiency of the executor.
The X100 part was about doing in-page compression, so the efficiency of
disk to L1 cache pathway would increase. so for 1/2 compression the CPU
would get twice the data threoughput.
> We are also planning to do this by operating on data in vectors of projected
> rows in the executor, which will increase the IPC by reducing I-cache misses
> and improving D-cache locality. Tight loops will make a much bigger
> difference when long runs of data are the target operands.
>
> - Luke
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: You can help support the PostgreSQL project by donating at
>
> http://www.postgresql.org/about/donate
From | Date | Subject | |
---|---|---|---|
Next Message | Martijn van Oosterhout | 2007-10-23 07:49:31 | Re: Feature Freeze date for 8.4 |
Previous Message | Pavel Stehule | 2007-10-23 07:03:53 | Re: Feature Freeze date for 8.4 |
From | Date | Subject | |
---|---|---|---|
Next Message | Gokulakannan Somasundaram | 2007-10-23 09:10:37 | Re: [HACKERS] Including Snapshot Info with Indexes |
Previous Message | Tom Lane | 2007-10-22 17:32:06 | Re: pgstattuple locking fix |