From: | Peter Geoghegan <pg(at)heroku(dot)com> |
---|---|
To: | Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Making strxfrm() blobs in indexes work |
Date: | 2014-01-31 02:33:25 |
Message-ID: | CAM3SWZSzd9yRLpi+3kQxcp50hDyGAktx7-KPFopO_cKRGmr3pA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Jan 30, 2014 at 3:49 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> So ISTM that we could come up with an infrastructure, possibly just
> for insertion scanKeys (limiting the code footprint of all of this) in
> order to inner-page-process datums at this juncture, and store a blob
> instead, for later savings on walking the tree. I believe that this
> technique has applicability beyond strxfrm(), and a type-naive
> infrastructure could be developed to support it, with a degree of
> complexity not too far in excess of, say, the SortSupport
> infrastructure. The realization that we don't really need to recover
> the original information directly from the inner pages gives us scope
> to push things in several useful directions, I suspect.
Techniques around B-Trees that are useful for common workloads may
have problems around something that rhymes with "combatants", since it
stands to reason that many were developed within industry, and some of
the major vendors are known to be very combative. This whole area
seems safe, though. After all, Singleton wrote in a 1969 paper
"ALGORITHM 347: AN EFFICIENT ALGORITHM FOR SORTING WITH MINIMAL
STORAGE" that ACM members can download:
"This FORTRAN subroutine was tested on a CDC 6400 computer. For random
uniform numbers, sorting times divided by n log^2 n were nearly
constant at 20.2 X 10^-6 for 100 < n < 10,000, with a time of 0.202
seconds for 1000 items. This subroutine was also hand-compiled for the
same computer to produce a more efficient machine code. In this
version the constant of proportionality was 5.2 X 10^-6, with a time
of 0.052 seconds for 1000 items. In both cases, integer comparisons
were used to order normalized floating-point numbers."
This looks very much like prior art for our purposes. I'm pretty sure
that the problem was back then they didn't have dedicated
floating-point units, so they had to do what a contemporary embedded
systems engineer would call floating point emulation, at great
expense. Better to avoid a more expensive comparisons when you can.
--
Peter Geoghegan
From | Date | Subject | |
---|---|---|---|
Next Message | Craig Ringer | 2014-01-31 02:38:47 | Re: Prohibit row-security + inheritance in 9.4? |
Previous Message | Andreas Karlsson | 2014-01-31 01:58:09 | Re: GiST support for inet datatypes |