Re: B-Tree support function number 3 (strxfrm() optimization)

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Greg Stark <stark(at)mit(dot)edu>, Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thom Brown <thom(at)linux(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: B-Tree support function number 3 (strxfrm() optimization)
Date: 2014-07-14 17:53:24
Message-ID: CAM3SWZTUdndFhBq95+9H+Oon6fjsRH0B0C4T6g423cwtPE0hNA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jun 12, 2014 at 2:09 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> Right. It was a little incautious of me to say that we had the
> full benefit of 8 bytes of storage with "en_US.UTF-8", since that is
> only true of lowercase characters. (I think that FreeBSD can play
> tricks with this: sometimes it will give you the benefit of 8 bytes
> of entropy for an 8 byte string with only non-differentiating
> trailing bytes, so that the first 8 bytes of "Aaaaaaaa" are distinct
> from the first 8 bytes of "aaaaaaaa", while any trailing bytes are
> non-distinct for both.) In any case, it's pretty clear that a goal of
> the glibc implementation is to concentrate as much entropy as
> possible into the first part of the string, and that's the important
> point.
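
The strxfrm_test() function used below is a test wrapper that exposes
the strxfrm() blob as bytea, so the leading bytes can be inspected
directly. A query along these lines shows whether the case distinction
lands within the first 8 bytes (the exact bytes are locale- and
libc-dependent, so treat this as illustrative):

select substring(strxfrm_test('Aaaaaaaa')::text from 0 for 19) as upper_a,
       substring(strxfrm_test('aaaaaaaa')::text from 0 for 19) as lower_a;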

I thought about using an order-preserving compression technique like
Hu-Tucker [1] in order to get additional benefit from those 8 bytes.
Even my apparently sympathetic cities example isn't all that
sympathetic, since the number of distinct normalized keys is only
about 75% of the total number of cities (and a reliable strcmp()-only
tie-breaker isn't expected to be useful for the remaining 25%). Here
is the improvement I see when I set things up so that there is a 1:1
correspondence between city rows and distinct normalized keys:

postgres=# with cc as
(
  -- group rows by the first 8 bytes of the transformed blob,
  -- collecting the ctids of rows that share a normalized key
  select count(*), array_agg(ctid) ct,
         substring(strxfrm_test(city)::text from 0 for 19) blob
  from cities
  group by 3 having count(*) > 1 order by 1
),
ff as
(
  -- keep the first row in each group; mark the rest for deletion
  select unnest(ct[2:400]) u, blob from cc
)
delete from cities using ff where cities.ctid = ff.u;

postgres=# vacuum full cities ;
VACUUM

Afterwards:

postgres=# select count(*) from (select distinct
substring(strxfrm_test(city)::text from 0 for 19) from cities) i;
count
--------
243782
(1 row)

postgres=# select count(*) from cities;
count
--------
243782
(1 row)

$ cat sort-city.sql
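-- OFFSET discards all rows, so timing is dominated by the sort itself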
select * from (select * from cities order by city offset 1000000) d;

Patch results
==========

pgbench -M prepared -f sort-city.sql -T 300 -n

transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 1
number of threads: 1
duration: 300 s
number of transactions actually processed: 1734
latency average: 173.010 ms
tps = 5.778545 (including connections establishing)
tps = 5.778572 (excluding connections establishing)

pgbench -M prepared -j 4 -c 4 -f sort-city.sql -T 300 -n

transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 4
number of threads: 4
duration: 300 s
number of transactions actually processed: 2916
latency average: 411.523 ms
tps = 9.706683 (including connections establishing)
tps = 9.706776 (excluding connections establishing)

Master results
==========

pgbench -M prepared -f sort-city.sql -T 300 -n

transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 1
number of threads: 1
duration: 300 s
number of transactions actually processed: 390
latency average: 769.231 ms
tps = 1.297545 (including connections establishing)
tps = 1.297551 (excluding connections establishing)

pgbench -M prepared -j 4 -c 4 -f sort-city.sql -T 300 -n

transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 4
number of threads: 4
duration: 300 s
number of transactions actually processed: 535
latency average: 2242.991 ms
tps = 1.777005 (including connections establishing)
tps = 1.777024 (excluding connections establishing)

So that's a considerable further improvement (roughly 4.5x master's
throughput at one client, and over 5x at four) that it would be nice
to see more often. I don't know that it's worth going to the trouble
of compressing, given the existing ameliorating measures (HyperLogLog,
etc.), at least if compression is motivated by worst-case performance.
There is some research on making order-preserving compression work for
this kind of thing specifically [2].
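
To make that concrete: what the HyperLogLog state cheaply approximates
during copying is, in effect, the proportion of abbreviated keys that
are distinct. A full (and much slower) offline equivalent, reusing the
strxfrm_test() wrapper from above, might look like this (the threshold
at which it makes sense to abort is a separate question):

select count(distinct substring(strxfrm_test(city)::text from 0 for 19))::float
       / count(*) as distinct_key_proportion
from cities;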

My concern is that it won't be worth it to do the extra work,
particularly given that I already have 8 bytes to work with. Supposing
I only had 4 bytes to work with (as the authors of [2] may have had in
1994), that would leave me with a relatively small number of distinct
normalized keys in many representative cases. For example, I'd have a
mere 40,665 distinct normalized keys in the case of my "cities"
database, rather than 243,782 (out of a set of 317,102 rows) for 8
bytes of storage. But if I double that to 16 bytes (which might be
taken as a proxy for what a good compression scheme could get me), I
only get a modest improvement: 273,795 distinct keys. To be fair,
that's in no small part because there are only 275,330 distinct city
names overall (and so most duplicates get away with a cheap memcmp()
tie-breaker), but this is a reasonably organic, representative
dataset.
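
For reference, those counts can be reproduced with something like the
following, run against the original 317,102-row dataset (before the
deduplication above). In bytea hex output, n bytes occupy a "\x"
prefix plus 2n hex characters; with the "from 0" idiom used above,
that makes the "for" widths 11, 19 and 35 for 4, 8 and 16 bytes
respectively:

select count(distinct substring(strxfrm_test(city)::text from 0 for 11)) as keys_4_bytes,
       count(distinct substring(strxfrm_test(city)::text from 0 for 19)) as keys_8_bytes,
       count(distinct substring(strxfrm_test(city)::text from 0 for 35)) as keys_16_bytes,
       count(distinct city) as distinct_names,
       count(*) as total_rows
from cities;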

Now, it's really hard to judge something like that, and I don't
imagine that this analysis is all that scientific. I am inclined to
think that we're better off just aborting the optimization during the
copying stage if it isn't working out, and forgetting about
order-preserving compression. Let us not lose sight of the fact that
strxfrm() calls are not that expensive relative to the cost of the
sort in almost all cases.

[1] http://www.cs.ust.hk/mjg_lib/bibs/DPSu/DPSu.Files/jstor_514.pdf

[2] http://www.hpl.hp.com/techreports/Compaq-DEC/CRL-94-3.pdf
--
Peter Geoghegan
