From: | Peter Geoghegan <peter(at)2ndquadrant(dot)com> |
---|---|
To: | Florian Pflug <fgp(at)phlo(dot)org> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PG Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: sortsupport for text |
Date: | 2012-06-21 11:15:31 |
Message-ID: | CAEYLb_UU=h=oFs+b96uNPJTbiQvsnHJPs5kV7pbPCxxKoTNPzQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 21 June 2012 11:40, Florian Pflug <fgp(at)phlo(dot)org> wrote:
> At this point, my theory is that your choice of "random" strings prevents
> strxfrm() from ever winning over strcoll(). The reason being that you pick
> each letter uniformly distributed from a-z, resulting in a probability of
> two string differing in the first of 1 - 1/26 =~ 96%. Thus, even for
> extremely complex collation rules, strcoll() probably only needs to compare
> a few characters to determine the order. Whereas strxfrm() has to compute
> the whole sort key, no matter what.
Good point.
> The question is thus, how good a model are your "random" strings for the
> input of a typical sorting step in postgres? My guess is, a quite good one
> actually, since people probably don't deal with a lot of very similar strings
> very often. Which makes we wonder if using strxfrm() during sorting wouldn't
> be a net loss, all things considered…
Well, I think the answer to that has to be no. I posted a simple
postgres text -> strxfrm blob bytea SQL-callable wrapper function a
few days ago that clearly wins by quite a bit on some sample queries
against the dellstore sample database, which has a representative
sample set. Completely uniformly-distributed data actually isn't
representative of the real world at all. Normal distributions abound.
This C++ program was never intended to justify the general utility of
using strxfrm() for sorting, which I don't believe is in question. I
just wanted to get some idea about the performance characteristics of
using strxfrm() to traverse an index.
--
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
From | Date | Subject | |
---|---|---|---|
Next Message | Etsuro Fujita | 2012-06-21 11:20:56 | Re: not null validation option in contrib/file_fdw |
Previous Message | Florian Pflug | 2012-06-21 10:40:51 | Re: sortsupport for text |