Comparing toasted data (was improve Chinese locale performance)

From: Greg Stark <stark(at)mit(dot)edu>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Quan Zongliang <quanzongliang(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Comparing toasted data (was improve Chinese locale performance)
Date: 2013-07-28 12:39:09
Message-ID: CAM-w4HNRfb6vu6A9VYGoZcvxC7Z5hJBorO6k2Eg4=RK=t0+-OQ@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jul 28, 2013 at 10:39 AM, Martijn van Oosterhout
<kleptog(at)svana(dot)org> wrote:
> The main issue with strxfrm() is its lame API. If it supported
> returning prefixes you'd be set, but as it is you need >10MB of memory
> just to transform a 10MB string, even if only the first few characters
> would be enough to sort...

It occurs to me that the same issue affects our handling of toast
data. When comparing a toasted bytea (or a string in the C locale), it
would be nice to fetch just the first chunk and start the comparison;
the next chunk should be needed only if the comparison reaches the end
of that one. Even compressed data need not be decompressed past the
point where the comparison is decided. If the other datum is not
toasted, you even know up front the worst case for how much needs to
be detoasted.
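A minimal C sketch of the lazy comparison follows. It rests on assumptions that are not PostgreSQL's actual API. The toasted value is modelled by a hypothetical `ChunkedDatum` struct, and chunks are read through a hypothetical `fetch_chunk()` helper. The other datum is a plain in-memory byte string, ordering follows bytea rules (bytewise, then shorter first), and compression is ignored. Chunks are fetched only until the result is decided, and the worst case, min(rawsize, blen) bytes, is known before anything is fetched. The total length is assumed to be available without detoasting; PostgreSQL's external toast pointer does record the original size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical upper bound on chunk size for the scratch buffer below;
   not PostgreSQL's real toast chunk size. */
#define MAX_CHUNK 2048

/* Hypothetical model of an out-of-line value stored in fixed-size chunks.
   'data' stands in for the toast table; real toast access looks different. */
typedef struct {
    const unsigned char *data;
    size_t rawsize;        /* total length, assumed known without detoasting */
    size_t chunk_size;     /* bytes per chunk (the last chunk may be shorter) */
    size_t chunks_fetched; /* instrumentation: how many chunks were read */
} ChunkedDatum;

/* Hypothetical fetch of chunk number idx into buf; returns its length. */
static size_t fetch_chunk(ChunkedDatum *d, size_t idx, unsigned char *buf)
{
    size_t off = idx * d->chunk_size;
    size_t len = d->rawsize - off;

    if (len > d->chunk_size)
        len = d->chunk_size;
    memcpy(buf, d->data + off, len);
    d->chunks_fetched++;
    return len;
}

/* Compare chunked value 'a' with plain byte string 'b' (blen bytes) in
   bytea order: bytewise over the common prefix, then shorter sorts first.
   At most min(a->rawsize, blen) bytes of 'a' are ever needed, so that
   worst case is known before any chunk is fetched. */
static int compare_chunked(ChunkedDatum *a, const unsigned char *b, size_t blen)
{
    unsigned char buf[MAX_CHUNK];
    size_t limit = a->rawsize < blen ? a->rawsize : blen;
    size_t off = 0;
    size_t idx = 0;

    assert(a->chunk_size > 0 && a->chunk_size <= MAX_CHUNK);
    while (off < limit) {
        size_t len = fetch_chunk(a, idx++, buf);
        size_t n = len < limit - off ? len : limit - off;
        int c = memcmp(buf, b + off, n);

        if (c != 0)
            return c < 0 ? -1 : 1;  /* decided: later chunks are never fetched */
        off += n;
    }
    /* The common prefix is equal, so the lengths decide. */
    if (a->rawsize < blen)
        return -1;
    if (a->rawsize > blen)
        return 1;
    return 0;
}
```

For example, comparing a 10,000-byte value stored in 1,000-byte chunks against a 6-byte string that differs at its sixth byte reads only the first chunk. Comparing the same value against an identical copy has to read all ten.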

It's too bad this wouldn't work for strings in non-C locales. The tool
for that would again be strxfrm, but I can't see a way to store
toasted strxfrm output alongside the string that wouldn't cost more
than it gained.

--
greg
