From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Peter Geoghegan <pg(at)heroku(dot)com> |
Cc: | Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Making strxfrm() blobs in indexes work |
Date: | 2014-01-31 00:34:37 |
Message-ID: | 27262.1391128477@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Peter Geoghegan <pg(at)heroku(dot)com> writes:
> On more occasions than I care to recall, someone has suggested that it
> would be valuable to do something with strxfrm() blobs in order to
> have cheaper locale-aware text comparisons. One obvious place to do so
> would be in indexes, but in the past that has been dismissed on the
> following grounds:
> * Index-only scans need fully formed datums to work, and strxfrm() is
> a one-way function (or so I believe). There is no reason to think that
> the original string can be reassembled from the blob, so clearly that
> won't fly.
> * There is a high cost to be paid in storage overhead. Even for
> collations like "en_US.UTF-8", that can mean that the blob is as much
> as 3-4 times larger than the original text string. Who is to say that
> we'll come out ahead even with the savings of just doing a strcmp()
> rather than a strcoll()?
Quite aside from the index bloat risk, this effect means a 3-4x reduction
in the maximum string length that can be indexed before getting the
dreaded "Values larger than 1/3 of a buffer page cannot be indexed" error.
Worse, a value insertion might well succeed, with the failure happening
only (much?) later when that entry is chosen as a page split boundary.
It's possible that TOAST compression of the strings would save you, but
I'm not convinced of that; it certainly doesn't seem like we could
guarantee no post-insertion failures that way.
Also, detoasting of strings that hadn't been long enough to need toasting
before could easily eat any savings.
> I'm sure anyone that has read this far knows where I'm going with
> this: why can't we just have strxfrm() blobs in the inner pages,
> implying large savings for a big majority of text comparisons that
> service index scans, without bloating the indexes too badly, and
> without breaking anything? We only use inner pages to find leaf pages.
> They're redundant copies of the data within the index.
It's a cute idea though, and perhaps worth pursuing as long as you've
got the pitfalls in mind.
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Craig Ringer | 2014-01-31 00:35:05 | Re: Prohibit row-security + inheritance in 9.4? |
Previous Message | Merlin Moncure | 2014-01-31 00:21:12 | Re: jsonb and nested hstore |