Re: GIN improvements part 1: additional information

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GIN improvements part 1: additional information
Date: 2013-12-20 20:12:38
Message-ID: CAPpHfdvGG4K8A+0RxBNPVfGFQ=vk2tDpRVqP--uprvCKmP=t2Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Dec 20, 2013 at 11:36 PM, Heikki Linnakangas <
hlinnakangas(at)vmware(dot)com> wrote:

> On 12/19/2013 03:33 PM, Heikki Linnakangas wrote:
>
>> On 12/17/2013 12:49 AM, Heikki Linnakangas wrote:
>>
>>> On 12/17/2013 12:22 AM, Alexander Korotkov wrote:
>>>
>>>> On Mon, Dec 16, 2013 at 3:30 PM, Heikki Linnakangas
>>>> <hlinnakangas(at)vmware(dot)com
>>>>
>>>>> wrote:
>>>>>
>>>>
>>>> On 12/12/2013 06:44 PM, Alexander Korotkov wrote:
>>>>>
>>>>> When values are packed into small groups, we have to either insert
>>>>>
>>>>>> inefficiently encoded value or re-encode whole right part of values.
>>>>>>
>>>>>
>>>>> It would probably be simplest to store newly inserted items
>>>>> uncompressed,
>>>>> in a separate area in the page. For example, grow the list of
>>>>> uncompressed
>>>>> items downwards from pg_upper, and the compressed items upwards from
>>>>> pg_lower. When the page fills up, re-encode the whole page.
>>>>>
>>>>
>>> I hacked together an implementation of a variant of Simple9, to see how
>>> it performs. Insertions are handled per the above scheme.
>>>
>>
>> Here's an updated version of that, using the page layout without
>> item-indexes that I described in the other post. This is much less buggy
>> than that earlier crude version I posted - and unfortunately it doesn't
>> compress as well. The earlier version lost some items :-(.
>>
>> Nevertheless, I think this page layout and code formatting is better,
>> even if we switch the encoding back to the varbyte encoding in the end.
>>
>
> Yet another version. The encoding/decoding code is now quite isolated in
> ginpostinglist.c, so it's easy to experiment with different encodings. This
> patch uses varbyte encoding again.
>
> I got a bit carried away, experimented with a bunch of different
> encodings. I tried rice encoding, rice encoding with block and offset
> number delta stored separately, the simple9 variant, and varbyte encoding.
>
> The compressed size obviously depends a lot on the distribution of the
> items, but in the test set I used, the differences between different
> encodings were quite small.
>
> One fatal problem with many encodings is VACUUM. If a page is completely
> full and you remove one item, the result must still fit. In other words,
> removing an item must never enlarge the space needed. Otherwise we have to
> be able to split on vacuum, which adds a lot of code, and also makes it
> possible for VACUUM to fail if there is no disk space left. That's
> unpleasant if you're trying to run VACUUM to release disk space. (gin fast
> updates already has that problem BTW, but let's not make it worse)
>
> I believe that eliminates all encodings in the Simple family, as well as
> PForDelta, and surprisingly also Rice encoding. For example, if you have
> three items in consecutive offsets, the differences between them are
> encoded as 11 in rice encoding. If you remove the middle item, the encoding
> for the next item becomes 010, which takes more space than the original.
>
> AFAICS varbyte encoding is safe from that. (a formal proof would be nice
> though).
>

Formal proof is so. Removing number is actually replacement of two numbers
with their sum. We have to proof that varbyte encoding of sum can't be
longer than varbyte encoding of summands.High bit number of sum is at most
one more than high bit number of larger summand. So varbyte encoding of sum
is at most one byte longer than varbyte encoding of larger summand. Lesser
summand is at least one byte.

> So, I'm happy to go with varbyte encoding now, indeed I don't think we
> have much choice unless someone can come up with an alternative that's
> VACUUM-safe. I have to put this patch aside for a while now, I spent a lot
> more time on these encoding experiments than I intended. If you could take
> a look at this latest version, spend some time reviewing it and cleaning up
> any obsolete comments, and re-run the performance tests you did earlier,
> that would be great. One thing I'm slightly worried about is the overhead
> of merging the compressed and uncompressed posting lists in a scan. This
> patch will be in good shape for the final commitfest, or even before that.

OK. I'll make tests for scans on mix of compressed and uncompressed posting
lists. However, I don't expect it to be slower than both pure compressed
and pure uncompressed posting lists. Overhead of compressing uncompressed
posting lists is evident. But it also would be nice to measure.

------
With best regards,
Alexander Korotkov.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2013-12-20 20:39:12 Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
Previous Message Andres Freund 2013-12-20 20:09:31 Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages