From: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
---|---|
To: | Amit Langote <amitlangote09(at)gmail(dot)com> |
Cc: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: GIN improvements part 1: additional information |
Date: | 2014-01-08 21:58:39 |
Message-ID: | CAPpHfduBDxQ7Tw5bZCCgxmJzNHDRG6eELAuseuCEvKt=dw0Yfw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Jan 6, 2014 at 12:35 PM, Amit Langote <amitlangote09(at)gmail(dot)com>wrote:
> On Sat, Dec 21, 2013 at 4:36 AM, Heikki Linnakangas
> <hlinnakangas(at)vmware(dot)com> wrote:
> >
> > Yet another version. The encoding/decoding code is now quite isolated in
> > ginpostinglist.c, so it's easy to experiment with different encodings.
> This
> > patch uses varbyte encoding again.
> >
> > I got a bit carried away, experimented with a bunch of different
> encodings.
> > I tried rice encoding, rice encoding with block and offset number delta
> > stored separately, the simple9 variant, and varbyte encoding.
> >
> > The compressed size obviously depends a lot on the distribution of the
> > items, but in the test set I used, the differences between different
> > encodings were quite small.
> >
> > One fatal problem with many encodings is VACUUM. If a page is completely
> > full and you remove one item, the result must still fit. In other words,
> > removing an item must never enlarge the space needed. Otherwise we have
> to
> > be able to split on vacuum, which adds a lot of code, and also makes it
> > possible for VACUUM to fail if there is no disk space left. That's
> > unpleasant if you're trying to run VACUUM to release disk space. (gin
> fast
> > updates already has that problem BTW, but let's not make it worse)
> >
> > I believe that eliminates all encodings in the Simple family, as well as
> > PForDelta, and surprisingly also Rice encoding. For example, if you have
> > three items in consecutive offsets, the differences between them are
> encoded
> > as 11 in rice encoding. If you remove the middle item, the encoding for
> the
> > next item becomes 010, which takes more space than the original.
> >
> > AFAICS varbyte encoding is safe from that. (a formal proof would be nice
> > though).
> >
> > So, I'm happy to go with varbyte encoding now, indeed I don't think we
> have
> > much choice unless someone can come up with an alternative that's
> > VACUUM-safe. I have to put this patch aside for a while now, I spent a
> lot
> > more time on these encoding experiments than I intended. If you could
> take a
> > look at this latest version, spend some time reviewing it and cleaning up
> > any obsolete comments, and re-run the performance tests you did earlier,
> > that would be great. One thing I'm slightly worried about is the
> overhead of
> > merging the compressed and uncompressed posting lists in a scan. This
> patch
> > will be in good shape for the final commitfest, or even before that.
> >
>
>
> I just tried out the patch "gin-packed-postinglists-varbyte2.patch"
> (which looks like the latest one in this thread) as follows:
>
> 1) Applied patch to the HEAD (on commit
> 94b899b829657332bda856ac3f06153d09077bd1)
> 2) Created a test table and index
>
> create table test (a text);
> copy test from '/usr/share/dict/words';
> create index test_trgm_idx on test using gin (a gin_trgm_ops);
>
> 3) Got the following error on a wildcard query:
>
> postgres=# explain (buffers, analyze) select count(*) from test where
> a like '%tio%';
> ERROR: lock 9447 is not held
> STATEMENT: explain (buffers, analyze) select count(*) from test where
> a like '%tio%';
> ERROR: lock 9447 is not held
>
Thanks for reporting. Fixed version is attached.
------
With best regards,
Alexander Korotkov.
Attachment | Content-Type | Size |
---|---|---|
gin-packed-postinglists-varbyte3.patch.gz | application/x-gzip | 29.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Eisentraut | 2014-01-08 22:04:10 | commit fest manager? |
Previous Message | Kevin Grittner | 2014-01-08 21:57:20 | Re: Standalone synchronous master |