From: | Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru> |
---|---|
To: | amborodin(at)acm(dot)org |
Cc: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <obartunov(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Sergey Mirvoda <sergey(at)mirvoda(dot)com> |
Subject: | Re: Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC] |
Date: | 2016-08-16 15:23:57 |
Message-ID: | ac2d0b41-84e2-153c-d41a-3456fdeb5785@postgrespro.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
09.08.2016 19:45, Andrew Borodin:
> Here is new version of the patch, now it includes recommendations from
> Anastasia Lubennikova.
>
>
> I've investigated anamalous index size decrease. Most probable version
> appeared to be true.
> Cube extension, as some others, use Guttman's polynomial time split
> algorithm. It is known to generate "needle-like" MBBs (MBRs) for
> sorted data due to imbalanced splits (splitting 100 tuples as 98 to
> 2). Due to previous throw-to-the-end behavior of GiST this imbalance
> was further amplificated (most of inserts were going to bigger part
> after split). So GiST inserts were extremely slow for sorted data.
> There is no need to do exact sorting to trigger this behavior.
> This behavior can be fused by implementation of small-m restriction in
> split (AFAIR this is described in original R-tree paper from 84),
> which prescribes to do a split in a parts no smaller than m, where m
> is around 20% of a page capacity (in tuples number). After application
> of this patch performance for sorted data is roughly the same as
> performance for randomized data.
Thank you for explanation. Now it's clear to me. I did some more testing and
found nothing special. The declared feature is implemented correctly.
It passes all regression tests and improves performance.
I still have a few minor nitpicks about the patch style.
You can find a style guide on
https://www.postgresql.org/docs/9.6/static/source.html
1) remove extra whitespace in README
2) add whitespace: if(ntup == 1)
3) fix comments in accordance with coding conventions
/* In case of single tuple update GiST calls overwrite
* In all other cases function gistplacetopage deletes
* old tuples and place updated at the end
* */
+ /* Normally here was following assertion
+ * Assert(ItemIdHasStorage(ii));
+ * This assertion was introduced in PageIndexTupleDelete
+ * But if this function will be used from BRIN index
+ * this assertion will fail. Thus, here we do not check that
+ * item has the storage.
+ * */
4) remove unrelated changes
- data += sizeof(OffsetNumber) * xldata->ntodelete;
+ data += sizeof(OffsetNumber) *xldata->ntodelete;
5) If the comment is correct, maxalignment is not necessary.
+ /* tuples on a page are always maxaligned */
+ oldsize = MAXALIGN(oldsize);
6) I'd rather use alignednewsize here.
+ ItemIdSetNormal(tupid, offset + size_diff, newsize);
After the cleanup you can change status to "Ready for Committer"
without waiting for the response.
> If data is randomized this effect of Guttman poly-time split is not in
> effect; test from the start of the thread will show no statistically
> confident improvement by the patch.
> To prove performance increase in randomized case I've composed
> modified test https://gist.github.com/x4m/856b2e1a5db745f8265c76b9c195f2e1
> This test with five runs shows following:
> Before patch
> Insert Time AVG 78 seconds STD 9.5
> Afer patch
> Insert Time AVG 68 seconds STD 7.7
> This is marginal but statistically viable performance improvement.
>
> For sorted data performance is improved by a factor of 3.
> Best regards, Andrey Borodin, Octonica & Ural Federal University.
>
--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2016-08-16 15:24:21 | Re: PSA: Systemd will kill PostgreSQL |
Previous Message | Yury Zhuravlev | 2016-08-16 15:22:17 | Re: [GENERAL] C++ port of Postgres |