Re: Faster inserts with mostly-monotonically increasing values

From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Simon Riggs <simon(dot)riggs(at)2ndquadrant(dot)com>
Subject: Re: Faster inserts with mostly-monotonically increasing values
Date: 2018-03-22 23:50:13
Message-ID: CAGTBQpaJs-jjtb5wsBKfzwVuWajOdxP5T7XTtpz1hrqic4x=Eg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 22, 2018 at 3:29 AM, Pavan Deolasee
<pavan(dot)deolasee(at)gmail(dot)com> wrote:
>
> On Thu, Mar 22, 2018 at 7:22 AM, Claudio Freire <klaussfreire(at)gmail(dot)com>
> wrote:
>>
>>
>> If you're not planning on making any further changes, would you mind
>> posting a coalesced patch?
>>
>> Notice that I removed the last offset thing only halfway, so it would
>> need some cleanup.
>
>
> Here is an updated patch. I removed the last offset caching thing completely
> and integrated your changes for conditional lock access. Some other
> improvements to test cases too. I realised that we must truncate and
> re-insert to test index fastpath correctly.

Thanks.

Some comments

+ /*
+ * It's very common to have an index on an auto-incremented or
+ * monotonically increasing value. In such cases, every insertion happens
+ * towards the end of the index. We try to optimise that case by caching
+ * the right-most block of the index. If our cached block is still the
+ * rightmost block, has enough free space to accommodate a new entry and
+ * the insertion key is greater or equal to the first key in this page,
+ * then we can safely conclude that the new key will be inserted in the
+ * cached block. So we simply search within the cached page and insert the
+ * key at the appropriate location. We call it a fastpath.

It should say "the insertion key is strictly greater than the first key"

Equal cases cannot be accepted since it would break uniqueness checks,
so the comment should say that. The code already is correctly checking
for strict inequality, it's just the comment that is out of sync.

You might accept equal keys for non-unique indexes, but I don't
believe making that distinction in the code is worth the hassle.

Also, "rightmost block" != "rightmost leaf page" ("leaf" being the key
difference). So it should say "rightmost leaf page".

+ if (P_ISLEAF(lpageop) && P_RIGHTMOST(lpageop) &&
+ !P_INCOMPLETE_SPLIT(lpageop) &&
+ !P_IGNORE(lpageop) &&
+ (PageGetFreeSpace(page) > itemsz) &&
+ PageGetMaxOffsetNumber(page) >= P_FIRSTDATAKEY(lpageop) &&
+ _bt_compare(rel, natts, itup_scankey, page,
+ P_FIRSTDATAKEY(lpageop)) > 0)
+ {
+ offset = InvalidOffsetNumber;
+ fastpath = true;
+ }
...later...
+ if (!fastpath)
+ {
+ /* find the first page containing this key */
+ stack = _bt_search(rel, natts, itup_scankey, false, &buf, BT_WRITE,
+ NULL);
+
+ offset = InvalidOffsetNumber;

Setting "offset = InvalidOffsetNumber" in that contorted way is
unnecessary. You can remove the first assignment and instead
initialize unconditionally right after the fastpath block (that
doesn't make use of offset anyway):

In indexing.out:

+explain select sum(a) from fastpath where a = 6456;
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ Aggregate (cost=4.31..4.32 rows=1 width=8)
+ -> Index Only Scan using fpindex1 on fastpath (cost=0.29..4.30
rows=1 width=4)
+ Index Cond: (a = 6456)
+(3 rows)

Having costs in explain tests can be fragile. Better use "explain
(costs off)". If you run "make check" continuously in a loop, you
should get some failures related to that pretty quickly.

+truncate fastpath;
+insert into fastpath select y.x, 'b' || (y.x/10)::text, 100 from
(select generate_series(1,10000) as x) y;
+vacuum fastpath;

Vacuum can also be a pain. In parallel tests, vacuum won't do what you
expect it to do, as other concurrent transactions will prevent it
from, say, marking all-visible pages.

In fact, you can get those spurious failures by running "make -j8
check-world" repeatedly (with tap tests enabled). That causes enough
concurrency to cause trouble and sporadic failures.

Question whether you need it. I'm not sure why you need the explain
tests, which I imagine are the reason why you need that vacuum there.

If you do need it, I successfully used the following helper in another patch:

CREATE FUNCTION wait_barrier() RETURNS void LANGUAGE plpgsql AS $$
DECLARE vxids text[];
BEGIN
-- wait for all transactions to commit to make deleted tuples vacuumable
SELECT array_agg(DISTINCT virtualtransaction) FROM pg_locks WHERE pid
<> pg_backend_pid() INTO vxids;
WHILE (SELECT count(virtualtransaction) FROM pg_locks WHERE pid <>
pg_backend_pid() AND virtualtransaction = ANY(vxids)) > 0 LOOP
PERFORM pg_sleep(0.1);
END LOOP;
END
$$;

You call it right before vacuum to make sure all potentially
troublesome transactions finished by the time vacuum runs:

SELECT wait_barrier();
VACUUM blah;

Name the function something else, though, to avoid name clashing with
that patch. Also... that function up there is also still up for review
itself, so take it with a grain of salt. It worked for me.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2018-03-23 00:11:42 Re: found xmin from before relfrozenxid on pg_catalog.pg_authid
Previous Message Peter Geoghegan 2018-03-22 23:45:01 Re: [HACKERS] MERGE SQL Statement for PG11