Re: Microvacuum support for Hash Index

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Microvacuum support for Hash Index
Date: 2017-03-15 12:35:49
Message-ID: CAA4eK1JTGDnFti_M+FeW+nAt-9CerdBPRbsq9MMzgxxA8U3_Dw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 15, 2017 at 1:15 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Mar 14, 2017 at 8:02 AM, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
>> Attached is the v6 patch for microvacuum in hash index rebased on top
>> of 'v10 patch for WAL in hash index - [1]' and 'v1 patch for WAL
>> consistency check for hash index - [2]'.
>>
>> [1] - https://www.postgresql.org/message-id/CAA4eK1%2Bk5wR4-kAjPqLoKemuHayQd6RkQQT9gheTfpn%2B72o1UA%40mail.gmail.com
>> [2] - https://www.postgresql.org/message-id/flat/CAGz5QCJLERUn_zoO0eDv6_Y_d0o4tNTMPeR7ivTLBg4rUrJdwg(at)mail(dot)gmail(dot)com#CAGz5QCJLERUn_zoO0eDv6_Y_d0o4tNTMPeR7ivTLBg4rUrJdwg@mail.gmail.com
>>
>> Also, the patch (mask_hint_bit_LH_PAGE_HAS_DEAD_TUPLES.patch) to mask
>> 'LH_PAGE_HAS_DEAD_TUPLES' flag which got added as a part of
>> Microvacuum patch is attached with this mail.
>
> Generally, this patch looks like a pretty straightforward adaptation
> of the similar btree mechanism to hash indexes, so if it works for
> btree it ought to work here, too. But I noticed a few things while
> reading through it.
>
> + /* Get RelfileNode from relation OID */
> + rel = relation_open(htup->t_tableOid, NoLock);
> + rnode = rel->rd_node;
> + relation_close(rel, NoLock);
> itup->t_tid = htup->t_self;
> - _hash_doinsert(index, itup);
> + _hash_doinsert(index, itup, rnode);
>
> This is an awfully low-level place to be doing something like this.
> I'm not sure exactly where this should be happening, but not in the
> per-tuple callback.
>

I think one possibility is to get it using
indexrel->rd_index->indrelid in _hash_doinsert().
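
For example, a minimal sketch (assuming "rel" is the index relation
available inside _hash_doinsert(); names are illustrative):

    Relation    heapRel;
    RelFileNode hnode;

    /* Derive the heap's RelFileNode from the index's pg_index entry. */
    heapRel = relation_open(rel->rd_index->indrelid, NoLock);
    hnode = heapRel->rd_node;
    relation_close(heapRel, NoLock);

That way the lookup doesn't have to live in the per-tuple callback.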

>
> But they're not called delete records in a hash index. The function
> is called hash_xlog_vacuum_one_page. The corresponding btree function
> is btree_xlog_delete. So this comment needs a little more updating.
>
> + if (IsBufferCleanupOK(bucket_buf))
> + {
> + _hash_vacuum_one_page(rel, metabuf, buf, bucket_buf,
> + (buf == bucket_buf) ? true : false,
> + hnode);
> + if (bucket_buf != buf)
> + LockBuffer(bucket_buf, BUFFER_LOCK_UNLOCK);
> +
> + if (PageGetFreeSpace(page) >= itemsz)
> + break; /* OK, now we have enough space */
> + }
>
> I might be missing something, but I don't quite see why this needs a
> cleanup lock on the primary bucket page. I would think a cleanup lock
> on the page we're actually modifying would suffice, and I think if
> that's correct it would be a better way to go.
>

Offhand, I also don't see any problem with it.

A few other comments:
1.
+ if (ndeletable > 0)
+ {
+ /* No ereport(ERROR) until changes are logged */
+ START_CRIT_SECTION();
+
+ PageIndexMultiDelete(page, deletable, ndeletable);
+
+ pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
+ pageopaque->hasho_flag &= ~LH_PAGE_HAS_DEAD_TUPLES;

You are clearing this flag while logging the action, but the same is
not taken care of during replay. Any reason?
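
If it needs to be cleared at replay time as well, a rough sketch
(assuming the redo routine follows the usual XLogReadBufferForRedo()
pattern in hash_xlog_vacuum_one_page) would be:

    if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
    {
        Page            page = BufferGetPage(buffer);
        HashPageOpaque  pageopaque;

        /* ... replay PageIndexMultiDelete() on the logged offsets ... */

        /* Clear the flag here too, mirroring the do-time action. */
        pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
        pageopaque->hasho_flag &= ~LH_PAGE_HAS_DEAD_TUPLES;

        PageSetLSN(page, lsn);
        MarkBufferDirty(buffer);
    }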

2.
+ /* No ereport(ERROR) until changes are logged */
+ START_CRIT_SECTION();
+
+ PageIndexMultiDelete(page, deletable, ndeletable);
+
+ pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
+ pageopaque->hasho_flag &= ~LH_PAGE_HAS_DEAD_TUPLES;
+
+ /*
+  * Write-lock the meta page so that we can decrement
+  * tuple count.
+  */
+ LockBuffer(metabuf, BUFFER_LOCK_EXCLUSIVE);

The lock on the buffer should be acquired before entering the critical
section.
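
That is, the same code reordered (just a sketch):

    /* Take the buffer lock before entering the critical section. */
    LockBuffer(metabuf, BUFFER_LOCK_EXCLUSIVE);

    /* No ereport(ERROR) until changes are logged */
    START_CRIT_SECTION();

    PageIndexMultiDelete(page, deletable, ndeletable);

    pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
    pageopaque->hasho_flag &= ~LH_PAGE_HAS_DEAD_TUPLES;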

3.
- /*
-  * Since this can be redone later if needed, mark as a hint.
-  */
- MarkBufferDirtyHint(buf, true);
+ if (so->numKilled < MaxIndexTuplesPerPage)
+ {
+     so->killedItems[so->numKilled].heapTid = so->hashso_heappos;
+     so->killedItems[so->numKilled].indexOffset =
+         ItemPointerGetOffsetNumber(&(so->hashso_curpos));
+     so->numKilled++;
+ }

Looking at the above code, the first thing that comes to mind is when
numKilled can become greater than MaxIndexTuplesPerPage, and why we are
ignoring the marking of dead tuples when it does. After looking at the
similar btree code, I realize that it can happen if the user reverses
the scan direction. I think you should mention in a comment that one
should see btgettuple for the reason behind the numKilled overrun test,
or something like that.
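
For instance, something along these lines (wording is only a
suggestion):

    /*
     * The test for numKilled overrun is not just paranoia: the same
     * item could be entered multiple times if the scan direction is
     * reversed; see the analogous test in btgettuple().
     */
    if (so->numKilled < MaxIndexTuplesPerPage)
    {
        so->killedItems[so->numKilled].heapTid = so->hashso_heappos;
        so->killedItems[so->numKilled].indexOffset =
            ItemPointerGetOffsetNumber(&(so->hashso_curpos));
        so->numKilled++;
    }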

4.
+ * We match items by heap TID before assuming they are the right ones to
+ * delete.  If an item has moved off the current page due to a split, we'll
+ * fail to find it and do nothing (this is not an error case --- we assume
+ * the item will eventually get marked in a future indexscan).
+ */
+void
+_hash_kill_items(IndexScanDesc scan)

I think this comment doesn't make much sense for a hash index, because
an item won't move off the current page due to a split; only a later
cleanup can remove it.
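
Perhaps reword it along these lines (just a sketch):

    /*
     * We match items by heap TID before assuming they are the right
     * ones to delete.  If an item has already been removed by a
     * concurrent vacuum (cleanup), we'll fail to find it and do
     * nothing (this is not an error case --- we assume the item will
     * eventually get marked in a future indexscan).
     */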

5.

+
/*
* Maximum size of a hash index item (it's okay to have only one per page)

Spurious white space change.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
