Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
Cc: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>, lubennikovaav(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, pavan(dot)deolasee(at)gmail(dot)com, ibrar(dot)ahmad(at)gmail(dot)com
Subject: Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits
Date: 2020-11-02 14:44:53
Message-ID: 20201102144453.tumlk7qhw3gzn2dk@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 02, 2020 at 04:44:22PM +0300, Anastasia Lubennikova wrote:
>On 30.10.2020 03:42, Tomas Vondra wrote:
>>Hi,
>>
>>I might be somewhat late to the party, but I've done a bit of
>>benchmarking too ;-) I used TPC-H data from a 100GB test, and tried
>>different combinations of COPY [FREEZE] and VACUUM [FREEZE], both on
>>current master and with the patch.
>>
>>So I don't really observe any measurable slowdowns in the COPY part (in
>>fact there seems to be a tiny speedup, but it might be just noise). In
>>the VACUUM part, there's clear speedup when the data was already frozen
>>by COPY (Yes, those are zeroes, because it took less than 1 second.)
>>
>>So that looks pretty awesome, I guess.
>>
>>For the record, these tests were run on a server with NVMe SSD, so
>>hopefully reliable and representative numbers.
>>
>Thank you for the review.
>
>>A couple minor comments about the code:
>>
>>2) I find the "if (all_frozen_set)" block a bit strange. It's a matter
>>of personal preference, but I'd just use a single level of nesting, i.e.
>>something like this:
>>
>>    /* if everything frozen, the whole page has to be visible */
>>    Assert(!(all_frozen_set && !PageIsAllVisible(page)));
>>
>>    /*
>>     * If we've frozen everything on the page, and if we're already
>>     * holding pin on the vmbuffer, record that in the visibilitymap.
>>     * If we're not holding the pin, it's OK to skip this - it's just
>>     * an optimization.
>>     *
>>     * It's fine to use InvalidTransactionId here - this is only used
>>     * when HEAP_INSERT_FROZEN is specified, which intentionally
>>     * violates visibility rules.
>>     */
>>    if (all_frozen_set &&
>>        visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer))
>>    visibilitymap_set(...);
>>
>>IMHO it's easier to read, but YMMV. I've also reworded the comment a bit
>>to say what we're doing etc. And I've moved the comment from inside the
>>if block into the main comment - that was making it harder to read too.
>>
>I agree that it's a matter of taste. I've updated comments and left
>nesting unchanged to keep assertions simple.
>
>>
>>3) I see RelationGetBufferForTuple does this:
>>
>>    /*
>>     * This is for COPY FREEZE needs. If page is empty,
>>     * pin vmbuffer to set all_frozen bit
>>     */
>>    if ((options & HEAP_INSERT_FROZEN) &&
>>        (PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0))
>>        visibilitymap_pin(relation, BufferGetBlockNumber(buffer),
>>vmbuffer);
>>
>>so is it actually possible to get into the (all_frozen_set) without
>>holding a pin on the visibilitymap? I haven't investigated this so
>>maybe there are other ways to get into that code block. But the new
>>additions to hio.c get the pin too.
>>
>I was thinking that GetVisibilityMapPins() can somehow unset the pin.
>I gave it a second look. And now I don't think it's possible to get
>into this code block without a pin.  So, I converted this check into
>an assertion.
>
>>
>>4) In heap_xlog_multi_insert we now have this:
>>
>>    if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
>>        PageClearAllVisible(page);
>>    if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
>>        PageSetAllVisible(page);
>>
>>IIUC it makes no sense to have both flags at the same time, right? So
>>what about adding
>>
>>    Assert(!(XLH_INSERT_ALL_FROZEN_SET &&
>>XLH_INSERT_ALL_VISIBLE_CLEARED));
>>
>>to check that?
>>
>Agree.
>
>I placed this assertion to the very beginning of the function. It also
>helped to simplify the code a bit.
>I also noticed, that we were not updating visibility map for
>all_frozen from heap_xlog_multi_insert. Fixed.
>
>>
>>I wonder what to do about the heap_insert - I know there are concerns it
>>would negatively impact regular insert, but is it really true? I suppose
>>this optimization would be valuable even for cases where multi-insert is
>>not possible.
>>
>Do we have something like INSERT .. FREEZE? I only see
>TABLE_INSERT_FROZEN set for COPY FREEZE and for matview operations.
>Can you explain, what use-case are we trying to optimize by extending
>this patch to heap_insert()?
>

I might be mistaken, but isn't copy forced to use heap_insert for a
bunch of reasons? For example in the presence of before/after triggers,
statement triggers on partitioned tables, or with volatile functions.

>The new version is attached.
>I've also fixed a typo in the comment by Tatsuo Ishii suggestion.
>Also, I tested this patch with replication and found no issues.
>

Thanks. I'll take a look.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2020-11-02 14:45:56 Re: Yet another fast GiST build
Previous Message Georgios Kokolatos 2020-11-02 14:43:32 Re: Strange behavior with polygon and NaN