Re: Use virtual tuple slot for Unique node

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Денис Смирнов <darthunix(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Use virtual tuple slot for Unique node
Date: 2023-10-27 09:05:12
Message-ID: CAExHW5uku=D2Dy6ST4JHO1sokZF8arecymk2RksioKk2BGb4UA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Oct 27, 2023 at 8:48 AM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>
> On Wed, 25 Oct 2023 at 22:48, Ashutosh Bapat
> <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
> > We may save the size of data in VirtualTupleTableSlot, thus avoiding
> > the first loop. I assume that when allocating
> > VirtualTupleTableSlot->data, we always know what size we are
> > allocating so it should be just a matter of saving it in
> > VirtualTupleTableSlot->size. This should avoid the first loop in
> > tts_virtual_materialize() and give some speed up. We will need a loop
> > to repoint non-byval Datums. I imagine that the pointers to non-byval
> > Datums can be computed as dest_slot->tts_values[natts] =
> > dest_vslot->data + (src_slot->tts_values[natts] - src_vslot->data).
> > This would work as long as all the non-byval datums in the source slot
> > are all stored flattened in source slot's data. I am assuming that
> > that would be true in a materialized virtual slot. The byval datums
> > are copied as is. I think, this will avoid multiple memcpy calls, one
> > per non-byval attribute and hence some speedup. I may be wrong though.
>
> hmm, do you think it's common enough that we copy an already
> materialised virtual slot?
>
> I tried adding the following code totts_virtual_copyslot and didn't
> see the NOTICE message when running each of your test queries. "make
> check" also worked without anything failing after adjusting
> nodeUnique.c to always use a TTSOpsVirtual slot.
>
> + if (srcslot->tts_ops == &TTSOpsVirtual && TTS_SHOULDFREE(srcslot))
> + elog(NOTICE, "We copied a materialized virtual slot!");
>
> I did get a failure in postgres_fdw's tests:
>
> server loopback options (table_name 'tab_batch_sharded_p1_remote');
> insert into tab_batch_sharded select * from tab_batch_local;
> +NOTICE: We copied a materialized virtual slot!
> +NOTICE: We copied a materialized virtual slot!
>
> so I think it's probably not that common that we'd gain from that optimisation.

Thanks for this analysis. If we aren't copying a materialized virtual
slot often, no point in adding that optimization.

>
> What might buy us a bit more would be to get rid of the for loop
> inside tts_virtual_copyslot() and copy the guts of
> tts_virtual_materialize() into tts_virtual_copyslot() and set the
> dstslot tts_isnull and tts_values arrays in the same loop that we use
> to calculate the size.
>
> I tried that in the attached patch and then tested it alongside the
> patch that changes the slot type.
>
> master = 74604a37f
> 1 = [1]
> 2 = optimize_tts_virtual_copyslot.patch
>
> Using the script from [2] and the setup from [3] but reduced to 10k
> tuples instead of 1 million.
>
> Times the average query time in milliseconds for a 60 second pgbench run.
>
> query master master+1 master+1+2 m/m+1 m/m+1+2
> Q1 2.616 2.082 1.903 125.65%
> 137.47%
> Q2 9.479 10.593 10.361 89.48%
> 91.49%
> Q3 10.329 10.357 10.627 99.73%
> 97.20%
> Q4 7.272 7.306 6.941 99.53%
> 104.77%
> Q5 7.597 7.043 6.645 107.87%
> 114.33%
> Q6 162.177 161.037 162.807 100.71% 99.61%
> Q7 59.288 59.43 61.294 99.76%
> 96.73%
>
> 103.25% 105.94%
>
> I only expect Q2 and Q3 to gain from this. Q1 shouldn't improve but
> did, so the results might not be stable enough to warrant making any
> decisions from.

I am actually surprised to see that eliminating loop is showing
improvements. There aren't hundreds of attributes involved in those
queries. So I share your stability concerns. And even with these
patches, Q2 and Q3 are still slower.

Q1 is consistently giving performance > 25% for both of us. But other
queries aren't showing a whole lot improvement. So I am wondering
whether it's worth pursuing this change; similar to the opinion you
expressed a few emails earlier.

>
> I was uncertain if the old behaviour of when srcslot contains fewer
> attributes than dstslot was intended or not. What happens there is
> that we'd leave the additional old dstslot tts_values in place and
> only overwrite up to srcslot->natts but then we'd go on and
> materialize all the dstslot attributes. I think this might not be
> needed as we do dstslot->tts_nvalid = srcdesc->natts. I suspect we may
> be ok just to materialize the srcslot attributes and ignore the
> previous additional dstslot attributes. Changing it to that would
> make the attached patch more simple.

We seem to use both tt_nvalid and tts_tupleDescriptor->natts. I forgot
what's the difference. If we do what you say, we might end up trying
to access unmaterialized values beyond tts_nvalid. Better to
investigate whether such a hazard exists.

--
Best Wishes,
Ashutosh Bapat

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message tender wang 2023-10-27 09:05:49 Re: [BUG] Fix DETACH with FK pointing to a partitioned table fails
Previous Message Alena Rybakina 2023-10-27 08:53:12 Invalid Path with UpperRel