And yet, if I try to implement a similar mechanism and succeed, will my revision be considered?
 
regards


03.11.2019, 22:15, "Tomas Vondra" <tomas.vondra@2ndquadrant.com>:

On Sun, Nov 03, 2019 at 02:17:15PM +0300, Павел Ерёмин wrote:

   I completely agree with all of the above. Therefore, the proposed
   mechanism may entail larger improvements (and not only VACUUM).


I think the best thing you can do is try implementing this ...

I'm afraid the "improvements" essentially mean making various important
parts of the system much more complicated and expensive. There's a
trade-off between saving 8B per row and additional overhead (during
vacuum etc.), and it does not seem like a winning strategy. What started
as "we can simply look at the next row version" is clearly way more
complicated and expensive.

The trouble here is that it adds dependency between pages in the data
file. That for example means that during cleanup of a page it may be
necessary to modify the other page, when originally that would be
read-only in that checkpoint interval. That's essentially write
amplification, and may significantly increase the amount of WAL due to
generating FPW for the other page.

   I can offer the following solution.
   For VACUUM, create a hash table.
   When VACUUM, scanning the table, sees a version (tuple1) whose t_ctid is
   set and points to tuple2, it creates a structure holding the address of
   tuple1, tuple1.xid, the length of tuple1 (and any other information that
   is needed), and puts this structure into the hash table keyed by the
   address of tuple2.
   When VACUUM reaches tuple2, it looks up the address of tuple2 in the hash
   table. If it finds an entry, it evaluates the link between the two
   versions and decides whether to clean up.
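
The scan described above could be sketched roughly as follows. This is only
an illustration in Python, not PostgreSQL code: the names TupleVersion,
Predecessor and scan_with_hash_table are hypothetical, a tuple address is
simplified to a (block, offset) pair, and the actual cleanup decision is left
out. Note that the sketch only links a pair when the older version is scanned
before the newer one.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Simplified stand-in for a tuple address: (block number, offset in block).
Tid = Tuple[int, int]


@dataclass
class TupleVersion:
    tid: Tid               # address of this version
    xid: int               # transaction that created it (hypothetical field)
    length: int            # size of the version in bytes
    t_ctid: Optional[Tid]  # address of the next version, or None / own tid if latest


@dataclass
class Predecessor:
    """What the scan remembers about tuple1 until it reaches tuple2."""
    tid: Tid
    xid: int
    length: int


def scan_with_hash_table(table: List[TupleVersion]) -> List[Tuple[Tid, Tid]]:
    """Return the (older, newer) address pairs found in one sequential scan."""
    pending: Dict[Tid, Predecessor] = {}  # keyed by the address of tuple2
    links: List[Tuple[Tid, Tid]] = []
    for tup in table:
        # Reached a possible tuple2: is some tuple1 waiting for it?
        pred = pending.pop(tup.tid, None)
        if pred is not None:
            links.append((pred.tid, tup.tid))
        # Possible tuple1: remember it under the address it points to.
        if tup.t_ctid is not None and tup.t_ctid != tup.tid:
            pending[tup.t_ctid] = Predecessor(tup.tid, tup.xid, tup.length)
    return links
```

The hash table holds one entry per version whose successor has not been
reached yet, so its size depends on how far apart the versions of a row are
stored.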


We know VACUUM is already pretty expensive, so making it even more
expensive seems pretty awful. And the proposed solution seems damn
expensive. We already do something similar for indexes - we track
pointers for removed rows, so that we can remove them from indexes. And
it's damn expensive because we don't know where in the index the tuples
are - so we have to scan the whole index.
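
To make the cost of the index case concrete, here is a minimal sketch in
Python. It is an illustration under simplifying assumptions, not the actual
implementation: the index is modelled as a flat list of (key, tid) entries,
and bulk_delete is a hypothetical name. The point it shows is that every
index entry gets visited, no matter how few rows were removed.

```python
from typing import List, Set, Tuple

Tid = Tuple[int, int]         # (block number, offset), simplified
IndexEntry = Tuple[str, Tid]  # (indexed key, address of the table row)


def bulk_delete(index: List[IndexEntry],
                dead_tids: Set[Tid]) -> Tuple[List[IndexEntry], int]:
    """Drop entries pointing at removed rows; also report entries visited."""
    kept: List[IndexEntry] = []
    visited = 0
    for key, tid in index:
        # We cannot jump to the entries for a dead row, so check them all.
        visited += 1
        if tid not in dead_tids:
            kept.append((key, tid))
    return kept, visited
```

For example, removing a single dead row from a four-entry index still visits
all four entries.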

This would mean we have to do the same thing for the table, because we
don't know where in the table the older versions of those rows are. That
seems mighty expensive.

Not to mention that this does nothing for page-level vacuum, which we
do when trying to fit another row on a page (e.g. for HOT). This has to
be absolutely cheap; we are certainly not going to do lookups of other
pages or look for older versions of the row, and so on.

Being able to do visibility decisions based on the tuple alone (or
possibly page-level + tuple information) has a lot of value, and I don't
think we want to make this more complicated.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services