Re: HOT: Incomplete issues

From: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
To: "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: HOT: Incomplete issues
Date: 2007-06-27 08:06:54
Message-ID: 2e78013d0706270106g611ae372pd8be2334132a0f8f@mail.gmail.com
Lists: pgsql-hackers

On 6/26/07, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> wrote:
>
> Hi,
>
> I'm testing HOT patches, applying to CVS HEAD.

Thanks a lot for your tests. I am posting a revised patch on -patches.
Please use that for further testing.

In the last few days, many people have reviewed the patch including
Simon, Heikki, Greg and Korry. I shall post a separate mail summarizing
the changes since the last revision.

>
> - MVCC-safe CLUSTER
> When I clustered a table with HOT-updated tuples, I saw the following
> error message. The HOT patch latest posted does not support MVCC-safe
> CLUSTER.
> | ERROR: unexpected HeapTupleSatisfiesVacuum result

Yes, this is a known issue. Heikki had posted a patch to resolve this
conflict.

> - Number of unremovable tuples reported by VACUUM VERBOSE
> HOT-updated tuples (HEAPTUPLE_DEAD_CHAIN) are counted as "keeped" and
> VACUUM VERBOSE prints them as "cannot be removed yet". However, we can
> actually remove them. We can reuse the data space of HOT-updated tuples,
> but need to keep their item pointers. We'd better to show them as two
> different messages -- for example, unremovable tuples and unreusable
> item pointers.

We cannot remove a HEAPTUPLE_DEAD_CHAIN tuple because, even though it is
dead, it might be the only way to reach the live tuple at the end of the
chain. The chain pruning logic ensures that we remove most such tuples
before running vacuum on the page, but a few might still be left. We cannot
reuse the data space just yet because we would then lose the xmax/xmin
check. Also, with several redirecting line pointers, the HOT chain becomes
very complex and unmanageable.

There are in fact quite a few scenarios here:

1. A dead tuple which is part of a HOT chain cannot be removed
2. A dead tuple which is marked LP_DELETE is removed and reported as
   "removable"
3. A redirect-dead line pointer is removed and reported as "removable"

In case 3, no real tuple is being removed. The tuple might have been
already reused or vacuumed. So the report could be slightly misleading.

Another problem with the current reporting is that if the original dead
tuple is tracked with a separate lp-deleted line pointer and the original
root offset is redirect-dead, then it might be reported twice as
"removable": once for the lp-deleted tuple and again for the redirect-dead
line pointer. Maybe we should report the redirect-dead offsets as
"removable redirected offsets" and not count them in "removable" tuples?
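
To make the proposal concrete, here is a rough sketch of the separate
accounting. The enum, struct, and function names are hypothetical and are
not taken from the patch; the only point is that a redirect-dead line
pointer gets its own counter, so it is never added to the "removable"
tuple count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical item states, for illustration only (not the patch's names). */
typedef enum
{
    ITEM_LIVE,           /* live tuple, nothing to report */
    ITEM_DEAD_CHAIN,     /* dead, but still needed to reach the live tuple */
    ITEM_LP_DELETE,      /* dead tuple marked LP_DELETE */
    ITEM_REDIRECT_DEAD   /* redirect-dead line pointer, no tuple behind it */
} ItemState;

/* Hypothetical per-table counters that VACUUM VERBOSE could print. */
typedef struct
{
    long removable_tuples;     /* case 2: real tuples removed */
    long removable_redirects;  /* case 3: line pointers only */
    long nonremovable_tuples;  /* case 1: dead but cannot be removed yet */
} VacReport;

/* Classify one item and bump exactly one counter (or none if live). */
static void
count_item(VacReport *report, ItemState state)
{
    assert(report != NULL);

    switch (state)
    {
        case ITEM_DEAD_CHAIN:
            report->nonremovable_tuples++;
            break;
        case ITEM_LP_DELETE:
            report->removable_tuples++;
            break;
        case ITEM_REDIRECT_DEAD:
            report->removable_redirects++;
            break;
        case ITEM_LIVE:
            break;
    }
}
```

With counters like these, a dead tuple tracked by an lp-deleted line pointer
whose root offset is redirect-dead would add one to removable_tuples and one
to removable_redirects, instead of two to the same "removable" total.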

> - ANALYZE and statistics of dead rows
> Since redirected or redirect-dead item pointers are counted as "dead rows",
> we overestimates the number of dead rows. It confuses statistics and
> ill-affects to autovacuums; If autovacuum does ANALYZE, the number of
> dead tuples looks suddenly increased and it triggers unnecessary VACUUMs
> by the next autovacuum.

A redirect-dead line pointer consumes 4 bytes of dead space in a page. If a
table is full of redirect-dead line pointers, we should trigger vacuum on
the table. Maybe we can maintain separate stats about redirect-dead line
pointers and give them lower significance while deciding whether to vacuum
or not.
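
As a sketch of what "lower significance" could mean, the vacuum decision
could add redirect-dead line pointers to the dead tuple count with a
weight below one. The function and the redirect_weight parameter are
hypothetical; the threshold follows the usual base threshold plus scale
factor times reltuples form that autovacuum uses.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: decide whether a table needs vacuum when
 * redirect-dead line pointers are tracked separately from dead tuples.
 * redirect_weight (0.0 to 1.0) is an assumed tuning knob, not an
 * existing setting.
 */
static bool
needs_vacuum(double reltuples, double dead_tuples, double redirect_dead,
             double base_thresh, double scale_factor, double redirect_weight)
{
    double threshold;
    double effective_dead;

    assert(redirect_weight >= 0.0 && redirect_weight <= 1.0);

    /* Same shape as the autovacuum vacuum threshold. */
    threshold = base_thresh + scale_factor * reltuples;

    /* Redirect-dead line pointers count, but less than real dead tuples. */
    effective_dead = dead_tuples + redirect_weight * redirect_dead;

    return effective_dead > threshold;
}
```

For example, with 1000 tuples, a base threshold of 50 and a scale factor of
0.2, the threshold is 250. A table with 100 dead tuples and 400 redirect-dead
line pointers would not be vacuumed at a weight of 0.25 (effective count
200), but would be at a weight of 1.0 (effective count 500).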

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
