From: | Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: VACUUM and ANALYZE disagreeing on what reltuples means |
Date: | 2017-09-06 07:45:20 |
Message-ID: | CAJrrPGeQcsBXhuJA1vQ25PS0wAqfo0v_mXtwmt4VcOmxKSvfCA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Jul 25, 2017 at 9:33 PM, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
wrote:
> On 7/25/17 12:55 AM, Tom Lane wrote:
>
>> Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
>>
>>> It seems to me that VACUUM and ANALYZE somewhat disagree on what
>>> exactly reltuples means. VACUUM seems to be thinking that reltuples
>>> = live + dead while ANALYZE apparently believes that reltuples =
>>> live
>>>
>>
>> The question is - which of the reltuples definitions is the right
>>> one? I've always assumed that "reltuples = live + dead" but perhaps
>>> not?
>>>
>>
>> I think the planner basically assumes that reltuples is the live
>> tuple count, so maybe we'd better change VACUUM to get in step.
>>
>>
> Attached is a patch that (I think) does just that. The disagreement was
> caused by VACUUM treating recently dead tuples as live, while ANALYZE
> treats both of those as dead.
>
> At first I was worried that this will negatively affect plans in the
> long-running transaction, as it will get underestimates (due to reltuples
> not including rows it can see). But that's a problem we already have
> anyway, you just need to run ANALYZE in the other session.
Thanks for the patch.
From the mail, I understand that this patch tries to improve the
reltuples value update in the catalog table by the vacuum command
to consider the proper visible tuples similar like analyze command.
- num_tuples);
+ num_tuples - nkeep);
With the above correction, there is a problem in reporting the number
of live tuples to the stats.
postgres=# select reltuples, n_live_tup, n_dead_tup
from pg_stat_user_tables join pg_class using (relname)
where relname = 't';
reltuples | n_live_tup | n_dead_tup
-----------+------------+------------
899818 | 799636 | 100182
(1 row)
The live tuples data value is again decremented with dead tuples
value before sending them to stats in function lazy_vacuum_rel(),
/* report results to the stats collector, too */
new_live_tuples = new_rel_tuples - vacrelstats->new_dead_tuples;
The fix needs a correction here also. Or change the correction in
lazy_vacuum_rel() function itself before updating catalog table similar
like stats.
While testing this patch, I found another problem that is not related to
this patch. When the vacuum command is executed mutiple times on
a table with no dead rows, the number of reltuples value is slowly
reducing.
postgres=# select reltuples, n_live_tup, n_dead_tup
from pg_stat_user_tables join pg_class using (relname)
where relname = 't';
reltuples | n_live_tup | n_dead_tup
-----------+------------+------------
899674 | 899674 | 0
(1 row)
postgres=# vacuum t;
VACUUM
postgres=# select reltuples, n_live_tup, n_dead_tup
from pg_stat_user_tables join pg_class using (relname)
where relname = 't';
reltuples | n_live_tup | n_dead_tup
-----------+------------+------------
899622 | 899622 | 0
(1 row)
postgres=# vacuum t;
VACUUM
postgres=# select reltuples, n_live_tup, n_dead_tup
from pg_stat_user_tables join pg_class using (relname)
where relname = 't';
reltuples | n_live_tup | n_dead_tup
-----------+------------+------------
899570 | 899570 | 0
(1 row)
In lazy_scan_heap() function, we force to scan the last page of the
relation to avoid the access exclusive lock in lazy_truncate_heap
if there are tuples in the last page. Because of this reason, the
scanned_pages value will never be 0, so the vac_estimate_reltuples
function will estimate the tuples based on the number of tuples
from the last page of the relation. This estimation is leading to
reduce the number of retuples.
I am thinking whether this problem really happen in real world scenarios
to produce a fix?
Regards,
Hari Babu
Fujitsu Australia
From | Date | Subject | |
---|---|---|---|
Next Message | Fabien COELHO | 2017-09-06 08:04:40 | Re: psql - add special variable to reflect the last query status |
Previous Message | Michael Meskes | 2017-09-06 07:41:59 | Re: 【ECPG】strncpy function does not set the end character '\0' |