From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reduce maximum error in tuples estimation after vacuum.
Date: 2013-07-08 10:36:11
Message-ID: 014201ce7bc6$f71eb950$e55c2bf0$@kapila@huawei.com
Lists: pgsql-hackers
On Wednesday, July 03, 2013 1:21 PM Kyotaro HORIGUCHI wrote:
> Hello,
>
> > I could see the same output with your latest script; also, I could
> > reproduce the test if I ran it with individual SQL statements.
> > One of the main points for reproducing the individual test was to
> > keep autovacuum = off.
>
> I see. Autovacuum's nap time is 60 seconds with the default
> settings. Your operation might have let it hit the window
> between the last massive delete and the next explicit vacuum in
> store_result(). Anyway, setting autovacuum to off should help to
> make a clean environment for this issue.
>
> > Now I can look into it further. I have still not gone through your
> > new approach to calculating reltuples in detail, but I am wondering
> > whether there is any way the estimates could be improved with a
> > different calculation in vac_estimate_reltuples().
>
> I'll explain this in other words, although it might be
> repetitious.
>
> It is tough to decide how to modify things there. For now I decided
> to preserve vac_estimate_reltuples as much as possible as it is. To
> that end, I picked old_rel_tuples as an intermediate variable to
> 'deceive' the function. An alternative would have been to separate
> this estimation function from the one used for analyze.
>
> As I described before, vac_estimate_reltuples presumes that the
> tuple density in the skipped pages is not very different from that
> of the whole table before vacuuming. Since that density is
> calculated without any hint about the skipped pages, and it cannot
> tell how many of the tuples counted in pg_class.reltuples are
> already dead, the value can be far from the true one and cannot be
> verified. Given that we cannot use pg_stat_user_tables.n_dead_tup,
> reading all pages would fix it, but the penalty would be
> intolerable.
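Concretely, the extrapolation described above amounts to something like the
following simplified, standalone sketch (this is not the actual PostgreSQL
source; the function name and the sample numbers are illustrative only):

#include <stdio.h>

/*
 * Simplified sketch of the assumption: pages VACUUM skipped are taken to
 * have the same tuple density as the pre-vacuum whole-table average
 * recorded in pg_class (old_rel_tuples / old_rel_pages).
 */
static double
estimate_reltuples_sketch(double total_pages,
                          double scanned_pages,
                          double scanned_tuples,
                          double old_rel_pages,
                          double old_rel_tuples)
{
    double old_density;

    /* If every page was scanned, the count is exact. */
    if (scanned_pages >= total_pages)
        return scanned_tuples;

    /*
     * Pre-vacuum average density, computed with no hint about how many
     * tuples on the skipped pages are already dead.
     */
    old_density = (old_rel_pages > 0.0) ? old_rel_tuples / old_rel_pages : 0.0;

    /* Skipped pages are assumed to match the old average density. */
    return scanned_tuples + old_density * (total_pages - scanned_pages);
}

int
main(void)
{
    /*
     * Hypothetical numbers: almost all pages are skipped but are in fact
     * nearly empty, so the estimate overshoots badly.
     */
    printf("estimate = %.0f\n",
           estimate_reltuples_sketch(6370.0, 100.0, 1500.0,
                                     6370.0, 1000000.0));
    return 0;
}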
>
> Using the FSM to learn the used bytes in the skipped pages (which
> are all-visible by definition) seems to give a reasonably good
> estimate of the number of tuples in those pages, assuming uniform
> tuple length. Of course, a strong deviation in tuple length can
> deceive the algorithm.
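A minimal sketch of that FSM-based idea, assuming the free-space figure for
each skipped page is trustworthy (a hypothetical helper, not the patch's
actual code):

#include <stdio.h>

#define BLCKSZ         8192   /* default PostgreSQL block size */
#define PAGE_OVERHEAD    24   /* rough page-header size, illustration only */

/*
 * Occupied bytes on the skipped (all-visible) pages, divided by the
 * average tuple width observed on the scanned pages, approximates the
 * number of live tuples VACUUM never saw.  Strongly skewed tuple lengths
 * break the approximation, as noted above.
 */
static double
estimate_skipped_tuples(double skipped_pages,
                        double fsm_free_bytes,   /* summed over skipped pages */
                        double avg_tuple_width)  /* from scanned pages */
{
    double usable_bytes = skipped_pages * (double) (BLCKSZ - PAGE_OVERHEAD);
    double used_bytes = usable_bytes - fsm_free_bytes;

    if (used_bytes <= 0.0 || avg_tuple_width <= 0.0)
        return 0.0;
    return used_bytes / avg_tuple_width;
}

int
main(void)
{
    /* Hypothetical: 6,000 skipped pages that the FSM says are mostly free. */
    printf("skipped tuples ~= %.0f\n",
           estimate_skipped_tuples(6000.0, 6000.0 * 7800.0, 40.0));
    return 0;
}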
>
> Does it make sense to you?
I understood your patch's algorithm, but I still have a doubt: if the next
analyze can correct the estimates, why would that not be sufficient? Please
refer to my last mail for an analysis of the same:
http://www.postgresql.org/message-id/000601ce77ad$7d3388e0$779a9aa0$@kapila@huawei.com
Performance Data
----------------
I have checked a few cases where the FSM is not updated accurately; there
this patch seems to give much worse results than the current code.
Test with Patch
--------------------
1. Test given by you where tuple density is non-uniform
postgres=# drop table if exists t;
DROP TABLE
postgres=# create table t (a int, b int, c int, d int default 0, e int default 0, f int default 0);
CREATE TABLE
postgres=# insert into t (select a, (random() * 100000)::int from generate_series((select count(*) from t) + 1, 1000000) a);
INSERT 0 1000000
postgres=# update t set b = b + 1 where a < (select count(*) from t) * 0.7;
UPDATE 699999
postgres=# vacuum t;
VACUUM
postgres=# delete from t where a < (select count(*) from t) * 0.99;
DELETE 989999
postgres=# vacuum t;
VACUUM
postgres=# select c.relpages, s.n_live_tup, c.reltuples, (select count(*) from t) as tuples, reltuples::float / (select count(*) from t) as ratio from pg_stat_user_tables s, pg_class c where s.relname = 't' and c.relname = 't';
 relpages | n_live_tup | reltuples | tuples |      ratio
----------+------------+-----------+--------+------------------
     6370 |      13596 |     13596 |  10001 | 1.35946405359464
(1 row)
2. Test where tuple density is non-uniform and the FSM updates made before
the estimation in vacuum are not accurate.
I have created an index on the table to simulate this: with an index
present, lazy vacuum records the freed space for pages containing dead
tuples only in its second heap pass, so the FSM can still be stale when the
estimate is computed.
postgres=# drop table if exists t;
DROP TABLE
postgres=# create table t (a int, b int, c int, d int default 0, e int default 0, f int default 0);
CREATE TABLE
postgres=# create index on t(a);
CREATE INDEX
postgres=# insert into t (select a, (random() * 100000)::int from generate_series((select count(*) from t) + 1, 1000000) a);
INSERT 0 1000000
postgres=# update t set b = b + 1 where a < (select count(*) from t) * 0.7;
UPDATE 699999
postgres=# vacuum t;
VACUUM
postgres=# delete from t where a < (select count(*) from t) * 0.99;
DELETE 989999
postgres=# vacuum t;
VACUUM
postgres=# select c.relpages, s.n_live_tup, c.reltuples, (select count(*) from t) as tuples, reltuples::float / (select count(*) from t) as ratio from pg_stat_user_tables s, pg_class c where s.relname = 't' and c.relname = 't';
 relpages | n_live_tup |  reltuples   | tuples |      ratio
----------+------------+--------------+--------+------------------
     6370 |    1001327 | 1.00133e+006 |  10001 | 100.122687731227
(1 row)
This results in a tuple estimate far worse than the current code's.
I think we need more tests to show that the new calculation is better than
the current one in all cases.
With Regards,
Amit Kapila.