Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: David Gould <daveg(at)sonic(dot)net>, Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Alina Alexeeva <alexeeva(at)adobe(dot)com>, Ullas Lakkur Raghavendra <lakkurra(at)adobe(dot)com>
Subject: Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.
Date: 2018-03-08 03:30:24
Message-ID: CA+TgmoZuwksiX90mjcr-kpNcEY4wpTnMZ9zPt85Hip19eY6EfQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Mar 2, 2018 at 5:17 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> (1) do we really want to go over to treating ANALYZE's tuple density
> result as gospel, contradicting the entire thrust of the 2011 discussion?
>
>> This table's reltuples is 18 times the actual row count. It will never converge
>> because with 50000953 pages analyze can only adjust reltuples by 0.0006 each time.
>
> But by the same token, analyze only looked at 0.0006 of the pages. It's
> nice that for you, that's enough to get a robust estimate of the density
> everywhere; but I have a nasty feeling that that won't hold good for
> everybody. The whole motivation of the 2011 discussion, and the issue
> that is also seen in some other nearby discussions, is that the density
> can vary wildly.
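
(For reference, the incremental update under discussion behaves roughly like the standalone toy below -- not the actual server code -- which assumes the default statistics target of 100, i.e. a sample of about 300 * 100 = 30,000 pages; that's where the 0.0006 figure for a 50,000,953-page table comes from.)

#include <stdio.h>

/*
 * Toy sketch of the incremental reltuples update discussed here: the
 * estimate moves toward the sampled density by a factor equal to the
 * fraction of pages actually scanned.
 */
static double
update_density(double old_density, double scanned_density,
               double scanned_pages, double total_pages)
{
    double multiplier = scanned_pages / total_pages;

    return old_density + (scanned_density - old_density) * multiplier;
}

int
main(void)
{
    double total_pages = 50000953.0;   /* David's table */
    double scanned_pages = 30000.0;    /* ~300 * statistics target of 100 */
    double estimate = 18.0;            /* pretend we start 18x too high ... */
    double true_density = 1.0;         /* ... relative to the real density */

    /* Each ANALYZE closes only scanned/total ~= 0.0006 of the gap. */
    for (int i = 1; i <= 5; i++)
    {
        estimate = update_density(estimate, true_density,
                                  scanned_pages, total_pages);
        printf("after ANALYZE %d: density estimate = %.6f\n", i, estimate);
    }
    return 0;
}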

I think that viewing ANALYZE's result as fairly authoritative is
probably a good idea. If ANALYZE looked at only 0.0006 of the pages,
that's because we decided that 0.0006 of the pages were all it needed
to look at in order to come up with good estimates. Having decided
that, we shouldn't turn around and decide that those estimates are
99.94% bunk.

Now, it could be that there are data sets out there where the number of
tuples per page varies widely between different parts of the table,
and therefore sampling 0.0006 of the pages can return quite different
estimates depending on which ones we happen to pick. However, that's
a lot like saying that 0.0006 of the pages isn't really enough, and
maybe the solution is to sample more. Still, it doesn't seem
unreasonable to do some kind of smoothing, where we set the new
estimate = (newly computed estimate * X) + (previous estimate * (1 -
X)) where X might be 0.25 or whatever; perhaps X might even be
configurable.
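
To make that concrete, here is a minimal sketch (a standalone toy, not
proposed server code) comparing a fixed smoothing weight with a weight
equal to the sampled page fraction; the 18x starting error is borrowed
from David's report and X = 0.25 is just the number floated above.

#include <stdio.h>

/*
 * Sketch of the smoothing idea: blend each fresh ANALYZE estimate with
 * the previous one using a fixed weight X, rather than a weight equal
 * to the sampled page fraction.  It just counts how many runs it takes
 * to shrink a large initial error below a tolerance.
 */
static int
runs_to_converge(double x, double initial_error, double tolerance)
{
    double error = initial_error;
    int    runs = 0;

    while (error > tolerance)
    {
        /*
         * new estimate = fresh * X + previous * (1 - X), so the error
         * against a stable true density shrinks by (1 - X) per run.
         */
        error *= (1.0 - x);
        runs++;
    }
    return runs;
}

int
main(void)
{
    /* Start 18x too high and converge to within 1% of the truth. */
    printf("X = 0.25:   %d ANALYZE runs\n",
           runs_to_converge(0.25, 17.0, 0.01));
    printf("X = 0.0006: %d ANALYZE runs\n",
           runs_to_converge(0.0006, 17.0, 0.01));
    return 0;
}

With X = 0.25 the error is gone within a couple dozen ANALYZE runs;
with the 0.0006 page-fraction weight it takes on the order of ten
thousand.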

One thing to keep in mind is that VACUUM will, in many workloads, tend
to scan the same parts of the table over and over again. For example,
consider a database of chess players which is regularly updated with
new ratings information. The records for active players will be
updated frequently, but the ratings for deceased players will rarely
change. Living but inactive players may occasionally become active
again, or may be updated occasionally for one reason or another. So,
VACUUM will keep scanning the pages that contain records for active
players but will rarely or never be asked to scan the pages for dead
players. If it so happens that these different groups of players have
rows of varying width -- perhaps we store more detailed data about
newer players but don't have full records for older ones -- then the
overall tuple density estimate will come to resemble more and more the
density of the rows that are actively being updated, rather than the
overall density of the whole table.
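
Here is a toy simulation of that drift, assuming (purely for
illustration, with invented numbers) that the density estimate is
nudged toward whatever VACUUM saw, weighted by the fraction of pages it
scanned:

#include <stdio.h>

/*
 * Toy model: VACUUM keeps scanning the same "hot" 10% of the table
 * (the active players), and the density estimate is updated
 * incrementally, weighted by the scanned page fraction.  The estimate
 * drifts toward the hot pages' density even though it started out
 * exactly right for the table as a whole.
 */
int
main(void)
{
    double total_pages = 1000000.0;
    double hot_pages = 100000.0;       /* active players, vacuumed often */
    double cold_density = 100.0;       /* tuples/page, never revisited */
    double hot_density = 50.0;         /* wider rows, or more churn */

    double true_density = (cold_density * (total_pages - hot_pages) +
                           hot_density * hot_pages) / total_pages;
    double estimate = true_density;    /* 95: correct at the start */

    for (int vacuums = 1; vacuums <= 50; vacuums++)
    {
        double multiplier = hot_pages / total_pages;

        /* Fold in only what this VACUUM saw: the hot pages. */
        estimate += (hot_density - estimate) * multiplier;
        if (vacuums % 10 == 0)
            printf("after %2d vacuums: estimate %.1f, true %.1f tuples/page\n",
                   vacuums, estimate, true_density);
    }
    return 0;
}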

Even if all the tuples are the same width, there might be some
workload that typically inserts a record, updates it N times, and then
leaves it untouched after that. Suppose we can fit 100 tuples into a page.
On pages where all of the records have reached their final state, there
will be 100 tuples. But on pages where updates are still happening
there will -- after VACUUM -- be fewer than 100 tuples per page,
because some fraction of the tuples on the page were dead row
versions. That's why we were vacuuming them. Suppose typically half
the tuples on each page are getting removed by VACUUM. Then, over
time, if only VACUUM is ever run, our estimate of tuples per page will
converge to 50, while in reality, as the table grows, the real number
converges to 100.
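
The same kind of toy model for that workload, again with invented
numbers and the same assumed incremental update rule, shows the
estimate pinned near 50 while the real density heads back toward 100:

#include <stdio.h>

/*
 * Toy model: the table grows every round, older pages settle at 100
 * tuples, but VACUUM only revisits the freshly-updated pages, where it
 * finds ~50 live tuples after removing dead row versions.  An estimate
 * fed only by what VACUUM scanned converges to 50 while the real
 * density climbs toward 100.
 */
int
main(void)
{
    double pages = 100000.0;       /* current table size in pages */
    double estimate = 100.0;       /* tuples/page, correct at the start */

    for (int round = 1; round <= 20; round++)
    {
        double new_pages = 100000.0;   /* growth since the last VACUUM */
        double true_density;

        pages += new_pages;
        true_density = ((pages - new_pages) * 100.0 +
                        new_pages * 50.0) / pages;

        /* VACUUM scans only the new/updated pages, seeing ~50 tuples each. */
        estimate += (50.0 - estimate) * (new_pages / pages);

        if (round % 5 == 0)
            printf("after %2d rounds: estimate %.1f, real density %.1f\n",
                   round, estimate, true_density);
    }
    return 0;
}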

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
