Re: Analyze on table creation?

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: James Coleman <jtc331(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Analyze on table creation?
Date: 2023-06-26 19:38:03
Message-ID: CAFj8pRAZdz2mJaHwjfYO59s2ABMw6MMAMhMu+MRqwhGbC4HP6A@mail.gmail.com
Lists: pgsql-hackers

>
> >
> > Originally, until the table had at least one row, PostgreSQL
> calculated with 10 pages. This was fixed (changed) in PostgreSQL 14.
> >
> >
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=3d351d916b20534f973eda760cde17d96545d4c4
> >
>
> From that commit message:
> > Historically, we've considered the state with relpages and reltuples
> > both zero as indicating that we do not know the table's tuple density.
> > This is problematic because it's impossible to distinguish "never yet
> > vacuumed" from "vacuumed and seen to be empty". In particular, a user
> > cannot use VACUUM or ANALYZE to override the planner's normal heuristic
> > that an empty table should not be believed to be empty because it is
> > probably about to get populated. That heuristic is a good safety
> > measure, so I don't care to abandon it, but there should be a way to
> > override it if the table is indeed intended to stay empty.
>
> So that implicitly provides our reasoning for not analyzing up-front
> on table creation.
>
> I haven't thought about this too deeply yet, but it seems plausible to
> me that the dangers of overestimating row count here (at minimum in
> queries like I described with lots of joins) are higher than the
> dangers of underestimating, which we would do if we believed the table
> was empty. One critical question would be how fast we can assume the
> table will be auto-analyzed (i.e., how fast would the underestimate be
> corrected.
>
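To make the failure mode concrete, here is a sketch (table and column names are made up; exact row estimates depend on version and tuple width): on a freshly created, never-analyzed table, the planner does not estimate zero rows, because relpages = reltuples = 0 means "unknown" and it falls back to the heuristic size.

```sql
CREATE TABLE freshly_created (id int, payload text);

-- No rows have been inserted and no VACUUM/ANALYZE has run, yet EXPLAIN
-- does not show rows=0: with relpages = reltuples = 0 the planner assumes
-- a heuristic size (historically 10 pages), so a query joining many such
-- tables can be badly overestimated.
EXPLAIN SELECT * FROM freshly_created;
```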

I ran into this issue a few years ago. In that application, 40% of the
tables had one row or none, 30% were of usual size, and 30% were sometimes
really big. This pattern can be relatively common in OLAP applications.

The estimates were terrible. I don't think a better heuristic is possible.
Maybe we could introduce a table option for the expected size, to be used
when real statistics are not yet available.

Something like

CREATE TABLE foo(...) WITH (default_relpages = x)

It is not a perfect solution, but it would allow fixing this issue with one
command.
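For completeness, the commit quoted above already provides a manual escape
hatch on PostgreSQL 14 and later: an explicit VACUUM (or ANALYZE) on a table
that is genuinely meant to stay empty records that fact, so the planner stops
applying the not-really-empty heuristic. A sketch (table name is made up):

```sql
CREATE TABLE stays_empty (id int);

-- Before any VACUUM/ANALYZE, reltuples is -1 ("unknown") on 14+ and the
-- planner still assumes the table is about to be populated.
VACUUM stays_empty;

-- After the VACUUM, relpages = 0 and reltuples = 0 are recorded, and the
-- planner now believes the table really is empty.
SELECT relpages, reltuples FROM pg_class WHERE relname = 'stays_empty';
```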

> Regards,
> James Coleman
>