Re: Analyze on table creation?

From: James Coleman <jtc331(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Subject: Re: Analyze on table creation?
Date: 2023-06-27 14:14:22
Message-ID: CAAaqYe-QcKnV9TjAr-2AHqoh8fnpBcYjXfcEe7PjeDoU2zaR5Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jun 26, 2023 at 4:16 PM James Coleman <jtc331(at)gmail(dot)com> wrote:
>
> On Mon, Jun 26, 2023 at 4:00 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> >
> > Hi,
> >
> > On 2023-06-26 13:40:49 -0400, James Coleman wrote:
> > > Have we ever discussed running an analyze immediately after creating a table?
> >
> > That doesn't make a whole lot of sense to me - we could just insert the
> > constants stats we wanted in that case.
> >
>
> I thought that was implicit in that, but fair enough :)
>
> > > Consider the following:
> > >
> > > create table stats(i int, t text not null);
> > > explain select * from stats;
> > > Seq Scan on stats (cost=0.00..22.70 rows=1270 width=36
> > > analyze stats;
> > > explain select * from stats;
> > > Seq Scan on stats (cost=0.00..0.00 rows=1 width=36)
> > >
> > > Combined with rapidly increasing error margin on row estimates when
> > > adding joins means that a query joining to a bunch of empty tables
> > > when a database first starts up can result in some pretty wild plan
> > > costs.
> >
> > The issue is that the table stats are likely going to quickly out of date in
> > that case, even a hand full of inserts (which wouldn't trigger
> > autovacuum analyzing) would lead to the "0 rows" stats causing very bad plans.
> >
>
> It's not obvious to me (as noted elsewhere in the thread) which is
> worse: a bunch of JOINs on empty tables can result in (specific
> example) plans with cost=15353020, and then trigger JIT, and...here we
> collide with my other thread about JIT [1].
>
> Regards,
> James Coleman
>
> 1: https://www.postgresql.org/message-id/CAAaqYe-g-Q0Mm5H9QLcu8cHeMwok%2BHaxS4-UC9Oj3bK3a5jPvg%40mail.gmail.com

Thinking about this a bit more: it seems like what we're missing is either:

1. A heuristic for "this table will probably remain empty", or
2. A way to invalidate "0 rows" stats more quickly on even a handful of inserts.

I think one of those (ignoring questions about "how" for now) would
solve both cases?

Regards,
James Coleman

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jonathan S. Katz 2023-06-27 14:32:27 PostgreSQL 16 Beta 2 release announcement draft
Previous Message Tom Lane 2023-06-27 14:12:40 Re: Assert !bms_overlap(joinrel->relids, required_outer)