Re: extended stats on partitioned tables

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Zhihong Yu <zyu(at)yugabyte(dot)com>, Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: extended stats on partitioned tables
Date: 2021-12-13 13:53:48
Message-ID: 20211213135348.GT17618@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Dec 12, 2021 at 11:23:19PM +0100, Tomas Vondra wrote:
> On 12/12/21 22:32, Justin Pryzby wrote:
> > On Sun, Dec 12, 2021 at 05:17:10AM +0100, Tomas Vondra wrote:
> > > The one thing bugging me a bit is that the regression test checks only a
> > > GROUP BY query. It'd be nice to add queries testing MCV/dependencies
> > > too, but that seems tricky because most queries will use per-partitions
> > > stats.
> >
> > You mean because the quals are pushed down to the scan node.
> >
> > Does that indicate a deficiency ?
> >
> > If extended stats are collected for a parent table, selectivity estimates based
> > from the parent would be better; but instead we use uncorrected column
> > estimates from the child tables.
> >
> > From what I see, we could come up with a way to avoid the pushdown, involving
> > volatile functions/foreign tables/RLS/window functions/SRF/wholerow vars/etc.
> > But would it be better if extended stats objects on partitioned tables were to
> > collect stats for both parent AND CHILD ? I'm not sure. Maybe that's the
> > wrong solution, but maybe we should still document that extended stats on
> > (empty) parent tables are often themselves not used/useful for selectivity
> > estimates, and the user should instead (or in addition) create stats on child
> > tables.
> >
> > Or, maybe if there's no extended stats on the child tables, stats on the parent
> > table should be consulted ?
>
> Maybe, but that seems like a mostly separate improvement. At this point I'm
> interested only in testing the behavior implemented in the current patches.

I don't want to change the scope of the patch, or this thread, but my point is
that the behaviour already changed once (the original regression) and now we're
planning to change it again to fix that, so we ought to decide on the expected
behavior before writing tests to verify it.

I think it may be impossible to use the "dependencies" statistic with inherited
stats. Normally the quals would be pushed down to the child tables. But, if
they weren't pushed down, they'd be attached to something other than a scan
node on the parent table, so the stats on that table wouldn't apply (right?).

Maybe the useless stats types should have been prohibited on partitioned tables
since v10. It's too late to change that, but perhaps now they shouldn't even
be collected during analyze. The dependencies and MCV paths are never called
with rte->inh==true, so maybe we should Assert(!inh), or add a comment to that
effect. Or the regression tests should "memorialize" the behavior. I'm still
thinking about it.

--
Justin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2021-12-13 14:01:57 Re: should we document an example to set multiple libraries in shared_preload_libraries?
Previous Message Robert Haas 2021-12-13 13:53:37 Re: O(n) tasks cause lengthy startups and checkpoints