Re: Does Postgres have consistent identifiers (plan hash value) for explain plans?

From: Jerry Brenner <jbrenner(at)guidewire(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Does Postgres have consistent identifiers (plan hash value) for explain plans?
Date: 2023-12-05 14:03:24
Message-ID: CACoKFYRcqfQsOQpq9HRPeCJmsY5FsdugmSFj49x_H+9UEcefxA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Apologies if I'm not using the appropriate Postgres-specific terms. The
simplest implementation would consider only the information that is
consistent across systems and executions - node types, relation and index
names (not ids), aliases ... It would ignore bind variable values and
execution statistics. This would make it easy to find:

- Queries with different plans (we would then investigate the causes)
- Queries where the plans changed over time (we would then investigate
the causes)
- The aggregated cost of a query plan shared by multiple queries (due to
differences in the number of items in an IN list, ...) This is useful when
no single query in the group registers high, but the group of queries
registers high.
- Because we use consistent table and index names across environments,
this would make it possible to find queries with different plans in
different environments.

In a perfect world, the hash would include the filters, index conds, ...
with the constant values masked out, but I realize that's much more
complicated. (Without this, we could see plans that are applying different
numbers of predicates against an index mapping to the same hash value, but
it would still be a big improvement.)

As mentioned before, we are currently storing the explain plans in a
database table. We use a combination of columns and syntax when querying
for the long executions to display some contextual information about the
explain plans (does it have a Materialize node, Init Plan, Sub Plan, ...)
Based on that little contextual information, we can see that there are
multiple plans for some queries. Based on manual investigation, we know
that there other plan differences. It is very expensive right now to try
to figure out if a plan is changing over time, why some executions are more
expensive than others, ...

Thanks,
Jerry

On Mon, Dec 4, 2023 at 7:29 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Michael Paquier <michael(at)paquier(dot)xyz> writes:
> > On Mon, Dec 04, 2023 at 09:57:24AM -0500, Tom Lane wrote:
> >> Jerry Brenner <jbrenner(at)guidewire(dot)com> writes:
> >>> Both Oracle and SQL Server have
> >>> consistent hash values for query plans and that makes it easy to
> identify
> >>> when there are multiple plans for the same query. Does that concept
> exist
> >>> in later releases of Postgres (and is the value stored in the json
> explain
> >>> plan)?
>
> >> No, there's no support currently for obtaining a hash value that's
> >> associated with a plan rather than an input query tree.
>
> > PlannerGlobal includes a no_query_jumble that gets inherited by all
> > its lower-level nodes, so adding support for hashes compiled from
> > these node structures would not be that complicated. My point is that
> > the basic infrastructure is in place in the tree to be able to do
> > that, and it should not be a problem to even publish the compiled
> > hashes in EXPLAIN outputs, behind an option of course.
>
> Well, yeah, we could fairly easily activate that infrastructure for
> plans, but we haven't. More to the point, it's not clear to me that
> that would satisfy the OP's request for "consistent" hash values.
> The hashes would vary depending on object OID values, system version,
> possibly endianness, etc.
>
> I'm also wondering exactly what the OP thinks qualifies as different
> plans. Remembering the amount of fooling-around that's gone on with
> querytree hashes to satisfy various people's ill-defined desires for
> pg_stat_statements aggregation behavior, I'm not really eager to buy
> into the same definitional morass at the plan level.
>
> regards, tom lane
>
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Jerry Brenner 2023-12-05 14:28:54 Include a timestamp in future versions of pg_stat_statements when when a query entered the cache?
Previous Message Tom Lane 2023-12-05 03:29:55 Re: Does Postgres have consistent identifiers (plan hash value) for explain plans?