From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | David Rowley <dgrowley(at)gmail(dot)com>, David Rowley <drowley(at)postgresql(dot)org>, Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: jsonb crash |
Date: | 2021-09-29 22:45:13 |
Message-ID: | CAApHDvqE2bAKPy7YXcDWoDP0_AVfwHRjpsnfPhi3LX2iwv-XTg@mail.gmail.com |
Lists: | pgsql-hackers |
On Thu, 30 Sept 2021 at 11:20, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> I wrote:
> > So I'm now thinking you weren't that far wrong to be looking at
> > hashability of the top-level qual operator. What is missing is
> > that you have to restrict candidate cache keys to be the *direct*
> > arguments of such an operator. Looking any further down in the
> > expression introduces untenable assumptions.
>
> Hmm ... I think that actually, a correct statement of the semantic
> restriction is
>
> To be eligible for memoization, the inside of a join can use the
> passed-in parameters *only* as direct arguments of hashable equality
> operators.
>
> In order to exploit RestrictInfo-based caching, you could make the
> further restriction that all such equality operators appear at the
> top level of RestrictInfo clauses. But that's not semantically
> necessary.
>
> As an example, assuming p1 and p2 are the path parameters,
>
> (p1 = x) xor (p2 = y)
>
> is semantically safe to memoize, despite the upper-level xor
> operator. But the example we started with, with a parameter
> used as an argument of jsonb_exists, is not safe to memoize
> because we have no grounds to suppose that two hash-equal values
> will act the same in jsonb_exists.
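Tom's proposed restriction (parameters may appear *only* as direct arguments of hashable equality operators) can be sketched as a recursive check over a toy expression tree. This is a minimal Python sketch, not PostgreSQL code: the node classes `Param`, `Var` and `OpExpr`, the function `params_safe`, and the `HASHABLE_EQUALITY_OPS` set are all hypothetical stand-ins for the planner's real structures.

```python
from dataclasses import dataclass

# Hypothetical toy expression nodes; not PostgreSQL's Node types.
@dataclass(frozen=True)
class Param:
    name: str  # a parameter passed in from the outer side of the join

@dataclass(frozen=True)
class Var:
    name: str  # a column of the inner relation

@dataclass(frozen=True)
class OpExpr:
    op: str      # operator or function name
    args: tuple  # argument expressions

# Assumption: the only operator treated as a hashable equality operator.
HASHABLE_EQUALITY_OPS = {"="}

def params_safe(expr) -> bool:
    """Return True if every Param in expr is a direct argument of a
    hashable equality operator (Tom's proposed memoization rule)."""
    if isinstance(expr, Param):
        # Reached only when the Param is NOT a direct argument of a
        # hashable equality operator, so it is unsafe.
        return False
    if isinstance(expr, Var):
        return True
    if expr.op in HASHABLE_EQUALITY_OPS:
        # Direct Param arguments are allowed; other arguments are
        # checked recursively.
        return all(isinstance(a, Param) or params_safe(a) for a in expr.args)
    # Any other operator (xor, jsonb_exists, ...): recurse into arguments.
    return all(params_safe(a) for a in expr.args)
```

With this check, Tom's `(p1 = x) xor (p2 = y)` example passes, while a parameter passed to `jsonb_exists` is rejected.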

I'm not really sure I follow your comment about the top-level qual
operator, or why that has anything to do with it.

Remember that we *never* do any hashing of any values from the inner
side of the join. If we're doing a parameterized nested loop and, say,
our parameter has the value 1, the first time through we don't find
any cached tuples, so we run the plan from the inner side of the
nested loop join and cache all the tuples that we get from it. When
the parameter changes, we check whether the current value of the
parameter has any tuples cached. This is what the hashing and equality
comparison does. If the new parameter value is 2, then we'll hash that
and probe the hash table. Since we've only seen value 1 so far, we
won't get a cache hit. If at some later point in time we see the
parameter value of 1 again, we hash that, find something in the hash
bucket for that value, and then do an equality test to ensure the
values are actually the same and don't merely share a hash bucket or
hash value. At no point do we do any hashing on the actual cached
tuples.
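The lookup flow described above amounts to a hash table keyed on the parameter value. Below is a minimal Python sketch under that reading; `MemoizeCache`, `lookup` and `run_inner` are hypothetical names, and a Python `dict` stands in for the executor's hash table (it hashes the key and then uses `==` to rule out collisions). The cached tuples are stored but never hashed.

```python
from typing import Callable, Dict, Hashable, List, Tuple

class MemoizeCache:
    """Toy parameter-keyed result cache (hypothetical, not executor code).

    Only the outer parameter value is hashed and compared; the tuples
    returned by the inner plan are stored as-is and never hashed.
    """

    def __init__(self, run_inner: Callable[[Hashable], List[Tuple]]):
        self.run_inner = run_inner  # executes the inner side for one param
        self.cache: Dict[Hashable, List[Tuple]] = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, param: Hashable) -> List[Tuple]:
        # The dict lookup hashes `param`, then uses == to rule out mere
        # hash or bucket collisions with previously seen values.
        if param in self.cache:
            self.hits += 1
            return self.cache[param]
        # Cache miss: run the inner plan and keep every tuple it returns.
        self.misses += 1
        rows = self.run_inner(param)
        self.cache[param] = rows
        return rows
```

Probing with 1, 2 and then 1 again runs the inner side twice and serves the third probe from the cache.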

This allows us to memoize any join expression, not just equality
expressions. E.g. if the SQL is:

    SELECT * FROM t1 INNER JOIN t2 ON t1.a > t2.a;

and t2 is on the inner side of the nested loop join,
then the only thing we hash is the t1.a parameter and the only thing
we do an equality comparison on is the current value of t1.a vs some
previous value of t1.a that is stored in the hash table. You can see
here that if t1.a and t2.a are not the same data type, that's of
no relevance, as we *never* hash or do any equality comparisons on
t2.a in the memoize code.
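To illustrate the `t1.a > t2.a` case, here is a minimal Python sketch (hypothetical names throughout: `UnhashableValue`, `make_inner`, `memoized_join`). The inner-side values deliberately cannot be hashed at all, and t1.a (int) and t2.a (float) are different types; the join still works because only the outer parameter is ever hashed or compared for equality.

```python
class UnhashableValue:
    """Stand-in for an inner-side value whose type has no hash support."""
    __hash__ = None  # hash(instance) now raises TypeError

    def __init__(self, v: float):
        self.v = v

def make_inner(t2_rows, calls):
    """Inner side of the nested loop: scan t2 and keep rows that pass
    the join qual t1.a > t2.a, with t1.a supplied as the parameter."""
    def run_inner(param):
        calls.append(param)  # record each real execution of the inner plan
        return [row for row in t2_rows if param > row.v]
    return run_inner

def memoized_join(outer_values, run_inner):
    cache = {}
    out = []
    for p in outer_values:
        if p not in cache:           # only the t1.a parameter is hashed
            cache[p] = run_inner(p)  # inner rows are stored, never hashed
        out.extend((p, row.v) for row in cache[p])
    return out
```

Joining outer values 2, 1, 2 against t2 values 0.5, 1.5 and 2.5 executes the inner side only for 2 and 1; the second probe with 2 is answered from the cache.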

The whole thing just hangs together on the assumption that parameters
with the same value will always yield the same tuples. If that's
somehow a wrong assumption, then we have a problem.
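The assumption can be probed with a toy Python analogue of the jsonb_exists case from Tom's mail. Python's `1` and `1.0` hash and compare equal, but a hypothetical inner qual that uses the parameter through something other than equality (here its text form via `str()`) treats them differently, so the cache returns the wrong rows. `inner_rows` and `run` are hypothetical names; this is not PostgreSQL code.

```python
def inner_rows(param):
    """Hypothetical inner side whose qual uses the parameter in a
    non-equality way: it matches t2 rows against str(param)."""
    t2 = ["1", "1.0", "2"]
    return [s for s in t2 if s == str(param)]

def run(outer_values, memoize):
    cache = {}
    results = []
    for p in outer_values:
        if not memoize:
            results.append(inner_rows(p))
            continue
        if p not in cache:  # hash/== treat 1 and 1.0 as the same key
            cache[p] = inner_rows(p)
        results.append(cache[p])
    return results
```

Without memoization the probe with `1.0` returns `["1.0"]`; with it, the entry cached for `1` is reused and `["1"]` comes back, which is the kind of wrong result a hash-equal but behaviorally different parameter can cause.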

I'm not sure if this helps explain how it's meant to work, or if I
just misunderstood you.

David