Re: Hybrid Hash/Nested Loop joins and caching results from subplans

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Zhihong Yu <zyu(at)yugabyte(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Date: 2020-12-07 00:33:00
Message-ID: CAApHDvpeTHnD=zQ41HKL829R7JCvxZRVyyRfNR5bexL-x=s6Nw@mail.gmail.com
Lists: pgsql-hackers

Thanks for having a look at this.

On Sat, 5 Dec 2020 at 14:08, Zhihong Yu <zyu(at)yugabyte(dot)com> wrote:
> +#define SH_EQUAL(tb, a, b) ResultCacheHash_equal(tb, a, b) == 0
>
> I think it would be safer if the comparison is enclosed in parentheses (in case the macro appears in composite condition).

That seems fair. It would likely be nicer if simplehash.h played it
safe and wrapped usages of the macro in an additional set of
parentheses itself. I see a bit of a mix in the other places where we
#define SH_EQUAL: the ones in execGrouping.c and tidbitmap.c are also
missing the additional parentheses.

> +ResultCacheHash_equal(struct resultcache_hash *tb, const ResultCacheKey *key1,
> + const ResultCacheKey *key2)
>
> Since key2 is not used, maybe name it unused_key ?

I'm not so sure that's a great change. The only place people see that
parameter is right next to the comment mentioning that "'key2' is
never used".

> + /* Make a guess at a good size when we're not given a valid size. */
> + if (size == 0)
> + size = 1024;
>
> Should the default size be logged ?

I'm not too sure what you mean here. Should it be a power of 2? It is.
Or do you mean I shouldn't use a magic number?

> + /* Update the memory accounting */
> + rcstate->mem_used -= freed_mem;
>
> Maybe add an assertion that mem_used is >= 0 after the decrement (there is an assertion in remove_cache_entry however, that assertion is after another decrement).

Good idea.
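Something along these lines, sketched with illustrative names rather than the actual patch structure:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-in for the result cache state; the real struct
 * in the patch has more fields than this. */
typedef struct ToyCacheState
{
    int64_t mem_used;   /* bytes currently accounted to the cache */
} ToyCacheState;

/* Hypothetical helper showing the suggested check: after subtracting
 * the freed memory, assert the accounting never goes negative. */
static void
account_freed_mem(ToyCacheState *rcstate, int64_t freed_mem)
{
    rcstate->mem_used -= freed_mem;
    assert(rcstate->mem_used >= 0);
}
```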

> + * 'specialkey', if not NULL, causes the function to return false if the entry
> + * entry which the key belongs to is removed from the cache.
>
> duplicate entry (one at the end of first line and one at the beginning of second line).

Well spotted.

> For cache_lookup(), new key is allocated before checking whether rcstate->mem_used > rcstate->mem_upperlimit. It seems new entries should probably have the same size.
> Can we check whether upper limit is crossed (assuming the addition of new entry) before allocating new entry ?

I'd like to leave this as is. If we were to check if we've gone over
memory budget before the resultcache_insert() then we're doing a
memory check even for cache hits. That's additional effort. I'd prefer
only to check if we've gone over the memory budget in cases where
we've actually allocated more memory.

In each case we can go one allocation over budget, so I don't see what
doing the check beforehand gives us.
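To illustrate the trade-off with a toy model (illustrative names and fixed entry sizes, not the patch itself): allocating first and checking afterwards means a cache hit never touches the memory check, at the cost of briefly overshooting the limit by one entry on a miss.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: every cache entry costs a fixed ENTRY_SIZE bytes. */
#define ENTRY_SIZE 100

typedef struct ToyCache
{
    int64_t mem_used;
    int64_t mem_upperlimit;
    int     nentries;
} ToyCache;

/* On a miss, allocate first, then reduce memory only if we actually
 * grew past the budget; cache hits never reach this code at all. */
static void
toy_insert_miss(ToyCache *c)
{
    c->nentries++;
    c->mem_used += ENTRY_SIZE;  /* may briefly overshoot by one entry */

    while (c->mem_used > c->mem_upperlimit && c->nentries > 0)
    {
        c->nentries--;              /* evict an entry... */
        c->mem_used -= ENTRY_SIZE;  /* ...and reclaim its memory */
    }
}
```

With a 250-byte limit, four misses end with two entries and 200 bytes used: each insert that pushes usage to 300 immediately evicts back down to 200.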

> + if (unlikely(!cache_reduce_memory(rcstate, key)))
> + return NULL;
>
> Does the new entry need to be released in the above case?

No. cache_reduce_memory returning false will have removed "key" from the cache.

I'll post an updated patch on the main thread once I've looked at your
followup review.

David
