From: | Justin Pryzby <pryzby(at)telsasoft(dot)com> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Hybrid Hash/Nested Loop joins and caching results from subplans |
Date: | 2021-01-28 05:43:50 |
Message-ID: | 20210128054349.GB7450@telsasoft.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
@cfbot: rebased on 55dc86eca70b1dc18a79c141b3567efed910329d
On Tue, Dec 08, 2020 at 08:15:52PM +1300, David Rowley wrote:
> From cfbfb8187f4e8303fe3358b5c909533ee6629efe Mon Sep 17 00:00:00 2001
> From: "dgrowley(at)gmail(dot)com" <dgrowley(at)gmail(dot)com>
> Date: Thu, 2 Jul 2020 16:06:36 +1200
> Subject: [PATCH v12 1/5] Allow estimate_num_groups() to pass back further
> details about the estimation
> +#define SELFLAG_USED_DEFAULT (1 << 0) /* Estimation fell back on one
...
> +typedef struct EstimationInfo
> +{
> + int flags; /* Flags, as defined above to mark special
> + * properties of the estimation. */
Maybe it should be a bits32 ?
(Also, according to Michael, some people preferred 0x01 to 1<<0)
> + /* Ensure we didn't mess up the tracking somehow */
> + Assert(rcstate->mem_used >= 0);
I think these assertions aren't useful since the type is unsigned:
+ uint64 mem_used; /* bytes of memory used by cache */
> + hash_mem_bytes = get_hash_mem() * 1024L;
I think "result cache nodes" should be added here:
doc/src/sgml/config.sgml- <para>
doc/src/sgml/config.sgml- Hash-based operations are generally more sensitive to memory
doc/src/sgml/config.sgml- availability than equivalent sort-based operations. The
doc/src/sgml/config.sgml- memory available for hash tables is computed by multiplying
doc/src/sgml/config.sgml- <varname>work_mem</varname> by
doc/src/sgml/config.sgml: <varname>hash_mem_multiplier</varname>. This makes it
doc/src/sgml/config.sgml- possible for hash-based operations to use an amount of memory
doc/src/sgml/config.sgml- that exceeds the usual <varname>work_mem</varname> base
doc/src/sgml/config.sgml- amount.
doc/src/sgml/config.sgml- </para>
Language fixen follow:
> + * Initialize the hash table to empty.
as empty
> + * prepare_probe_slot
> + * Populate rcstate's probeslot with the values from the tuple stored
> + * in 'key'. If 'key' is NULL, then perform the population by evalulating
sp: evaluating
> From d9c3f2cab13ec26bbd8d1245be6304c506e1f878 Mon Sep 17 00:00:00 2001
> From: "dgrowley(at)gmail(dot)com" <dgrowley(at)gmail(dot)com>
> Date: Tue, 8 Dec 2020 17:54:04 +1300
> Subject: [PATCH v12 4/5] Remove code duplication in nodeResultCache.c
> + * cache_check_mem
> + * Check if we've allocate more than our memory budget and, if so, reduce
allocated
XXX: what patch???
> + * Set the number of bytes each cache entry should consume in the cache.
> + * To provide us with better estimations on how many cache entries we can
> + * store at once we make a call to the excutor here to ask it what memory
spell: executor
once COMMA
> + * inappropriate to do so. If we see that this has been done then we'll
done COMMA
> + * Since we've already estimated the maximum number of entries we can
> + * store at once and know the estimated number of distinct values we'll be
> + * called with, well take this opportunity to set the path's est_entries.
we'll
> + * This will ultimately determine the hash table size that the executor
> + * will use. If we leave this at zero the executor will just choose the
zero COMMA
> + * Set the total_cost accounting for the expected cache hit ratio. We
> + * also add on a cpu_operator_cost to account for a cache lookup. This
> + * will happen regardless of if it's a cache hit or not.
"whether it's a cache hit or not"
> + * Additionally we charge a cpu_tuple_cost to account for cache lookups,
> + * which we'll do regardless of if it was a cache hit or not.
same
> + * get_resultcache_path
> + * If possible,.make and return a Result Cache path atop of 'inner_path'.
dotmake
> +SET work_mem TO '64kB';
> +SET enable_mergejoin TO off;
> +-- Ensure we get some evitions. We're unable to validate the hits and misses
evictions
--
Justin
Attachment | Content-Type | Size |
---|---|---|
0001-Allow-estimate_num_groups-to-pass-back-further-detai.patch | text/x-diff | 9.1 KB |
0002-Allow-users-of-simplehash.h-to-perform-direct-deleti.patch | text/x-diff | 3.5 KB |
0003-Add-Result-Cache-executor-node.patch | text/x-diff | 144.3 KB |
0004-Remove-code-duplication-in-nodeResultCache.c.patch | text/x-diff | 5.1 KB |
0005-Use-a-Result-Cache-node-to-cache-results-from-subpla.patch | text/x-diff | 27.2 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Kapila | 2021-01-28 05:44:03 | Re: logical replication worker accesses catalogs in error context callback |
Previous Message | Mark Dilger | 2021-01-28 05:05:02 | Re: new heapcheck contrib module |