From: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
---|---|
To: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: WIP: Fast GiST index build |
Date: | 2011-08-16 12:04:37 |
Message-ID: | 4E4A5CD5.2040601@enterprisedb.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Looking at the calculation of levelStep:
> + /*
> + * Calculate levelStep by available amount of memory. We should be able to
> + * load into main memory one page of each underlying node buffer (which
> + * are in levelStep below). That give constraint over
> + * maintenance_work_mem. Also we should be able to have subtree of
> + * levelStep level in cache. That give constraint over
> + * effective_cache_size.
> + *
> + * i'th underlying level of sub-tree can consists of
> + * i^maxIndexTuplesPerPage pages at maximum. So, subtree of levelStep
> + * levels can't be greater then 2 * maxIndexTuplesPerPage ^ levelStep
> + * pages. We use some more reserve due to we probably can't take whole
> + * effective cache and use formula 4 * maxIndexTuplesPerPage ^ levelStep =
> + * effectiveCache. We use similar logic with maintenance_work_mem. We
> + * should be able to store at least last pages of all buffers where we are
> + * emptying current buffer to.
> + */
> + effectiveMemory = Min(maintenance_work_mem * 1024 / BLCKSZ,
> + effective_cache_size);
> + levelStep = (int) log((double) effectiveMemory / 4.0) /
> + log((double) maxIndexTuplesPerPage);
> +
I can see that that's equal to the formula given in the paper,
log_B(M/4B), but I couldn't see any explanation for that formula in the
paper. Your explanation makes sense, but where did it come from?
It seems a bit pessimistic: while it's true that the a subtree can't be
larger than 2 * maxIndexTuplesPerPage ^ levelStep, you can put a tighter
upper bound on it. The max size of a subtree of depth n can be
calculated as the geometric series:
r^0 + r^1 + r^2 + ... + r^n = (1 - r^(n + 1)) / (1 - r)
where r = maxIndexTuplesPerPage. For r=2 those formulas are equal, but
for a large r and small n (which is typical), the 2 *
maxIndexTuplesPerPage^levelStep formula gives a value that's almost
twice as large as the real max size of a subtree.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Magnus Hagander | 2011-08-16 12:25:29 | non-ipv6 vs hostnames |
Previous Message | Andrew Dunstan | 2011-08-16 11:23:48 | Re: Re: Should we have an optional limit on the recursion depth of recursive CTEs? |