Re: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption

From: Tomáš Janoušek <tomi(at)nomi(dot)cz>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption
Date: 2013-10-06 18:37:00
Message-ID: 20131006183645.GA27556@nomi.cz
Lists: pgsql-hackers

Hi,

On Sat, Oct 05, 2013 at 08:22:54PM +0200, Tomas Vondra wrote:
> I'm on 64-bit architecture and the example works with int32, which means
> the sizes should be about this:
>
> hash_element_t => 20B
> hash_bucket_t => 4B + (20B * items in the bucket [in steps of 5])
> hash_table_t => 4B + space for buckets
>
> In the example above, there's ~20k unique values in each group. The
> threshold is 20 items per bucket on average, so that's 1024 buckets, and
> the buckets are almost full.
>
> So for single group, the hash table size is about
>
> 4B + 1024 * (4B + 20 * 20B) = 413700B = ~ 400 kB
>
> There are 4000 groups, so the total estimate is ~1.6GB in total.
>
> However when executed (9.2, 9.3 and HEAD behave exactly the same), the
> query consumes almost ~5GB of RAM (excluding shared buffers).

I think the missing piece is the memory allocator's bookkeeping overhead.
You're assuming that hash_element_t.value takes 8B for the pointer and 4B for
the value itself, but with malloc it takes at least another 20 bytes, and from
a quick glance at backend/utils/mmgr/aset.c it seems that palloc is certainly
not without overhead either.
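
To put a rough number on it: if I'm reading aset.c right, each palloc chunk
carries a ~16B header and the requested size is rounded up to the next power
of two, so a separately palloc'd 4B value really costs about 8B + 16B = 24B.
Across 4000 groups * 20k values = 80M values, that's

    80M * (24B - 4B) = ~1.6GB

of bookkeeping alone on top of your estimate, and the per-bucket item arrays
get rounded up the same way, so the jump from ~1.6GB to ~5GB doesn't surprise
me much.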

Also, each additional level of pointers adds execution overhead and increases
the likelihood of cache misses. I'd suggest a few improvements, if I may:

1. Drop hash_element_t altogether, store the value length in hash_bucket_t and
   allocate hash_bucket_t.items as nitems * length bytes. I doubt that storing
   the hash values has a benefit worth the storage and code complexity
   overhead (you're storing fixed-size ints, not large blobs that are
   expensive to compare and hash). There's a rough sketch of this below the
   list.

2. Consider using a simpler/faster hash function, like FNV[1] or Jenkins[2]
   (see the FNV sketch below). For fun, try not hashing those ints at all and
   see how that performs (that, I think, is what you get from HashSet<int> in
   Java/C#).

3. Consider dropping the buckets in favor of open addressing (linear probing,
   quadratic, whatever); this avoids yet another level of pointer indirection.
   The third sketch below shows the linear probing variant.
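
To make 1. concrete, here's roughly what I have in mind; the member and
function names are mine, not from your code, and it assumes fixed-length
values:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    typedef struct hash_bucket_t
    {
        uint16_t  length;   /* byte length of one value, e.g. 4 for int32 */
        uint32_t  nitems;   /* values currently stored in this bucket */
        char     *items;    /* one flat allocation of nitems * length bytes,
                             * grown in steps of 5 items; no per-value
                             * allocation, no hash_element_t */
    } hash_bucket_t;

    /* the membership test becomes a memcmp scan over contiguous memory */
    static bool
    bucket_contains(const hash_bucket_t *b, const char *value)
    {
        for (uint32_t i = 0; i < b->nitems; i++)
            if (memcmp(b->items + (size_t) i * b->length,
                       value, b->length) == 0)
                return true;
        return false;
    }

One allocation per bucket instead of one (or two) per element also makes most
of the allocator overhead from above disappear.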
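
For 2., the 32-bit FNV-1a variant is just a few lines, applied to the four
bytes of the value (the constants are the published FNV offset basis and
prime):

    #include <stdint.h>

    static uint32_t
    fnv1a_int32(uint32_t v)
    {
        uint32_t h = 2166136261u;           /* FNV-1a 32-bit offset basis */

        for (int i = 0; i < 4; i++)
        {
            h ^= (v >> (8 * i)) & 0xff;     /* fold in one byte */
            h *= 16777619u;                 /* FNV 32-bit prime */
        }
        return h;
    }

The "no hashing" experiment is then just "return v;" in place of the loop.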
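
And a minimal sketch of 3., linear probing over a single flat array; the names
are again made up, and using 0 as the empty-slot marker is only a shortcut to
keep the sketch short (a real table needs an occupied bitmap or some other way
to represent empty slots):

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct oa_table_t
    {
        uint32_t  nslots;   /* power of two, kept at most ~75% full */
        uint32_t  nitems;
        int32_t  *slots;    /* one flat array: no buckets, no pointers */
    } oa_table_t;

    /* returns true if key was newly inserted, false if already present */
    static bool
    oa_insert(oa_table_t *t, int32_t key, uint32_t hash)
    {
        uint32_t  i = hash & (t->nslots - 1);

        while (t->slots[i] != 0)            /* probe until a free slot */
        {
            if (t->slots[i] == key)
                return false;
            i = (i + 1) & (t->nslots - 1);  /* linear probing */
        }
        t->slots[i] = key;
        t->nitems++;
        return true;
    }

The whole table is then one contiguous array of ints, which is about as
cache-friendly as it gets.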

It's been a few years since I've done stuff this low-level, so I won't go as
far as suggesting a different data structure -- I honestly have no idea what
the best way to count the number of distinct integers in a list is.

[1] http://en.wikipedia.org/wiki/Fowler_Noll_Vo_hash
[2] http://en.wikipedia.org/wiki/Jenkins_hash_function

Best regards,
--
Tomáš Janoušek, a.k.a. Liskni_si, http://work.lisk.in/
