Re: A design for amcheck heapam verification

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A design for amcheck heapam verification
Date: 2017-08-30 02:22:21
Message-ID: CAEepm=3JswDqbQ+p+EOWSijQzZqH4TJoqvJzDvKRFc4PpvduKw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Aug 30, 2017 at 1:00 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> On Tue, Aug 29, 2017 at 4:34 PM, Thomas Munro
> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>> Some drive-by comments on the lib patch:
>
> I was hoping that you'd look at this, since you'll probably want to
> use a bloom filter for parallel hash join at some point. I've tried to
> keep this one as simple as possible. I think that there is a good
> chance that it will be usable for parallel hash join with multiple
> batches. You'll need to update the interface a little bit to make that
> work (e.g., bring-your-own-hash interface), but hopefully the changes
> you'll need will be limited to a few small details.

Indeed. Thank you for working on this! To summarise a couple of
ideas that Peter and I discussed off-list a while back: (1) While
building the hash table for a hash join we could build a Bloom filter
per future batch and keep it in memory, and then while reading from
the outer plan we could skip writing tuples out to future batches if
there is no chance they'll find a match when read back in later (works
only for inner joins and only pays off in inverse proportion to the
join's selectivity); (2) We could push a Bloom filter down to scans
(many other databases do this, and at least one person has tried this
with PostgreSQL and found it to pay off[1]).

To use this for anything involving parallelism where a Bloom filter
must be shared we'd probably finish up having to create a shared
version of bloom_init() that either uses caller-provided memory and
avoids the internal pointer, or allocates DSA memory. I suppose you
could consider splitting your bloom_init() function up into
bloom_estimate() and bloom_init(user_supplied_space, ...) now, and
getting rid of the separate pointer to bitset (ie stick char
bitset[FLEXIBLE_ARRAY_MEMBER] at the end of the struct)?

>> +bloom_add_element(bloom_filter *filter, unsigned char *elem, Size len)
>>
>> I think the plan is to use size_t for new stuff[1].
>
> I'd forgotten.
>
>> This is another my_log2(), right?
>>
>> It'd be nice to replace both with fls() or flsl(), though it's
>> annoying to have to think about long vs int64 etc. We already use
>> fls() in two places and supply an implementation in src/port/fls.c for
>> platforms that lack it (Windows?), but not the long version.
>
> win64 longs are only 32-bits, so my_log2() would do the wrong thing
> for me on that platform. pow2_truncate() is provided with a number of
> bits as its argument, not a number of bytes (otherwise this would
> work).

Hmm. Right.

> Ideally, we'd never use long integers, because its width is platform
> dependent, and yet it is only ever used as an alternative to int
> because it is wider than int. One example of where this causes
> trouble: logtape.c uses long ints, so external sorts can use half the
> temp space on win64.

Agreed, "long" is terrible.

>> +bloom_prop_bits_set(bloom_filter *filter)
>> +{
>> + int bitset_bytes = NBITS(filter) / BITS_PER_BYTE;
>> + int64 bits_set = 0;
>> + int i;
>> +
>> + for (i = 0; i < bitset_bytes; i++)
>> + {
>> + unsigned char byte = filter->bitset[i];
>> +
>> + while (byte)
>> + {
>> + bits_set++;
>> + byte &= (byte - 1);
>> + }
>> + }
>
> I don't follow what you mean here.

I was just observing that there is an opportunity for code reuse here.
This function's definition would ideally be something like:

double
bloom_prop_bits_set(bloom_filter *filter)
{
return popcount(filter->bitset, NBYTES(filter)) / (double) NBITS(filter);
}

This isn't an objection to the way you have it, since we don't have a
popcount function yet! We have several routines in the tree for
counting bits, though not yet the complete set from Hacker's Delight.
Your patch brings us one step closer to that goal. (The book says
that this approach is good far sparse bitsets, but your comment says
that we expect something near 50%. That's irrelevant anyway since a
future centralised popcount() implementation would do this in
word-sized chunks with a hardware instruction or branch-free-per-word
lookups in a table and not care at all about sparseness.)

+ * Test if bloom filter definitely lacks element.

I think where "Bloom filter" appears in prose it should have a capital
letter (person's name).

[1] http://www.nus.edu.sg/nurop/2010/Proceedings/SoC/NUROP_Congress_Cheng%20Bin.pdf

--
Thomas Munro
http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2017-08-30 02:32:10 Re: pgbench: Skipping the creating primary keys after initialization
Previous Message Peter Eisentraut 2017-08-30 02:20:30 Re: segment size depending *_wal_size defaults (was increasing the default WAL segment size)