Quick Links

Re: POC: Sharing record typmods between backends

From:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: POC: Sharing record typmods between backends
Date:	2017-08-12 22:14:09
Message-ID:	CAEepm=34GVhOL+arUx56yx7OPk7=qpGsv3CpO54feqjAwQKm5g@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Thanks for your feedback. Here are two parts that jumped out at me.
I'll address the other parts in a separate email.

On Sat, Aug 12, 2017 at 1:55 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> This is complicated, and in the category that I would normally want a
>> stack of heavy unit tests for. If you don't feel like making
>> decisions about this now, perhaps iteration (and incremental resize?)
>> could be removed, leaving only the most primitive get/put hash table
>> facilities -- just enough for this purpose? Then a later patch could
>> add them back, with a set of really convincing unit tests...
>
> I'm inclined to go for that, yes.

I will make it so.

>> > +/*
>> > + * An entry in SharedRecordTypmodRegistry's attribute index. The key is the
>> > + * first REC_HASH_KEYS attribute OIDs. That means that collisions are
>> > + * possible, but that's OK because SerializedTupleDesc objects are arranged
>> > + * into a list.
>> > + */
>> >
>> > +/* Parameters for SharedRecordTypmodRegistry's attributes hash table. */
>> > +const static dht_parameters srtr_atts_index_params = {
>> > + sizeof(Oid) * REC_HASH_KEYS,
>> > + sizeof(SRTRAttsIndexEntry),
>> > + memcmp,
>> > + tag_hash,
>> > + LWTRANCHE_SHARED_RECORD_ATTS_INDEX
>> > +};
>> > +
>> > +/* Parameters for SharedRecordTypmodRegistry's typmod hash table. */
>> > +const static dht_parameters srtr_typmod_index_params = {
>> > + sizeof(uint32),
>> > + sizeof(SRTRTypmodIndexEntry),
>> > + memcmp,
>> > + tag_hash,
>> > + LWTRANCHE_SHARED_RECORD_TYPMOD_INDEX
>> > +};
>> > +
>> >
>> > I'm very much not a fan of this representation. I know you copied the
>> > logic, but I think it's a bad idea. I think the key should just be a
>> > dsa_pointer, and then we can have a proper tag_hash that hashes the
>> > whole thing, and a proper comparator too. Just have
>> >
>> > /*
>> > * Combine two hash values, resulting in another hash value, with decent bit
>> > * mixing.
>> > *
>> > * Similar to boost's hash_combine().
>> > */
>> > static inline uint32
>> > hash_combine(uint32 a, uint32 b)
>> > {
>> > a ^= b + 0x9e3779b9 + (a << 6) + (a >> 2);
>> > return a;
>> > }
>> >
>> > and then hash everything.
>>
>> Hmm. I'm not sure I understand. I know what hash_combine is for but
>> what do you mean when you say they key should just be a dsa_pointer?
>
>> What's wrong with providing the key size, whole entry size, compare
>> function and hash function like this?
>
> Well, right now the key is "sizeof(Oid) * REC_HASH_KEYS" which imo is
> fairly ugly. Both because it wastes space for narrow cases, and because
> it leads to conflicts for wide ones. By having a dsa_pointer as a key
> and custom hash/compare functions there's no need for that, you can just
> compute the hash based on all keys, and compare based on all keys.

Ah, that. Yeah, it is ugly, both in the pre-existing code and in my
patch. Stepping back from this a bit more, the true key here not an
array of Oid at all (whether fixed sized or variable). It's actually
a whole TupleDesc, because this is really a TupleDesc intern pool:
given a TupleDesc, please give me the canonical TupleDesc equal to
this one. You might call it a hash set rather than a hash table
(key->value associative).

Ideally, we'd get rid of the ugly REC_HASH_KEYS-sized key and the ugly
extra conflict chain, and tupdesc.c would have a hashTupleDesc()
function that is compatible with equalTupleDescs(). Then the hash
table entry would simply be a TupleDesc (that is, a pointer).

There is an extra complication when we use DSA memory though: If you
have a hash table (set) full of dsa_pointer to struct tupleDesc but
want to be able to search it given a TupleDesc (= backend local
pointer) then you have to do some extra work. I think that work is:
the hash table entries should be a small struct that has a union {
dsa_pointer, TupleDesc } and a discriminator field to say which it is,
and the hash + eq functions should be wrappers that follow dsa_pointer
if needed and then forward to hashTupleDesc() (a function that does
hash_combine() over the Oids) and equalTupleDescs().

(That complication might not exist if tupleDesc were fixed size and
could be directly in the hash table entry, but in the process of
flattening it (= holding the attributes in it) I made it variable
size, so we have to use a pointer to it in the hash table since both
DynaHash and DHT work with fixed size entries).

Thoughts?

--
Thomas Munro
http://www.enterprisedb.com

In response to

Re: POC: Sharing record typmods between backends at 2017-08-12 01:55:04 from Andres Freund

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Noah Misch	2017-08-13 01:25:11	Re: More race conditions in logical replication
Previous Message	Tom Lane	2017-08-12 16:09:34	Re: Timing-sensitive case in src/test/recovery TAP tests