Re: mcvstats serialization code is still shy of a load

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: mcvstats serialization code is still shy of a load
Date: 2019-06-26 16:31:13
Message-ID: 9201.1561566673@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
> On Wed, Jun 26, 2019 at 11:26:21AM -0400, Tom Lane wrote:
>> #define SizeOfMCVList(ndims,nitems) \
>> is both woefully underdocumented and completely at variance with
>> reality. It doesn't seem to be accounting for the actual data values.

> I agree about the macro being underdocumented, but AFAICS it's used
> correctly to check the expected length. It can't include the data values
> directly, because that's variable amount of data - and it's encoded in not
> yet verified part of the data.

Well, it should have some other name then. Or *at least* a comment.
It's unbelievably misleading as it stands.

> That being said, maybe this is unnecessarily defensive and we should just
> trust the values not being corrupted.

No, I'm on board with checking the lengths. I just don't like how
hard it is to discern what's being checked.

>> I think that part of the problem here is that the way this code is
>> written, "maxaligned" is no such thing. What you're actually maxaligning
>> seems to be the offset from the start of the data area of a varlena value,

> I don't think so. The pointers should be maxaligned with respect to the
> whole varlena value, which is what 'raw' points to.

[ squint ... ] OK, I think I misread this:

statext_mcv_deserialize(bytea *data)
{
...
/* pointer to the data part (skip the varlena header) */
ptr = VARDATA_ANY(data);
raw = (char *) data;

I think this is confusing in itself --- I read it as "raw = (char *) ptr"
and I think most other people would assume that too based on the order
of operations. It'd read better as

/* remember start of datum for maxalign reference */
raw = (char *) data;

/* pointer to the data part (skip the varlena header) */
ptr = VARDATA_ANY(data);

Another problem with this code is that it flat doesn't work for
non-4-byte-header varlenas: it'd do the alignment differently than the
serialization side did. That's okay given that the two extant call sites
are guaranteed to pass detoasted datums. But using VARDATA_ANY gives a
completely misleading appearance of being ready to deal with short-header
varlenas, and heaven forbid there should be any comment to discourage
future coders from trying. So really what I'd like to see here is

/* remember start of datum for maxalign reference */
raw = (char *) data;

/* alignment logic assumes full-size datum header */
Assert(VARATT_IS_4B(data));

/* pointer to the data part (skip the varlena header) */
ptr = VARDATA_ANY(data);

Or, of course, this could all go away if we got rid of the
bogus maxaligning...

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-06-26 17:06:55 Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)
Previous Message Tomas Vondra 2019-06-26 16:08:08 Re: mcvstats serialization code is still shy of a load