From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: mcvstats serialization code is still shy of a load |
Date: | 2019-06-26 16:31:13 |
Message-ID: | 9201.1561566673@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
> On Wed, Jun 26, 2019 at 11:26:21AM -0400, Tom Lane wrote:
>> #define SizeOfMCVList(ndims,nitems) \
>> is both woefully underdocumented and completely at variance with
>> reality. It doesn't seem to be accounting for the actual data values.
> I agree about the macro being underdocumented, but AFAICS it's used
> correctly to check the expected length. It can't include the data values
> directly, because that's variable amount of data - and it's encoded in not
> yet verified part of the data.
Well, it should have some other name then. Or *at least* a comment.
It's unbelievably misleading as it stands.
> That being said, maybe this is unnecessarily defensive and we should just
> trust the values not being corrupted.
No, I'm on board with checking the lengths. I just don't like how
hard it is to discern what's being checked.
>> I think that part of the problem here is that the way this code is
>> written, "maxaligned" is no such thing. What you're actually maxaligning
>> seems to be the offset from the start of the data area of a varlena value,
> I don't think so. The pointers should be maxaligned with respect to the
> whole varlena value, which is what 'raw' points to.
[ squint ... ] OK, I think I misread this:
statext_mcv_deserialize(bytea *data)
{
...
/* pointer to the data part (skip the varlena header) */
ptr = VARDATA_ANY(data);
raw = (char *) data;
I think this is confusing in itself --- I read it as "raw = (char *) ptr"
and I think most other people would assume that too based on the order
of operations. It'd read better as
/* remember start of datum for maxalign reference */
raw = (char *) data;
/* pointer to the data part (skip the varlena header) */
ptr = VARDATA_ANY(data);
Another problem with this code is that it flat doesn't work for
non-4-byte-header varlenas: it'd do the alignment differently than the
serialization side did. That's okay given that the two extant call sites
are guaranteed to pass detoasted datums. But using VARDATA_ANY gives a
completely misleading appearance of being ready to deal with short-header
varlenas, and heaven forbid there should be any comment to discourage
future coders from trying. So really what I'd like to see here is
/* remember start of datum for maxalign reference */
raw = (char *) data;
/* alignment logic assumes full-size datum header */
Assert(VARATT_IS_4B(data));
/* pointer to the data part (skip the varlena header) */
ptr = VARDATA_ANY(data);
Or, of course, this could all go away if we got rid of the
bogus maxaligning...
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2019-06-26 17:06:55 | Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.) |
Previous Message | Tomas Vondra | 2019-06-26 16:08:08 | Re: mcvstats serialization code is still shy of a load |