Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: James Coleman <jtc331(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays
Date: 2020-09-08 20:37:06
Message-ID: 557e8853-7cb8-9c01-6193-1d4109345401@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 08/09/2020 22:25, James Coleman wrote:
> On Wed, Aug 19, 2020 at 3:16 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>>
>> You could also apply the optimization for non-Const expressions, as long
>> as they're stable (i.e. !contain_volatile_functions(expr)).
>
> Is that true? Don't we also have to worry about whether or not the
> value is stable (i.e., know when a param has changed)? There have been
> discussions about being able to cache stable subexpressions, and my
> understanding was that we'd need to have that infrastructure (along
> with the necessarily invalidation when the param changes) to be able
> to use this for non-Const expressions.

Yeah, you're right, you'd have to also check for PARAM_EXEC Params. And
Vars. I think the conditions are the same as those checked in
match_clause_to_partition_key() in partprune.c (it's a long function,
search for "if (!IsA(rightop, Const))"). Not sure it's worth the
trouble, then. But it would be nice to handle queries like "WHERE column
= ANY ($1)"

>> Deconstructing the array Datum into a simple C array on first call would
>> be a win even for very small arrays and for AND semantics, even if you
>> don't use a hash table.
>
> Because you wouldn't have to repeatedly detoast it? Or some other
> reason I'm not thinking of? My intuition would have been that (aside
> from detoasting if necessary) there wouldn't be much real overhead in,
> for example, an array storing integers.

Dealing with NULLs and different element sizes in the array is pretty
complicated. Looping through the array currently looks like this:

/* Loop over the array elements */
s = (char *) ARR_DATA_PTR(arr);
bitmap = ARR_NULLBITMAP(arr);
bitmask = 1;

for (int i = 0; i < nitems; i++)
{
Datum elt;
Datum thisresult;

/* Get array element, checking for NULL */
if (bitmap && (*bitmap & bitmask) == 0)
{
fcinfo->args[1].value = (Datum) 0;
fcinfo->args[1].isnull = true;
}
else
{
elt = fetch_att(s, typbyval, typlen);
s = att_addlength_pointer(s, typlen, s);
s = (char *) att_align_nominal(s, typalign);
fcinfo->args[1].value = elt;
fcinfo->args[1].isnull = false;
}

[do stuff with Datum/isnull]

/* advance bitmap pointer if any */
if (bitmap)
{
bitmask <<= 1;
if (bitmask == 0x100)
{
bitmap++;
bitmask = 1;
}
}
}

Compared with just:

for (int i = 0; i < nitems; i++)
{
Datum elt = datums[i];

[do stuff with the Datum]
}

I'm not sure how much difference that makes, but I presume it's not
zero, and it seems like an easy win when you have the code to deal with
the Datum array representation anyway.

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2020-09-08 21:05:30 Re: Ideas about a better API for postgres_fdw remote estimates
Previous Message Alvaro Herrera 2020-09-08 20:36:00 Re: VACUUM (INTERRUPTIBLE)?