From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
---|---|
To: | James Coleman <jtc331(at)gmail(dot)com> |
Cc: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays |
Date: | 2020-09-08 20:37:06 |
Message-ID: | 557e8853-7cb8-9c01-6193-1d4109345401@iki.fi |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 08/09/2020 22:25, James Coleman wrote:
> On Wed, Aug 19, 2020 at 3:16 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>>
>> You could also apply the optimization for non-Const expressions, as long
>> as they're stable (i.e. !contain_volatile_functions(expr)).
>
> Is that true? Don't we also have to worry about whether or not the
> value is stable (i.e., know when a param has changed)? There have been
> discussions about being able to cache stable subexpressions, and my
> understanding was that we'd need to have that infrastructure (along
> with the necessarily invalidation when the param changes) to be able
> to use this for non-Const expressions.
Yeah, you're right, you'd have to also check for PARAM_EXEC Params. And
Vars. I think the conditions are the same as those checked in
match_clause_to_partition_key() in partprune.c (it's a long function,
search for "if (!IsA(rightop, Const))"). Not sure it's worth the
trouble, then. But it would be nice to handle queries like "WHERE column
= ANY ($1)"
>> Deconstructing the array Datum into a simple C array on first call would
>> be a win even for very small arrays and for AND semantics, even if you
>> don't use a hash table.
>
> Because you wouldn't have to repeatedly detoast it? Or some other
> reason I'm not thinking of? My intuition would have been that (aside
> from detoasting if necessary) there wouldn't be much real overhead in,
> for example, an array storing integers.
Dealing with NULLs and different element sizes in the array is pretty
complicated. Looping through the array currently looks like this:
/* Loop over the array elements */
s = (char *) ARR_DATA_PTR(arr);
bitmap = ARR_NULLBITMAP(arr);
bitmask = 1;
for (int i = 0; i < nitems; i++)
{
Datum elt;
Datum thisresult;
/* Get array element, checking for NULL */
if (bitmap && (*bitmap & bitmask) == 0)
{
fcinfo->args[1].value = (Datum) 0;
fcinfo->args[1].isnull = true;
}
else
{
elt = fetch_att(s, typbyval, typlen);
s = att_addlength_pointer(s, typlen, s);
s = (char *) att_align_nominal(s, typalign);
fcinfo->args[1].value = elt;
fcinfo->args[1].isnull = false;
}
[do stuff with Datum/isnull]
/* advance bitmap pointer if any */
if (bitmap)
{
bitmask <<= 1;
if (bitmask == 0x100)
{
bitmap++;
bitmask = 1;
}
}
}
Compared with just:
for (int i = 0; i < nitems; i++)
{
Datum elt = datums[i];
[do stuff with the Datum]
}
I'm not sure how much difference that makes, but I presume it's not
zero, and it seems like an easy win when you have the code to deal with
the Datum array representation anyway.
- Heikki
From | Date | Subject | |
---|---|---|---|
Next Message | Tomas Vondra | 2020-09-08 21:05:30 | Re: Ideas about a better API for postgres_fdw remote estimates |
Previous Message | Alvaro Herrera | 2020-09-08 20:36:00 | Re: VACUUM (INTERRUPTIBLE)? |