Re: Infinite Interval

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: jian he <jian(dot)universality(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Joseph Koshakow <koshy44(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Infinite Interval
Date: 2023-09-20 12:09:00
Message-ID: CAEZATCX-zD+EAOwNh66VVKKepNL6Rz3kwFX8eG7c5Y4mDyDqJA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 20 Sept 2023 at 11:27, jian he <jian(dot)universality(at)gmail(dot)com> wrote:
>
> if I remove IntervalAggState's element: MemoryContext, it will not work.
> so I don't understand what the above sentence means...... Sorry. (it's
> my problem)
>

I don't see why it won't work. The point is to try to simplify
do_interval_accum() as much as possible. Looking at the current code,
I see a few places that could be simpler:

+ X.day = newval->day;
+ X.month = newval->month;
+ X.time = newval->time;
+
+ temp.day = state->sumX.day;
+ temp.month = state->sumX.month;
+ temp.time = state->sumX.time;

Why do we need these local variables X and temp? It could just add the
values from newval directly to those in state->sumX.

+ /* The rest of this needs to work in the aggregate context */
+ old_context = MemoryContextSwitchTo(state->agg_context);

Why? It's not allocating any memory here, so I don't see a need to
switch context.

So basically, do_interval_accum() could be simplified to:

static void
do_interval_accum(IntervalAggState *state, Interval *newval)
{
/* Count infinite intervals separately from all else */
if (INTERVAL_IS_NOBEGIN (newval))
{
state->nInfcount++;
return;
}
if (INTERVAL_IS_NOEND(newval))
{
state->pInfcount++;
return;
}

/* Update count of finite intervals */
state->N++;

/* Update sum of finite intervals */
if (unlikely(pg_add_s32_overflow(state->sumX.month, newval->month,
&state->sumX.month)))
ereport(ERROR,
errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
errmsg("interval out of range"));

if (unlikely(pg_add_s32_overflow(state->sumX.day, newval->day,
&state->sumX.day)))
ereport(ERROR,
errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
errmsg("interval out of range"));

if (unlikely(pg_add_s64_overflow(state->sumX.time, newval->time,
&state->sumX.time)))
ereport(ERROR,
errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
errmsg("interval out of range"));

return;
}

and that can be further refactored, as described below, and similarly
for do_interval_discard(), except using pg_sub_s32/64_overflow().

> > Also, this needs to include serialization and deserialization
> > functions, otherwise these aggregates will no longer be able to use
> > parallel workers. That makes a big difference to queryE, if the size
> > of the test data is scaled up.
> >
> I tried, but failed. sum(interval) result is correct, but
> avg(interval) result is wrong.
>
> SELECT sum(b) ,avg(b)
> ,avg(b) = sum(b)/count(*) as should_be_true
> ,avg(b) * count(*) = sum(b) as should_be_true_too
> from interval_aggtest_1m; --1million row.
> The above query expects two bool columns to return true, but actually
> both returns false.(spend some time found out parallel mode will make
> the number of rows to 1_000_002, should be 1_000_0000).
>

I think the reason for your wrong results is this code in
interval_avg_combine():

+ if (state2->N > 0)
+ {
+ /* The rest of this needs to work in the aggregate context */
+ old_context = MemoryContextSwitchTo(agg_context);
+
+ /* Accumulate interval values */
+ do_interval_accum(state1, &state2->sumX);
+
+ MemoryContextSwitchTo(old_context);
+ }

The problem is that using do_interval_accum() to add the 2 sums
together also adds 1 to the count N, making it incorrect. This code
should only be adding state2->sumX to state1->sumX, not touching
state1->N. And, as in do_interval_accum(), there is no need to switch
memory context.

Given that there are multiple places in this file that need to add
intervals, I think it makes sense to further refactor, and add a local
function to add 2 finite intervals, along the lines of the code above.
This can then be called from do_interval_accum(),
interval_avg_combine(), and interval_pl(). And similarly for
subtracting 2 finite intervals.

Regards,
Dean

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Maxim Orlov 2023-09-20 12:12:56 Re: should frontend tools use syncfs() ?
Previous Message a.rybakina 2023-09-20 12:06:35 Re: POC, WIP: OR-clause support for indexes