Re: Using Expanded Objects other than Arrays from plpgsql

From: Michel Pelletier <pelletier(dot)michel(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Using Expanded Objects other than Arrays from plpgsql
Date: 2024-10-24 01:39:03
Message-ID: CACxu=vJMqLUZR1N-AT4cXw9JHMsN=5-iYvPvKWy-CcKwz_9aBg@mail.gmail.com
Lists: pgsql-general pgsql-hackers

On Wed, Oct 23, 2024 at 8:21 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Michel Pelletier <pelletier(dot)michel(at)gmail(dot)com> writes:
> > Here's another example:
>
> > CREATE OR REPLACE FUNCTION test2(graph matrix)
> > RETURNS bigint LANGUAGE plpgsql AS
> > $$
> > BEGIN
> > perform set_element(graph, 1, 1, 1);
> > RETURN nvals(graph);
> > end;
> > $$;
> > CREATE FUNCTION
> > postgres=# select test2(matrix('int32'));
> > DEBUG: new_matrix
> > DEBUG: matrix_get_flat_size
> > DEBUG: flatten_matrix
> > DEBUG: scalar_int32
> > DEBUG: new_scalar
> > DEBUG: matrix_set_element
> > DEBUG: DatumGetMatrix
> > DEBUG: expand_matrix
> > DEBUG: new_matrix
> > DEBUG: DatumGetScalar
> > DEBUG: matrix_get_flat_size
> > DEBUG: matrix_get_flat_size
> > DEBUG: flatten_matrix
> > DEBUG: context_callback_matrix_free
> > DEBUG: context_callback_scalar_free
> > DEBUG: matrix_nvals
> > DEBUG: DatumGetMatrix
> > DEBUG: expand_matrix
> > DEBUG: new_matrix
> > DEBUG: context_callback_matrix_free
> > DEBUG: context_callback_matrix_free
> > test2
> > -------
> > 0
> > (1 row)
>
> I'm a little confused by your debug output. What are "scalar_int32"
> and "new_scalar", and what part of the plpgsql function is causing
> them to be invoked?
>

GraphBLAS scalars hold a single element value for the matrix type.
Internally, they are simply 1x1 matrices (much like vectors are 1xn
matrices). The function signature is:

set_element(a matrix, i bigint, j bigint, s scalar)

so the integer 1 passed as the last argument in my call is cast to a scalar.
There is a "CAST (integer AS scalar)" function (scalar_int32) that casts
Postgres integers to GraphBLAS GrB_INT32 scalar elements. It calls
new_scalar because scalars, like vectors and matrices, are expanded objects
that wrap a GrB_Scalar "handle". Scalars are useful for working with
individual values; for example, reduce() returns a scalar. There are far
more efficient ways to load large C arrays of values into a matrix, but for
now I'm just working at the element level.

> Another thing that confuses me is why there's a second flatten_matrix
> operation happening here. Shouldn't set_element return its result
> as a R/W expanded object?
>

That confuses me too, and my default assumption is always that I'm doing it
wrong. As far as I can tell, set_element does return a R/W object; here is
the return:

https://github.com/OneSparse/OneSparse/blob/main/src/matrix.c#L1726

where:

#define OS_RETURN_MATRIX(_matrix) return EOHPGetRWDatum(&(_matrix)->hdr)

> > I would expect that to return 1. If I do "graph = set_element(graph, 1, 1, 1)"
> > it works.
>
> I think you have a faulty understanding of PERFORM. It's defined as
> "evaluate this expression and throw away the result", so it's *not*
> going to change "graph", not even if set_element declares that
> argument as INOUT.

Faulty indeed. I was going from the plpgsql statement documentation:

"Sometimes it is useful to evaluate an expression or SELECT query but
discard the result, for example when calling a function that has
side-effects but no useful result value."

My understanding of "side-effects" was flawed there, but I'm fine with "x =
set_element(x...)" anyway as I was trying to follow the example of
array_append et al.
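
A minimal sketch of that difference, using the built-in array_append instead
of set_element so that it runs on a stock PostgreSQL install. The function
name perform_vs_assign and its column names are made up for illustration and
are not part of OneSparse:

```sql
-- Hypothetical demo function; only built-in array functions are used.
CREATE OR REPLACE FUNCTION perform_vs_assign()
RETURNS TABLE (after_perform int, after_assign int)
LANGUAGE plpgsql AS
$$
DECLARE
    arr int[] := '{}';
BEGIN
    -- PERFORM evaluates the expression and discards the result,
    -- so arr is left unchanged.
    PERFORM array_append(arr, 1);
    after_perform := cardinality(arr);

    -- Assigning the result back is what actually updates arr.
    arr := array_append(arr, 1);
    after_assign := cardinality(arr);

    RETURN NEXT;
END;
$$;

SELECT * FROM perform_vs_assign();
```

The first column should come back as 0 and the second as 1, which mirrors
nvals(graph) returning 0 after the PERFORM in test2.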

> (Our interpretation of OUT arguments for functions
> is that they're just an alternate notation for specifying the function
> result.) If you want to avoid the explicit assignment back to "graph"
> then the thing to do would be to declare set_element as a procedure,
> not a function, with an INOUT argument and then call it with CALL.
>

I'll stick with the assignment.
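
For comparison, a rough sketch of the procedure form described above, again
with built-in arrays standing in for the matrix type. The names append_one
and call_inout_demo are hypothetical:

```sql
-- Hypothetical procedure: the INOUT argument carries the updated value back.
CREATE OR REPLACE PROCEDURE append_one(INOUT arr int[])
LANGUAGE plpgsql AS
$$
BEGIN
    arr := array_append(arr, 1);
END;
$$;

-- Hypothetical caller: in plpgsql, CALL assigns the INOUT result
-- back to the variable passed as the argument.
CREATE OR REPLACE FUNCTION call_inout_demo()
RETURNS int LANGUAGE plpgsql AS
$$
DECLARE
    arr int[] := '{}';
BEGIN
    CALL append_one(arr);
    RETURN cardinality(arr);
END;
$$;

SELECT call_inout_demo();
```

This should return 1 without an explicit assignment in the caller.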

> That's only cosmetically different though, in that the updated
> "graph" value is still passed back much as if it were a function
> result, and then the CALL infrastructure knows it has to assign that
> back to the argument variable. And, as I tried to explain earlier,
> that code path currently has no mechanism for avoiding making a copy
> of the graph somewhere along the line: it will pass "graph" to the
> procedure as either a flat Datum or a R/O expanded object, so that
> set_element will be required to copy that before modifying it.
>

Right, I'm still figuring out exactly what that code flow is. This is my
first dive into these corners of the code so thank you for being patient
with me. I promise to write up some expanded object documentation if this
works!

> We can imagine extending whatever we do for "x := f(x)" cases so that
> it also works during "CALL p(x)". But I think that's only going to
> yield cosmetic or notational improvements so I don't want to start
> with doing that. Let's focus first on improving the existing
> infrastructure for the f(x) case.
>

+1

-Michel
