Re: Using Expanded Objects other than Arrays from plpgsql

From: Michel Pelletier <pelletier(dot)michel(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Using Expanded Objects other than Arrays from plpgsql
Date: 2024-12-07 00:51:50
Message-ID: CACxu=vKCSSqO6y0ES4SJnFSMBa8szANOykQkG3L1LsLnN2v0JA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

On Tue, Dec 3, 2024 at 4:42 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Michel Pelletier <pelletier(dot)michel(at)gmail(dot)com> writes:
> > Here's a WIP patch for a pgexpanded example in src/test/modules.
>
> I didn't look at your patch yet, but in the meantime here's an update
> that takes the next step towards what I promised.
>

Awesome! I made a support function for my set_element(matrix, i, j, value)
function and it works great, but I do have a couple questions about how to
move forward with some of my other methods. For reference, here's the
set_element test function, and a trace of function calls, after
implementing the support function for set_element, the expand_matrix
function is called twice, first by nvals(), and later by the first
set_element, but not the subsequent sets, so the true clause of the
PLPGSQL_RWOPT_INPLACE case in plpgsql_param_eval_var_check works great and
as expected:

postgres=# create or replace function test_se(graph matrix) returns matrix
language plpgsql as
$$
declare
nvals bigint;
begin
graph = wait(graph);
nvals = nvals(graph);
raise notice 'nvals: %', nvals;
graph = set_element(graph, 4, 2, 42);
graph = set_element(graph, 4, 3, 43);
graph = set_element(graph, 4, 4, 44);
return graph;
end;
$$;
CREATE FUNCTION
postgres=# select nvals(test_se('int32'::matrix));
DEBUG: new_matrix
DEBUG: matrix_get_flat_size
DEBUG: flatten_matrix
DEBUG: matrix_wait
DEBUG: DatumGetMatrix
DEBUG: expand_matrix <- wait expands with support function
DEBUG: new_matrix
DEBUG: matrix_nvals
DEBUG: DatumGetMatrix
DEBUG: matrix_get_flat_size
DEBUG: flatten_matrix
DEBUG: expand_matrix <- nvals() reexpands
DEBUG: new_matrix
DEBUG: context_callback_matrix_free
NOTICE: nvals: 0
DEBUG: scalar_int32
DEBUG: new_scalar
DEBUG: matrix_set_element
DEBUG: DatumGetMatrix < set_element does not reexpand, yay!
DEBUG: DatumGetScalar
DEBUG: context_callback_scalar_free
DEBUG: scalar_int32
DEBUG: new_scalar
DEBUG: matrix_set_element
DEBUG: DatumGetMatrix
DEBUG: DatumGetScalar
DEBUG: context_callback_scalar_free
DEBUG: scalar_int32
DEBUG: new_scalar
DEBUG: matrix_set_element
DEBUG: DatumGetMatrix
DEBUG: DatumGetScalar
DEBUG: context_callback_scalar_free
DEBUG: matrix_nvals
DEBUG: DatumGetMatrix
DEBUG: context_callback_matrix_free
DEBUG: context_callback_matrix_free
┌───────┐
│ nvals │
├───────┤
│ 3 │
└───────┘

My question is about nvals = nvals(graph) in that function above, the
object is flattened and then rexpanded, even after the object was expanded
and returned by wait() [1] into a r/w pointer. set_element honors the
expanded object from wait(), but nvals does not. It seems like I want to
be able to pass the argument as a RO pointer, but I'm not sure how to
trigger the "else" clause in the PLPGSQL_RWOPT_INPLACE case. I can see how
the support function triggers the if, but I don't see a similar way to
trigger the else. I'm almost certainly missing something.

[1](Due to the asynchronous nature of the GraphBLAS API, there is a
GrB_wait() function to wait until an object is "complete" computationally,
but since this object was just expanded, there is no pending work, so the
wait() is essentially a noop but a handy way for me to return a r/w pointer
for subsequent operations).

set_element and wait are the simple case where there is only one reference
to the r/w object in the argument list. As we've discussed I have many
functions that can possibly have multiple references to the same object.
The core operation of the library is matrix multiplication, and all other
complex functions follow a very similar pattern, so I'll just focus on that
one here:

CREATE FUNCTION mxm(
a matrix,
b matrix,
op semiring default null,
inout c matrix default null,
mask matrix default null,
accum binaryop default null,
descr descriptor default null
)
RETURNS matrix

The order of arguments mostly follows the order in the C API. a and b are
the left and right matrix operands and the matrix product is the return
value. If c is not null, then it is a pre-created return value which may
contain partial results already from some previous operations, otherwise
mxm creates a new matrix of the correct dimensions and returns that. I
think the inout is meaningless as it doesn't seem to change anything, but
I'm using it as a visual indication in code that c can be the return value
if it's not null.

Here's an example of doing Triangle Counting using Burkhardt's method [2],
where a, b, and the mask are all the same adjacency matrix (here the
Newman/karate graph. The 'plus_pair' semiring is optimized for structural
counting, and the descriptor 's' tells suitesparse to only use the
structure of the mask and not to consider the values):

[2] https://doi.org/10.1177/1473871616666393

CREATE OR REPLACE FUNCTION public.tcount_burkhardt(graph matrix)
RETURNS bigint
LANGUAGE plpgsql
AS $$
begin
graph = wait(graph);
graph = mxm(graph, graph, 'plus_pair_int32', mask=>graph,
descr=>'s');
return reduce_scalar(graph) / 6;
end;
$$;

postgres=# select tcount_burkhardt(graph) from karateg;
DEBUG: matrix_wait
DEBUG: DatumGetMatrix
DEBUG: expand_matrix <- wait expands and returns r/w
pointer with support function
DEBUG: new_matrix
DEBUG: matrix_mxm <- mxm starts here
DEBUG: DatumGetMatrix
DEBUG: matrix_get_flat_size
DEBUG: flatten_matrix
DEBUG: expand_matrix <- expanding left operand again
DEBUG: new_matrix
DEBUG: DatumGetMatrix
DEBUG: matrix_get_flat_size
DEBUG: flatten_matrix
DEBUG: expand_matrix <- expanding right operand again
DEBUG: new_matrix
DEBUG: DatumGetSemiring
DEBUG: expand_semiring
DEBUG: new_semiring
DEBUG: new_matrix
DEBUG: DatumGetMatrix
DEBUG: matrix_get_flat_size
DEBUG: flatten_matrix
DEBUG: expand_matrix <- expanding mask argument again
DEBUG: new_matrix
DEBUG: DatumGetDescriptor
DEBUG: expand_descriptor
DEBUG: new_descriptor
DEBUG: context_callback_matrix_free
DEBUG: context_callback_descriptor_free
DEBUG: context_callback_matrix_free
DEBUG: context_callback_semiring_free
DEBUG: context_callback_matrix_free
DEBUG: context_callback_matrix_free
DEBUG: matrix_reduce_scalar
DEBUG: DatumGetMatrix
DEBUG: matrix_get_flat_size
DEBUG: flatten_matrix
DEBUG: expand_matrix <- reduce also
re-expands matrix
DEBUG: new_matrix
DEBUG: new_scalar
DEBUG: scalar_div_int32
DEBUG: DatumGetScalar
DEBUG: new_scalar
DEBUG: cast_scalar_int64
DEBUG: DatumGetScalar
DEBUG: context_callback_scalar_free
DEBUG: context_callback_scalar_free
DEBUG: context_callback_matrix_free
DEBUG: context_callback_matrix_free
┌──────────────────┐
│ tcount_burkhardt │
├──────────────────┤
│ 45 │
└──────────────────┘

mxm calls expand_matrix three times for each of the three arguments.
Ideally I'd like the already expanded rw pointer from wait() to be honored
by mxm so that it doesn't re-expand the object three times but, like
set_element, not at all.

Hope that question makes sense, still going on the main theory that I'm not
understanding the support function, and maybe the c argument thing being
optional throws a wrench in the plan, and I'm happy to try and find a
workaround for that. Maybe always requiring the result to be
pre-constructed and then making c required and reassigning back to the same
input argument is the right approach?

Some good news I always like seeing is that 45 is the right answer. Very
close to having optimal sparse linear algebra in Postgres!

Thanks for your help! I'll move onto your comments on the test module next.

-Michel

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Igor Korot 2024-12-07 03:36:40 Insert records in the tavke only if they are not exist
Previous Message Adrian Klaver 2024-12-06 23:34:47 Re: tds_fdw DB-Library error: DB #: 20002, DB Msg: Adaptive Server connection failed

Browse pgsql-hackers by date

  From Date Subject
Next Message jian he 2024-12-07 02:13:26 Re: proposal: schema variables
Previous Message Jeff Davis 2024-12-07 00:44:10 Re: Statistics Import and Export