Re: Using Expanded Objects other than Arrays from plpgsql

From: Michel Pelletier <pelletier(dot)michel(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Using Expanded Objects other than Arrays from plpgsql
Date: 2024-12-23 03:52:08
Message-ID: CACxu=vKjvJSftCgBPa78tbdpBnJOq9DCrtV8x=o0_cuCOoxLbA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

On Wed, Dec 18, 2024 at 12:22 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Michel Pelletier <pelletier(dot)michel(at)gmail(dot)com> writes:
> > My bad, sorry for the long confusing email, I figured out that I was
> > calling the wrong macro when getting my matrix datum and inadvertently
> > expanding RO pointers as well, I've fixed that issue, and everything is
> > working great! No extra expansions and my support functions are working
> > well, I need to go through a few more places in the API to add more
> support
> > but otherwise the fixes Tom has put into plpgsql have worked perfectly
> and
> > the library now appears to be behaving optimally! I can get down to
> doing
> > some benchmarks and head-to-head with the C and Python bindings to
> compare
> > against.
>
> So, just to clarify where we're at: you are satisfied that the current
> patch-set does what you need?
>

I have some updates on this thread based on some graph algorithms I've
ported from the Python/C graphblas libraries.

All of the plpgsql expanded object optimizations so far are working well, I
can minimize object expansion in most cases, there are a couple I haven't
been able to work around but I'm still getting excellent benchmarking
numbers on some large test graphs:

LiveJournal Orkut
Nodes 3,997,962 3,072,441
Edges 34,681,185 117,185,037
Triangles 177,820,130 627,583,972

Seconds Edges/S Seconds Edges/S
Tri Count LL 2.80s 12,386,138 32.03s 3,658,602
Tri Count LU 1.91s 18,157,688 16.38s 7,156,338
Tri Centrality 1.55s 22,374,958 12.22s 9,589,610
Page Rank 8.10s 4,281,628 23.14s 5,064,176

That's on a 2020 era 4 core economy laptop and is in line with what the
C/Python/Julia bindings get on similar hardware.

There are a few cases where I have to force an expansion, I work around
this by calling a `wait()` function, which expands the datum, calls
GrB_wait() on it (a nop in this case) and returns a r/w pointer. You can
see this in the following Triangle Counting function which is a matrix
multiplication of a graph to itself, using itself as a mask. This matrix
reduces to the triangle count (times six):

create or replace function tcount_b(graph matrix) returns bigint language
plpgsql as
$$
begin
graph = wait(graph);
graph = mxm(graph, graph, 'plus_pair_int32', mask=>graph,
descr=>'s');
return reduce_scalar(graph) / 6;
end;
$$;

DEBUG: new_matrix
DEBUG: flatten_matrix
DEBUG: matrix_wait
DEBUG: expand_matrix -- expansion happens here in wait()
DEBUG: new_matrix
DEBUG: matrix_mxm -- mxm does not re-expand the object, good!
DEBUG: expand_semiring
DEBUG: new_semiring
DEBUG: new_matrix
DEBUG: expand_descriptor
DEBUG: new_descriptor
DEBUG: matrix_reduce_scalar -- neither does reduce, good!
DEBUG: new_scalar
DEBUG: scalar_div_int32
DEBUG: new_scalar
DEBUG: cast_scalar_int64

If I take out the call to wait(), then mxm calls expand_matrix 3 times as
it did before your optimizations.

The other task we'd talked about was generalizing the existing
> heuristics in exec_assign_value() and plpgsql_exec_function() that
> say that array-type values should be forced into expanded R/W form
> when being assigned to an array-type PL/pgSQL variable. The argument
> for that is that the PL/pgSQL function might subsequently do a lot of
> subscripted accesses to the array (which'd benefit from working with
> an expanded array) while never doing another assignment and thus not
> having any opportunity to revisit the decision. The counter-argument
> is that it might *not* do such accesses, so that the expansion was
> just a waste of cycles. So this is squishy enough that I'd prefer to
> have some solid use-cases to look at before trying to generalize it.
>
> It's sounding to me like you're going to end up in a place where all
> your values are passed around in expanded form already and so you have
> little need for that optimization.

If so, I'd prefer not to go any
> further than the present patch-set for now. Adding "type support"
> hooks as discussed would be a substantial amount of work, so I'd
> like to have a more compelling case for it before doing that.
>

I agree it makes sense to have more use cases before making deeper
changes. I only work with expanded forms, but need to call wait() to
pre-expand the object to avoid multiple expansions in functions that can
take the same object in multiple parameters. This is a pretty common
pattern in GraphBLAS (and linear algebra in general) where (many) matrices
are commutable to themselves in several ways like multiplication,
element-wise operations, and element masking.

I'm not sure if eliminating wait() is a good enough use case, it would
definitely be nice to get rid of but I can document it pretty thoroughly
and it's relatively easy to catch.

-Michel

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Ivan Shershnev 2024-12-23 10:00:00 Re: Seamless age (xid) replacement
Previous Message Tatsuo Ishii 2024-12-23 02:13:50 Re: [pgpool-general: 9291] pgpool Connection issue: ERROR: unable to read message kind

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2024-12-23 03:53:33 Re: Memory leak in WAL sender with pgoutput (v10~)
Previous Message mahendrakar s 2024-12-23 03:52:06 Last removed wal segment file