Using Expanded Objects other than Arrays from plpgsql

From: Michel Pelletier <pelletier(dot)michel(at)gmail(dot)com>
To: pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Using Expanded Objects other than Arrays from plpgsql
Date: 2024-10-20 16:32:13
Message-ID: CACxu=vJaKFNsYxooSnW1wEgsAO5u_v1XYBacfVJ14wgJV_PYeg@mail.gmail.com
Lists: pgsql-general pgsql-hackers

Hello!

I'm working on the OneSparse Postgres extension that wraps the GraphBLAS
API with a SQL interface for doing graph analytics and other sparse linear
algebra operations:

https://onesparse.github.io/OneSparse/test_matrix_header/

OneSparse wraps the GraphBLAS opaque handles in Expanded Object Header
structs that register ExpandedObjectMethods for flattening and expanding.
The "live" handle can be passed directly to the SuiteSparse API, while the
"flat" representation is serialized and written as a TOAST value. This
works perfectly.

However, during some single-source shortest path (SSSP) benchmarking I was
getting good numbers, but not as good as I expected, and I noticed some
sublinear scaling as the problems got bigger. It seems my objects are
constantly being flattened and expanded by plpgsql during the iterative
phases of an algorithm. As the solution grows, the result vector gets
bigger and the expand/flatten cost increases on each iteration.
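
To make the scaling concern concrete, here is a toy cost model in Python.
It is only a sketch under assumptions that are mine, not measurements: the
result vector gains exactly one entry per iteration, and each
expand/flatten round trip copies the whole vector twice. The function name
copy_cost and its parameters are hypothetical.

```python
def copy_cost(iterations, keep_expanded):
    """Count elements copied while a result vector grows by one per iteration."""
    copied = 0
    size = 0
    for _ in range(iterations):
        size += 1  # assumption: the solution vector gains one entry
        if not keep_expanded:
            # assumption: one full copy to expand, one full copy to flatten
            copied += 2 * size
    if keep_expanded:
        # the object stays expanded in the loop and is flattened once at the end
        copied += size
    return copied


round_trip = copy_cost(1000, keep_expanded=False)
in_place = copy_cost(1000, keep_expanded=True)
print(round_trip, in_place)
```

Under these assumptions the round-trip variant does work proportional to
the square of the iteration count, while the keep-expanded variant stays
linear, which is the shape of the slowdown described above.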

I found this message from the original patch implementation by Tom Lane in
2015:

https://www.postgresql.org/message-id/E1Ysvgz-0000s0-DP%40gemulon.postgresql.org

> In this initial implementation, a few heuristics have been hard-wired
> into plpgsql to improve performance for arrays that are stored in
> plpgsql variables. We would like to generalize those hacks so that
> other datatypes can obtain similar improvements, but figuring out some
> appropriate APIs is left as a task for future work.

Sure enough, looking at the code, I see this condition:

https://github.com/postgres/postgres/blob/master/src/pl/plpgsql/src/pl_exec.c#L549

This is a showstopper for me, as I can't see a good way around it. I tried
to "fake" an array but didn't get very far with that approach; I may still
pull it off, since GraphBLAS objects are very much array-like. I also
figured I'd open the discussion on how we can fix this permanently, so
that future extensions don't run into this penalty.

My first thought was to add a flag to CREATE TYPE like "EXPANDED = true"
(or some better name) indicating that plpgsql can safely take ownership of
the object in its expanded state rather than copying it. The GraphBLAS API
is explicit that whoever holds the object handle owns the reference, so
that would work fine for me. Another option, I guess, is some kind of
whitelist or blacklist telling plpgsql which types can be kept expanded.
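
As a rough illustration of the flag idea, the following Python sketch
models only the decision being discussed; it is not plpgsql's actual code.
TypeInfo, the expanded_ok field (standing in for a hypothetical
"EXPANDED = true" option), and both function names are made up for this
example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeInfo:
    name: str
    is_array: bool = False
    expanded_ok: bool = False  # hypothetical per-type "EXPANDED = true" flag


def keep_expanded_arrays_only(t: TypeInfo) -> bool:
    # models the arrays-only restriction described in this message
    return t.is_array


def keep_expanded_with_flag(t: TypeInfo) -> bool:
    # models the proposal: arrays, plus any type that opted in via the flag
    return t.is_array or t.expanded_ok


int_array = TypeInfo("int4[]", is_array=True)
matrix = TypeInfo("matrix", expanded_ok=True)  # a type that opts in
plain = TypeInfo("text")

for t in (int_array, matrix, plain):
    print(t.name, keep_expanded_arrays_only(t), keep_expanded_with_flag(t))
```

With an opt-in flag, types that never asked for the behaviour are treated
exactly as before, which is the main appeal over a global change.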

And then there is the option of simply removing the existing arrays-only
restriction. Is any other expanded object out there really interested in
being flattened and expanded over and over again?

Thanks,

-Michel
