Re: Is the PL/pgSQL refcursor useful in a modern three-tier app?

From: Bryn Llewellyn <bryn(at)yugabyte(dot)com>
To: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
Cc: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, pgsql-general list <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Is the PL/pgSQL refcursor useful in a modern three-tier app?
Date: 2023-03-26 20:16:06
Message-ID: BB147CED-F3C4-4C22-B7FD-E2247010B70B@yugabyte.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

> adrian(dot)klaver(at)aklaver(dot)com wrote:
>
>> bryn(at)yugabyte(dot)com replied to laurenz(dot)albe(at)cybertec(dot)at:
>>
>> Thanks for the link to your SQL file at the line where you get the row count in the way that you describe... I noted that neither of these functions has a refcursor formal argument and that, rather, you open (i.e. create) each of the three cursors that you use within the two functions that uses them.
>
> CREATE FUNCTION materialize_foreign_table(...)
>
> CREATE FUNCTION db_migrate_refresh(...)
>
> What is not formal about the above?

I used the term of art "formal argument" ordinarily to denote what's listed in parentheses at the start of a subprogram definition. The term stands in contrast to "actual argument"—meaning the expression that's used in a subprogram invocation to provide a value for the corresponding formal argument. Go to this page:

dotnettutorials.net/lesson/types-of-function-arguments-in-python/

and search in it for "Example: formal and actual function arguments in python". The blurb that comes just before the example says what I just said. It seems to be more common to talk about formal and actual "parameters". But the PG doc prefers "argument".

I didn't at all mean that "formal" is good and that Laurenz's code is not formal and therefore is bad! So sorry if you took it to mean this.

> Though it does not matter as, back to the docs again:
>
> www.postgresql.org/docs/current/plpgsql-cursors.html
>
> "All access to cursors in PL/pgSQL goes through cursor variables, which are always of the special data type refcursor. One way to create a cursor variable is just to declare it as a variable of type refcursor..." Though maybe you are trying to differentiate between bound and unbound refcursor's...

Ah... you missed my point entirely. Looks like my prose was opaque. I was referring to this paradigm as described at the start of the page that you referenced:

«
A more interesting usage is to return a reference to a cursor that a function has created, allowing the caller to read the rows. This provides an efficient way to return large row sets from functions.
»

<aside> This seems to me to imply that the producer function with a "refcursor" return is "security definer" and that the consumer subprogram cannot access the tables of interest with explicit "select”. I thinks that's nice. </aside>

I needed to know that the definition and use of each "refcursor" variable, in Laurentz's code, are in the same subprogram because it's only when this is the case that using an ordinary "select count(*)" and an ordinary "for" loop, where the "select" is written in place within the "for" clause, are viable alternatives.

> Also order by is not relevant for getting a count.

Yes, of course. I know that. (But it does matter in many cases of results presentation.) I was simply noting that Laurenz’s use-case didn’t need ordering and that the "select" statements will be faster than if ordering were needed. Sorry if I implied something that I didn’t mean too. (The example in Laurenz's post, www.cybertec-postgresql.com/en/pagination-problem-total-result-count/ <http://www.cybertec-postgresql.com/en/pagination-problem-total-result-count/>, does use "order by".)

>> I did a little test an saw that this:
>>
>> move last in cur;
>> get diagnostics n = row_count;
>>
>> reported just 1—in contrast to what you used:
>>
>> move forward all in cur;
>
> Because they are different actions. The first directly moves to the last row in the cursor and counts just that row. The second moves through all the rows left in the cursor and counts all the rows it traversed. It is spelled out here:
>
> https://www.postgresql.org/docs/current/sql-move.html
>
> "The count is the number of rows that a FETCH command with the same parameters would have returned (possibly zero)."

Yes, I'm afraid that I expressed myself poorly again. I meant to say that while I know nothing about the internals, I can't see how "move last in cur" can know where the last row is unless it traverses the entire result set from its current position until it finds no more rows. And while its doing this, it may just as well count the rows it traverses. It would seem, therefore, that a single operation that moves to where you asked for and that tells you how many rows it traversed would be enough—rather than two that both achieve the same effect in the same time where one gives you the count of rows traversed and the other doesn't. But never mind. The simple way to see it is that the operations simply do what they do—and I won't worry about how they might have been conceived differently.

I did some timing tests. I’ve copied the results and the code at the end—just for completeness. Here’s what I concluded.

(1) (and this is unremarkable), if you can find a way to meet your requirement with just a single SQL statement, then this is bound to be best. I chose a simple requirement: read from a table and insert half the total number of rows into one table and the remainder into another. This single SQL statement meets that requirement:

with
chunk(n) as (select count(*)/2 from s.t0),
i1 as (insert into s.t1(k, v) select k, v from s.t0 where k < (select n from chunk) returning k)
insert into s.t2(k, v) select k, v from s.t0 where k >= (select n from chunk);

Internet search shows that this CTE approach seems to be the popular PG pattern to achieve what Oracle’s dedicated multi-table "insert" achieves. On the assumption that the source table suffers constant concurrent changes from other sessions, the PL/pgSQL loop that meets this requirement _must_ use the cursor-based approach for getting the count that we’ve been discussing. I’m glad to have learned all this because some requirements (e.g. when the source table traversal has to do DDLs) can’t be met with a single SQL statement, and so a PL/pgSQL loop must be used.

(2) Even though the declaration (in a txn that you commit) materializes the entire result set, this manages to be a lot faster than doing this yourself with a temp table like this:

with
c(n) as (
select count(*) from s.t),
i1(n) as (
insert into pg_temp.t_count select n from c returning n)
insert into pg_temp.t(k, v, r) select k, v, row_number() over(order by k) from s.t;
create unique index temp_t_k_unq on pg_temp.t(k asc);

The index is meant to help "move". Maybe some use cases wouldn’t need it. Same goes for recording the count in its own one-row, one-column table.

(3) The times to do this:

move absolute 0 in cur;
move forward all in cur;
get diagnostics n = row_count;
move absolute 0 in cur;

and this:

move absolute 0 in cur;
move last in cur;
move absolute 0 in cur;

are the same (within the limits of my measurements). But, of course, there’s no need to use the second alternative.

(4) The time to get the count using "move forward all in cur" is about the same as doing an ordinary "count(*)"—but "move forward" is a little slower. I'm therefore puzzled by Laurentz's "the result set count comes for free [with the cursor approach]" in his blog post (x-ref above). Its appeal seems to me to be limited to its rock-solid semantics.

(5) It's somewhat quicker to move around in a holdable cursor than in a non-holdable one. But, of course, you pay a price when you create the former.

Here are my timings—all in seconds and all with three digits of precision. I used PG 15.2 in a Ubuntu VM on my Mac Book. I run psql in the bare macOS and connect into the VM. On some runs, some times are higher than usual. I expect that the way the VM handles files has something to do with this. I chose a run where the timings were pretty stable to copy here.

Insert 10,000,000 rows time: 13.7

Populate temp tables time: 10.6

Native count(*) time: 0.633
0.637
0.677
0.781

Declare "Is-Holdable" time: 1.91

Is-Holdable move fwd count time: 0.703
0.700
0.700
0.700

Declare "Not-Holdable" time: 0.000

Not-Holdable move fwd count time: 0.877
0.866
0.864
0.872

Declare "Is-Holdable" time: 1.78

Is-Holdable move back & fore time: 0.691
0.695
0.709
0.708

Declare "Not-Holdable" time: 0.000

Not-Holdable move back & fore time: 0.872
0.871
0.874
0.888

Here's the code. If you want to run it, just copy it into a single file and start it at the psql prompt. You need just to connect as a regular user that has "create" on the database that you use.
--------------------------------------------------------------------------------

-- Create the objects.
\c :db :u
set search_path = pg_catalog, pg_temp;
drop schema if exists s cascade;
create schema s;
create table s.t(k int primary key, v int not null);

/*
Format the elapsed time since t0 in seconds
with three digits of precision.
*/;
create function s.elapsed_time(t0 in double precision)
returns text
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
declare
e constant double
precision not null := extract(epoch from clock_timestamp()) - t0;

t constant text not null :=
case
when e >= 100000.0 then '>= 100K' ||' '
when e >= 10000.0 then to_char(round(e/100.0)*100.0, '99,999' )||' '
when e >= 1000.0 then to_char(round(e/10.0)* 10.0, '9,999' )||' '
when e >= 100.0 then to_char(e, '999' )||' '
when e >= 10.0 then to_char(e, '99.9' )||' '
when e >= 1.0 then to_char(e, '9.99' )||' '
else to_char(e, '0.999')
end;
begin
return lpad(t, 12);
end;
$body$;

create procedure s.insert_table_time(no_of_rows int, t inout text)
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
declare
n_text constant text not null := ltrim(to_char(no_of_rows, '999,999,999,999'));
caption constant text not null := rpad('Insert '||n_text||' rows time:', 35);

t0 double
precision not null := 0.0;
begin
truncate table s.t;

t0 := extract(epoch from clock_timestamp());
with g(v) as (select generate_series(1, no_of_rows))
insert into s.t(k, v) select g.v, (g.v)*2 from g;
t := caption||s.elapsed_time(t0);
end;
$body$;

create procedure s.populate_temp_tables_time(t inout text)
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
declare
caption constant text not null := rpad('Populate temp tables time:', 35);
t0 double
precision not null := 0.0;
begin
create table pg_temp.t_count(n int);
create table pg_temp.t(k int, v int, r int);

t0 := extract(epoch from clock_timestamp());
with
c(n) as (
select count(*) from s.t),
i1(n) as (
insert into pg_temp.t_count select n from c returning n)
insert into pg_temp.t(k, v, r) select k, v, row_number() over(order by k) from s.t;
create unique index temp_t_k_unq on pg_temp.t(k asc);
t := caption||s.elapsed_time(t0);
end;
$body$;

create procedure s.close_cursor(cur in refcursor)
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
declare
begin
close cur;
end;
$body$;

create procedure s.declare_cursor_time(cur_name in text, holdable in boolean, t inout text)
-- set search_path = pg_catalog, pg_temp :: Incomparible with "commit".
language plpgsql
as $body$
declare
hld constant text not null := case holdable
when true then 'with hold'
else 'without hold'
end;
slct constant text not null := 'select k, v from s.t order by k';
decl constant text not null := format('declare %I scroll cursor %s for '||slct, cur_name, hld);
caption constant text not null := rpad('Declare "'||cur_name||'" time:', 35);

t0 constant double
precision not null := extract(epoch from clock_timestamp());
begin
execute decl;
if holdable then
commit;
end if;
t := caption||s.elapsed_time(t0);
end;
$body$;

create function s.native_count_time(expected_n in int, show_caption in boolean = false)
returns text
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
declare
n int not null := 0;
caption constant text not null := case show_caption
when true then rpad('Native count(*) time:', 35)
else rpad('', 35)
end;
t0 constant double
precision not null := extract(epoch from clock_timestamp());
begin
n := (select count(*) from s.t);
assert n = expected_n;
return caption||s.elapsed_time(t0);
end;
$body$;

drop function if exists s.count_by_move_time(refcursor, int, boolean) cascade;
create function s.count_by_move_time(cur in refcursor, expected_n in int, show_caption in boolean = false)
returns text
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
declare
n int not null := 0;
caption constant text not null := case show_caption
when true then rpad(cur::text||' move fwd count time:', 35)
else rpad('', 35)
end;
t0 constant double
precision not null := extract(epoch from clock_timestamp());
begin
move absolute 0 in cur;
move forward all in cur;
get diagnostics n = row_count;
move absolute 0 in cur;
assert n = expected_n;
return caption||s.elapsed_time(t0);
end;
$body$;

create function s.move_back_and_fore_time(cur in refcursor, show_caption in boolean = false)
returns text
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
declare
caption constant text not null := case show_caption
when true then rpad(cur::text||' move back & fore time:', 35)
else rpad('', 35)
end;
t0 constant double
precision not null := extract(epoch from clock_timestamp());
begin
move absolute 0 in cur;
move last in cur;
move absolute 0 in cur;
return caption||s.elapsed_time(t0);
end;
$body$;
--------------------------------------------------------------------------------

-- Do the timimg.
-- Ten thousand rows.
\set no_of_rows 10000000
\set Is_Holdable ''''Is-Holdable''''
\set Not_Holdable ''''Not-Holdable''''
\t on
\o results.txt

call s.insert_table_time(:no_of_rows, null::text);
call s.populate_temp_tables_time(null::text);

select s.native_count_time(:no_of_rows, true);
select s.native_count_time(:no_of_rows);
select s.native_count_time(:no_of_rows);
select s.native_count_time(:no_of_rows);

call s.declare_cursor_time(:Is_Holdable, true, null::text);

select s.count_by_move_time(:Is_Holdable, :no_of_rows, true);
select s.count_by_move_time(:Is_Holdable, :no_of_rows);
select s.count_by_move_time(:Is_Holdable, :no_of_rows);
select s.count_by_move_time(:Is_Holdable, :no_of_rows);

start transaction;
call s.declare_cursor_time(:Not_Holdable, false, null::text);

select s.count_by_move_time(:Not_Holdable, :no_of_rows, true);
select s.count_by_move_time(:Not_Holdable, :no_of_rows);
select s.count_by_move_time(:Not_Holdable, :no_of_rows);
select s.count_by_move_time(:Not_Holdable, :no_of_rows);
rollback;

call s.close_cursor(:Is_Holdable);

call s.declare_cursor_time(:Is_Holdable, true, null::text);

select s.move_back_and_fore_time(:Is_Holdable, true);
select s.move_back_and_fore_time(:Is_Holdable);
select s.move_back_and_fore_time(:Is_Holdable);
select s.move_back_and_fore_time(:Is_Holdable);

start transaction;
call s.declare_cursor_time(:Not_Holdable, false, null::text);

select s.move_back_and_fore_time(:Not_Holdable, true);
select s.move_back_and_fore_time(:Not_Holdable);
select s.move_back_and_fore_time(:Not_Holdable);
select s.move_back_and_fore_time(:Not_Holdable);
rollback;

\o
\t off

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2023-03-26 21:21:58 Re: Support logical replication of DDLs
Previous Message vignesh C 2023-03-26 14:06:38 Re: Support logical replication of DDLs