From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | Peter Geoghegan <pg(at)bowt(dot)ie> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: A potential memory leak on Merge Join when Sort node is not below Materialize node |
Date: | 2022-09-29 01:12:44 |
Message-ID: | CAApHDvpxQsTX0rYceTQYeSqUZzpyFQkFVKOV=UUB68=NLcJAFg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, 29 Sept 2022 at 12:07, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> Also potentially relevant: the 2017 commit fa117ee4 anticipated adding
> a "copy" argument to tuplesort_getdatum() (the same commit added such
> a "copy" argument to tuplesort_gettupleslot()). I see that that still
> hasn't happened to tuplesort_getdatum() all these years later. Might
> be a good idea to do it in the next year or two, though.
>
> If David is interested in pursuing this now then I certainly won't object.
Just while this is fresh in my head, I wrote some code to make this
happen. My preference would be not to add the "copy" param to the
existing function and instead just add a new function to prevent
additional branching.
The attached puts back the datum sort in nodeSort.c for byref types
and adjusts process_ordered_aggregate_single() to make use of this
function.
I did a quick benchmark to see if this help DISTINCT aggregate any:
create table t1 (a varchar(32) not null, b varchar(32) not null);
insert into t1 select md5((x%10)::text),md5((x%10)::text) from
generate_Series(1,1000000)x;
vacuum freeze t1;
create index on t1(a);
With a work_mem of 256MBs I get:
query = select max(distinct a), max(distinct b) from t1;
Master:
latency average = 313.197 ms
Patched:
latency average = 304.335 ms
So not a very impressive speedup there (about 3%)
Some excerpts from perf top show:
Master:
1.40% postgres [.] palloc
1.13% postgres [.] tuplesort_getdatum
0.77% postgres [.] datumCopy
Patched:
0.91% postgres [.] tuplesort_getdatum_nocopy
0.65% postgres [.] palloc
I stared for a while at the mode_final() function and thought maybe we
could use the nocopy variant there. I just didn't quite pluck up the
motivation to write any code to see if it could be made faster.
David
Attachment | Content-Type | Size |
---|---|---|
add_tuplesort_getdatum_nocopy_function.patch | text/plain | 5.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Geoghegan | 2022-09-29 01:31:42 | Re: A potential memory leak on Merge Join when Sort node is not below Materialize node |
Previous Message | Zhang Mingli | 2022-09-29 01:05:18 | Re: Refactor UnpinBuffer() |