Re: Add memory/disk usage for WindowAgg nodes in EXPLAIN

From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: dgrowleyml(at)gmail(dot)com
Cc: ashutosh(dot)bapat(dot)oss(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Add memory/disk usage for WindowAgg nodes in EXPLAIN
Date: 2024-09-06 04:21:31
Message-ID: 20240906.132131.1196176059464159407.ishii@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi David,

>> I pushed the changes to WindowAgg so as not to call tuplestore_end()
>> on every partition. Can you rebase this patch over that change?
>>
>> It would be good to do this in a way that does not add any new state
>> to WindowAggState, you can see that I had to shuffle fields around in
>> that struct because the next_parition field would have caused the
>> struct to become larger. I've not looked closely, but I expect this
>> can be done by adding more code to tuplestore_updatemax() to also
>> track the disk space used if the current storage has gone to disk. I
>> expect the maxSpace field can be used for both, but we'd need another
>> bool field to track if the max used was by disk or memory.

I have created a patch in the direction you suggested. See attached
patch (v1-0001-Enhance-tuplestore.txt). To not confuse CFbot, the
extension is "txt", not "patch".

>> I think the performance of this would also need to be tested as it
>> means doing an lseek() on every tuplestore_clear() when we've gone to
>> disk. Probably that will be dominated by all the other overheads of a
>> tuplestore going to disk (i.e. dumptuples() etc), but it would be good
>> to check this. I suggest setting work_mem = 64 and making a test case
>> that only just spills to disk. Maybe do a few thousand partitions
>> worth of that and see if you can measure any slowdown.

I copied your shell script and slightly modified it then ran pgbench
with (1 10 100 1000 5000 10000) window partitions (see attached shell
script). In the script I set work_mem to 64kB. It seems for 10000,
1000 and 100 partitions, the performance difference seems
noises. However, for 10, 2, 1 partitions. I see large performance
degradation with the patched version: patched is slower than stock
master in 1.5% (10 partitions), 41% (2 partitions) and 55.7% (1
partition). See the attached graph.

Attachment Content-Type Size
unknown_filename text/plain 3.2 KB
unknown_filename text/plain 720 bytes
image/png 20.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2024-09-06 04:28:52 Re: Trying out read streams in pgvector (an extension)
Previous Message Amit Langote 2024-09-06 03:07:49 Re: pgsql: Add more SQL/JSON constructor functions