| From: | Tatsuo Ishii <ishii(at)postgresql(dot)org> | 
|---|---|
| To: | dgrowleyml(at)gmail(dot)com | 
| Cc: | ashutosh(dot)bapat(dot)oss(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org | 
| Subject: | Re: Add memory/disk usage for WindowAgg nodes in EXPLAIN | 
| Date: | 2024-09-06 04:21:31 | 
| Message-ID: | 20240906.132131.1196176059464159407.ishii@postgresql.org | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
Hi David,
>> I pushed the changes to WindowAgg so as not to call tuplestore_end()
>> on every partition.  Can you rebase this patch over that change?
>> 
>> It would be good to do this in a way that does not add any new state
>> to WindowAggState, you can see that I had to shuffle fields around in
>> that struct because the next_parition field would have caused the
>> struct to become larger. I've not looked closely, but I expect this
>> can be done by adding more code to tuplestore_updatemax() to also
>> track the disk space used if the current storage has gone to disk. I
>> expect the maxSpace field can be used for both, but we'd need another
>> bool field to track if the max used was by disk or memory.
I have created a patch in the direction you suggested.  See attached
patch (v1-0001-Enhance-tuplestore.txt). To not confuse CFbot, the
extension is "txt", not "patch".
>> I think the performance of this would also need to be tested as it
>> means doing an lseek() on every tuplestore_clear() when we've gone to
>> disk. Probably that will be dominated by all the other overheads of a
>> tuplestore going to disk (i.e. dumptuples() etc), but it would be good
>> to check this. I suggest setting work_mem = 64 and making a test case
>> that only just spills to disk. Maybe do a few thousand partitions
>> worth of that and see if you can measure any slowdown.
I copied your shell script and slightly modified it then ran pgbench
with (1 10 100 1000 5000 10000) window partitions (see attached shell
script). In the script I set work_mem to 64kB. It seems for 10000,
1000 and 100 partitions, the performance difference seems
noises. However, for 10, 2, 1 partitions. I see large performance
degradation with the patched version: patched is slower than stock
master in 1.5% (10 partitions), 41% (2 partitions) and 55.7% (1
partition). See the attached graph.
| Attachment | Content-Type | Size | 
|---|---|---|
| unknown_filename | text/plain | 3.2 KB | 
| unknown_filename | text/plain | 720 bytes | 
            
              
                 
              
            
           | 
          image/png | 20.0 KB | 
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Thomas Munro | 2024-09-06 04:28:52 | Re: Trying out read streams in pgvector (an extension) | 
| Previous Message | Amit Langote | 2024-09-06 03:07:49 | Re: pgsql: Add more SQL/JSON constructor functions |