Quick Links

Re: Memory-Bounded Hash Aggregation

From:	Adam Lee <ali(at)pivotal(dot)io>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>
Cc:	Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Taylor Vesely <tvesely(at)pivotal(dot)io>, Melanie Plageman <mplageman(at)pivotal(dot)io>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Memory-Bounded Hash Aggregation
Date:	2020-01-08 07:12:02
Message-ID:	20200108071202.GA1511@mars.local
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi, Jeff

I tried to use the logical tape APIs for hash agg spilling, based on
your 1220 version.

Turns out it doesn't make much of performance difference with the
default 8K block size (might be my patch's problem), but the disk space
(not I/O) would be saved a lot because I force the respilling to use the
same LogicalTapeSet.

Logtape APIs with default block size 8K:
```
postgres=# EXPLAIN ANALYZE SELECT avg(g) FROM generate_series(0,5000000) g GROUP BY g;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=75000.02..75002.52 rows=200 width=36) (actual time=7701.706..24473.002 rows=5000001 loops=1)
Group Key: g
Memory Usage: 4096kB Batches: 516 Disk: 116921kB
-> Function Scan on generate_series g (cost=0.00..50000.01 rows=5000001 width=4) (actual time=1611.829..3253.150 rows=5000001 loops=1)
Planning Time: 0.194 ms
Execution Time: 25129.239 ms
(6 rows)
```

Bare BufFile APIs:
```
postgres=# EXPLAIN ANALYZE SELECT avg(g) FROM generate_series(0,5000000) g GROUP BY g;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=75000.02..75002.52 rows=200 width=36) (actual time=7339.835..24472.466 rows=5000001 loops=1)
Group Key: g
Memory Usage: 4096kB Batches: 516 Disk: 232773kB
-> Function Scan on generate_series g (cost=0.00..50000.01 rows=5000001 width=4) (actual time=1580.057..3128.749 rows=5000001 loops=1)
Planning Time: 0.769 ms
Execution Time: 26696.502 ms
(6 rows)
```

Even though, I'm not sure which API is better, because we should avoid
the respilling as much as we could in the planner, and hash join uses
the bare BufFile.

Attached my hacky and probably not robust diff for your reference.

--
Adam Lee

Attachment	Content-Type	Size
hashagg_logtape.diff	text/plain	14.8 KB

In response to

Re: Memory-Bounded Hash Aggregation at 2019-12-22 23:05:46 from Jeff Davis

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Richard Guo	2020-01-08 07:24:21	Re: Parallel grouping sets
Previous Message	曾文旌 (义从)	2020-01-08 07:03:09	Re: [Proposal] Global temporary tables