Hash aggregate spilling (v13) / partitions & batches

From: talk to ben <blo(dot)talkto(at)gmail(dot)com>
To: "pgsql-generallists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Hash aggregate spilling (v13) / partitions & batches
Date: 2020-11-23 16:36:19
Message-ID: CAPE8EZ7npc55G4PnH2sW75+a0wfjd8e6GT6z8o95gqUioh0ZXw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi,

I am testing things on hash aggregate spilling in version 13.1 and am
struggling to understand the partition thing in the two explains below.
My understanding is that a partition corresponds to a spill file which will
be treated in a later batch (which can re-spill in some cases).

Am I right to think that the second explain analyze says that PostgreSQL
was planning for 8 batches (there are 8 planned partitions) and that only
one was necessary (= no spill files) ?

regards
benoit

[local]:5433 postgres(at)postgres=# CREATE TABLE tableA(ac1 int, ac2 int);
CREATE TABLE
[local]:5433 postgres(at)postgres=# CREATE TABLE tableB(bc1 int, bc2 int);
CREATE TABLE

[local]:5433 postgres(at)postgres=# INSERT INTO tableA SELECT x, random()*100
FROM generate_series(1,1000000) AS F(x);
INSERT 0 1000000
[local]:5433 postgres(at)postgres=# INSERT INTO tableB SELECT mod(x,100000),
random()*100 FROM generate_series(1,1000000) AS F(x);
INSERT 0 1000000

[local]:5433 postgres(at)postgres=# SELECT name, setting, unit FROM
pg_settings WHERE name IN('work_mem', 'hash_mem_multiplier');
name | setting | unit
---------------------+---------+------
hash_mem_multiplier | 1 | NULL
work_mem | 4096 | kB
(2 rows)
[local]:5433 postgres(at)postgres=# EXPLAIN (ANALYZE) SELECT ac1, count(ac2),
sum(bc2) FROM tableA INNER JOIN TABLEB ON ac1 = bc1 GROUP BY Ac1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=137290.50..157056.12 rows=1000000 width=20) (actual
time=773.405..889.020 rows=99999 loops=1)
Group Key: tablea.ac1
Planned Partitions: 32 Batches: 33 Memory Usage: 4369kB Disk Usage:
30456kB
-> Hash Join (cost=30832.00..70728.00 rows=1000000 width=12) (actual
time=158.774..583.031 rows=999990 loops=1)
Hash Cond: (tableb.bc1 = tablea.ac1)
-> Seq Scan on tableb (cost=0.00..14425.00 rows=1000000 width=8)
(actual time=0.023..77.297 rows=1000000 loops=1)
-> Hash (cost=14425.00..14425.00 rows=1000000 width=8) (actual
time=158.378..158.379 rows=1000000 loops=1)
Buckets: 131072 Batches: 16 Memory Usage: 3471kB
-> Seq Scan on tablea (cost=0.00..14425.00 rows=1000000
width=8) (actual time=0.010..53.476 rows=1000000 loops=1)
Planning Time: 0.824 ms
Execution Time: 895.251 ms
(11 rows)

[local]:5433 postgres(at)postgres=# SET hash_mem_multiplier TO 5;
SET
[local]:5433 postgres(at)postgres=# EXPLAIN (ANALYZE) SELECT ac1, count(ac2),
sum(bc2) FROM tableA INNER JOIN TABLEB ON ac1 = bc1 GROUP BY Ac1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=137290.50..157056.12 rows=1000000 width=20) (actual
time=696.684..714.198 rows=99999 loops=1)
Group Key: tablea.ac1
Planned Partitions: 8 Batches: 1 Memory Usage: 15633kB
-> Hash Join (cost=30832.00..70728.00 rows=1000000 width=12) (actual
time=171.789..560.692 rows=999990 loops=1)
Hash Cond: (tableb.bc1 = tablea.ac1)
-> Seq Scan on tableb (cost=0.00..14425.00 rows=1000000 width=8)
(actual time=0.032..78.718 rows=1000000 loops=1)
-> Hash (cost=14425.00..14425.00 rows=1000000 width=8) (actual
time=168.592..168.593 rows=1000000 loops=1)
Buckets: 524288 Batches: 4 Memory Usage: 13854kB
-> Seq Scan on tablea (cost=0.00..14425.00 rows=1000000
width=8) (actual time=0.018..52.796 rows=1000000 loops=1)
Planning Time: 0.242 ms
Execution Time: 717.914 ms
(11 rows)

Browse pgsql-general by date

  From Date Subject
Next Message Jagmohan Kaintura 2020-11-23 17:03:20 Re: ERROR : invalid transaction termination : PostgreSQL v12
Previous Message Adrian Klaver 2020-11-23 16:11:27 Re: Multiple result set not displayed in PgAdmin4