Extremely inefficient merge-join

From: Marcin Gozdalik <gozdal(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Extremely inefficient merge-join
Date: 2021-03-17 16:10:14
Message-ID: CADu1mRO28n4FifmrJYQ6TfFFdWS7tvwksE=7FVmuqc7BsL2OeQ@mail.gmail.com
Lists: pgsql-performance

Hi

I am hitting a rare issue with an extremely inefficient merge join. Although a
suitable index is present, the chosen plan effectively degenerates into a
nested loop.

PostgreSQL 9.6.17 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5
20150623 (Red Hat 4.8.5-39), 64-bit

Schema of dir_current (some columns left out for brevity):

starfish=# \d sf.dir_current
                                   Table "sf.dir_current"
   Column    |    Type     | Collation | Nullable |                Default
-------------+-------------+-----------+----------+---------------------------------------
 id          | bigint      |           | not null | nextval('sf.object_id_seq'::regclass)
 volume_id   | bigint      |           | not null |
 parent_id   | bigint      |           |          |
 blocks      | sf.blkcnt_t |           |          |
 rec_aggrs   | jsonb       |           | not null |
Indexes:
    "dir_current_pk" PRIMARY KEY, btree (id), tablespace "sf_current"
    "dir_current_parentid_idx" btree (parent_id), tablespace "sf_current"
    "dir_current_volume_id_out_of_sync_time_idx" btree (volume_id, out_of_sync_time) WHERE out_of_sync_time IS NOT NULL, tablespace "sf_current"
    "dir_current_volume_id_path_unq_idx" UNIQUE, btree (volume_id, path text_pattern_ops), tablespace "sf_current"
    "dir_current_volumeid_id_unq" UNIQUE CONSTRAINT, btree (volume_id, id), tablespace "sf_current"
Foreign-key constraints:
    "dir_current_parentid_fk" FOREIGN KEY (parent_id) REFERENCES sf.dir_current(id) DEFERRABLE INITIALLY DEFERRED

dir_process is created as a temporary table:

CREATE TEMP TABLE dir_process AS (
    SELECT sf.dir_current.id, volume_id, parent_id, depth, size, blocks,
           atime, ctime, mtime, sync_time, local_aggrs
    FROM sf.dir_current
    WHERE ....
);

CREATE INDEX dir_process_indx ON dir_process(volume_id, id);
ANALYZE dir_process;

and usually contains a few thousand rows.

Slow query:
SELECT dir.id, dir.volume_id, dir.parent_id, dir.rec_aggrs, dir.blocks
FROM sf.dir_current AS dir
INNER JOIN dir_process ON dir.parent_id = dir_process.id
                      AND dir.volume_id = dir_process.volume_id
WHERE dir.volume_id = ANY(volume_ids)

dir_current contains around 750M rows altogether, and there are ca. 1.75M
rows with volume_id=5.
Sometimes Postgres chooses a very inefficient plan that loops many times
over the same rows, producing hundreds of millions or even billions of
intermediate rows:

LOG:  duration: 1125530.496 ms  plan:
Merge Join  (cost=909.42..40484.01 rows=1 width=456) (actual rows=1 loops=1)
  Merge Cond: (dir.volume_id = dir_process.volume_id)
  Join Filter: (dir.parent_id = dir_process.id)
  Rows Removed by Join Filter: 13583132483
  ->  Index Scan using dir_current_volumeid_id_unq on dir_current dir  (cost=0.12..884052.46 rows=601329 width=456) (actual rows=2000756 loops=1)
        Index Cond: (volume_id = ANY ('{5}'::bigint[]))
  ->  Sort  (cost=909.31..912.70 rows=6789 width=16) (actual rows=13581131729 loops=1)
        Sort Key: dir_process.volume_id
        Sort Method: quicksort  Memory: 511kB
        ->  Seq Scan on dir_process  (cost=0.00..822.89 rows=6789 width=16) (actual rows=6789 loops=1)

LOG:  duration: 3923310.224 ms  plan:
Merge Join  (cost=0.17..4324.64 rows=1 width=456) (actual rows=529 loops=1)
  Merge Cond: (dir_process.volume_id = dir.volume_id)
  Join Filter: (dir.parent_id = dir_process.id)
  Rows Removed by Join Filter: 831113021
  ->  Index Only Scan using dir_process_indx on dir_process  (cost=0.05..245.00 rows=450 width=16) (actual rows=450 loops=1)
        Heap Fetches: 450
  ->  Index Scan using dir_current_volumeid_id_unq on dir_current dir  (cost=0.12..884052.46 rows=601329 width=456) (actual rows=831113101 loops=1)
        Index Cond: (volume_id = ANY ('{5}'::bigint[]))

LOG:  duration: 10140968.829 ms  plan:
Merge Join  (cost=0.17..8389.13 rows=1 width=456) (actual rows=819 loops=1)
  Merge Cond: (dir_process.volume_id = dir.volume_id)
  Join Filter: (dir.parent_id = dir_process.id)
  Rows Removed by Join Filter: 2153506735
  ->  Index Only Scan using dir_process_indx on dir_process  (cost=0.06..659.76 rows=1166 width=16) (actual rows=1166 loops=1)
        Heap Fetches: 1166
  ->  Index Scan using dir_current_volumeid_id_unq on dir_current dir  (cost=0.12..885276.20 rows=602172 width=456) (actual rows=2153506389 loops=1)
        Index Cond: (volume_id = ANY ('{5}'::bigint[]))

Note that 2153506389 / 1166 = 1846918, and similarly 831113101 / 450 =
1846918. In other words, every dir_process row appears to re-scan all
~1.85M dir_current rows for volume_id=5.
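That ratio is consistent with the merge key (volume_id) being the same for
every row on both sides of the join: the inner side never advances past the
duplicate run, so it is rescanned once per outer row. A small
back-of-the-envelope sketch (hypothetical Python, not PostgreSQL code) of
that arithmetic:

```python
def merge_join_inner_rows(outer_rows: int, inner_run: int) -> int:
    """Rows fetched from the inner side of a merge join when every row
    shares the same merge key: the whole inner run of duplicates is
    re-scanned once per outer row."""
    return outer_rows * inner_run

# Figures taken from the plans above (inner run is approximate):
inner_run = 1_846_918   # dir_current rows with volume_id = 5

print(merge_join_inner_rows(1166, inner_run))  # ~2.15 billion, matching plan 3
print(merge_join_inner_rows(450, inner_run))   # ~831 million, matching plan 2
```

So the work grows as outer_rows x inner_run, which is why a few hundred
extra dir_process rows multiply the runtime by hours.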

How can I help the Postgres query planner choose a faster plan?

--
Marcin Gozdalik
