BUG #16824: Planner chooses poor path on query with Merge Join and pagination

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: jacobmitash(at)gmail(dot)com
Subject: BUG #16824: Planner chooses poor path on query with Merge Join and pagination
Date: 2021-01-13 22:13:55
Message-ID: 16824-cac06100763db5ae@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 16824
Logged by: Jacob Mitash
Email address: jacobmitash(at)gmail(dot)com
PostgreSQL version: 13.1
Operating system: Alpine Linux (Docker container)
Description:

Hi,

I have been working with this relatively simple query that is used to
iterate over a large subset of rows across two tables, one page at a time.
The query was originally using an OFFSET/LIMIT to execute the paging, which
you might guess performs poorly in the later pages. I changed the query to
do pagination using a combination of ORDER BY id, WHERE id >
id_from_prev_page, and LIMIT page_size. I'm not sure if that strategy has a
name, but I'll refer to it as "ID pagination".

My dataset involves just two tables, described by this DDL:
```
-- DDL:
create table sub ( -- 4.3M records
subscription_id varchar(255) -- UUIDs
not null constraint pk_sub primary key,
subscription_status varchar(255) -- (ACTIVE, PENDING, or CANCELLED)
);

create table sub_item ( -- 4.4M records, mostly 1:1 with subscriptions
subscription_item_id varchar(255) -- UUIDs
not null constraint pk_sub_item primary key,
subscription_id varchar(255) -- FK to sub
constraint fk_sub_item_sub_id references sub,
item_ref varchar(255) -- 25 character strings, mostly the
same
);
```

And here's the query I've been using to select pages while testing
performance. The literal value of the subscription_id is to represent
grabbing a page in the middle of the data.
```
-- Query: Grab a page from the middle
EXPLAIN ANALYZE
SELECT s.subscription_id, *
FROM sub s
JOIN sub_item si ON s.subscription_id = si.subscription_id AND
si.item_ref = '1001107599999910222001000'
WHERE s.subscription_status = 'ACTIVE'
AND s.SUBSCRIPTION_ID > '7ca1cf6b-2452-4d1b-bd03-1ba68b63b528'
ORDER BY s.subscription_id
LIMIT 50;
```

My understanding of this ID pagination is that it should be very quick as it
needs only find the ID in the index, and then scan the next 50 entries.
However, the execution time is slow:
```
Limit (cost=2.41..325.76 rows=50 width=179) (actual
time=10336.095..10340.389 rows=50 loops=1)
-> Merge Join (cost=2.41..765832.96 rows=118421 width=179) (actual
time=10336.086..10339.900 rows=50 loops=1)
Merge Cond: ((s.subscription_id)::text =
(si.subscription_id)::text)
-> Index Scan using pk_sub on sub s (cost=0.56..270648.21
rows=632913 width=45) (actual time=0.025..1.643 rows=90 loops=1)
Index Cond: ((subscription_id)::text >
'7ca1cf6b-2452-4d1b-bd03-1ba68b63b528'::text)
Filter: ((subscription_status)::text = 'ACTIVE'::text)
Rows Removed by Filter: 211
-> Index Scan using idx_sub_item_sub_id_btree on sub_item si
(cost=0.56..490385.99 rows=813151 width=98) (actual time=0.016..8260.586
rows=393058 loops=1)
Filter: ((item_ref)::text =
'1001107599999910222001000'::text)
Rows Removed by Filter: 1742475
Planning Time: 0.309 ms
Execution Time: 10340.691 ms
```

Just fiddling around, I saw that performance significantly improved when I
ran: SET enable_mergejoin = OFF;
```
Limit (cost=1.11..446.97 rows=50 width=179) (actual time=0.150..4.351
rows=50 loops=1)
-> Nested Loop (cost=1.11..1055964.59 rows=118421 width=179) (actual
time=0.140..3.850 rows=50 loops=1)
-> Index Scan using pk_sub on sub s (cost=0.56..270648.21
rows=632913 width=45) (actual time=0.064..0.898 rows=90 loops=1)
Index Cond: ((subscription_id)::text >
'7ca1cf6b-2452-4d1b-bd03-1ba68b63b528'::text)
Filter: ((subscription_status)::text = 'ACTIVE'::text)
Rows Removed by Filter: 211
-> Index Scan using idx_sub_item_sub_id_btree on sub_item si
(cost=0.56..1.23 rows=1 width=98) (actual time=0.013..0.016 rows=1
loops=90)
Index Cond: ((subscription_id)::text =
(s.subscription_id)::text)
Filter: ((item_ref)::text =
'1001107599999910222001000'::text)
Rows Removed by Filter: 0
Planning Time: 0.196 ms
Execution Time: 4.623 ms
```

I originally asked about this on the DBA StackExchange. I had two questions
to which I'll summarize here:

1. Why is the Merge Join performing so slowly?
It seems to be because the planner doesn't recognize that it can apply the
subscription_id index condition on the inner table. If I explicitly tell it:
"AND si.subscription_id > '7ca1...'", then it applies an index condition and
is almost instant.
2. Why does the planner believe that Merge Join (as-is) is optimal here?
I don't have an answer for this. Potentially a bug related to the previous
answer?

So I think there's two things going on here that may or may not be
classified as bugs:
1. Using Merge Join, the planner does not apply an index condition to the
inner table that could be implied from the clause on the outer table.
2. In the same scenario, the planner's cost estimation is grossly different
from the actual execution time.

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2021-01-14 00:16:39 Re: BUG #16824: Planner chooses poor path on query with Merge Join and pagination
Previous Message Tom Lane 2021-01-13 16:42:06 Re: BUG #16823: Unreachable code