From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: alexander(dot)berezin3000(at)gmail(dot)com
Subject: BUG #18477: A specific SQL query with "ORDER BY ... NULLS FIRST" is performing poorly if an ordering column is n
Date: 2024-05-23 17:34:12
Message-ID: 18477-55fd294f03d66632@postgresql.org
Lists: pgsql-bugs
The following bug has been logged on the website:
Bug reference: 18477
Logged by: Alexander777
Email address: alexander(dot)berezin3000(at)gmail(dot)com
PostgreSQL version: 14.6
Operating system: CentOS 7.6.1810
Description:
Summary:
A specific SQL query with "ORDER BY ... NULLS FIRST" is performing poorly if
an ordering column is not nullable.
Description:
Recently, I encountered an intriguing performance issue with a SQL query
over a table with more than 30 million rows. The query takes significantly
longer to execute than anticipated. After a thorough investigation, I
extracted a minimal example that clearly illustrates the problem, which
occurs in particular when the ordering column is not nullable.
Steps to Reproduce:
1. Database Setup:
- PostgreSQL version: 14.6
- Client: psql 14.6
- Operating system: CentOS 7
- Architecture: x86-64
2. Create a simple table and index:
CREATE TABLE T1 (
pk BIGSERIAL NOT NULL PRIMARY KEY,
updated_at BIGINT NOT NULL DEFAULT 0
);
CREATE INDEX idx_t1_updated_at_pk ON T1 (updated_at, pk);
3. Populate the table with a large number of rows; in my environment there are ~35M rows:
SELECT COUNT(*) FROM T1;
count
----------
35493666
(1 row)
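The report does not include the population step itself. A minimal sketch that produces a table of comparable size (the exact INSERT used is not shown in the original report, and the value distribution here is an assumption) could be:

```sql
-- Hypothetical population step (not part of the original report):
-- insert ~35M rows with monotonically increasing updated_at values,
-- then refresh planner statistics.
INSERT INTO T1 (updated_at)
SELECT g FROM generate_series(1, 35000000) AS g;
ANALYZE T1;
```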
4. Problematic Query:
SELECT updated_at FROM T1
ORDER BY updated_at NULLS FIRST LIMIT 130000
5. Actual Result:
The query takes tens of seconds to execute, which is significantly slower
than expected.
5.1 EXPLAIN (ANALYZE,BUFFERS) output from problematic query:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1774902.21..1790069.93 rows=130000 width=8) (actual
time=12107.287..12225.239 rows=130000 loops=1)
Buffers: shared hit=894055 read=162956 dirtied=3807, temp read=153949
written=231607
I/O Timings: read=6287.278
-> Gather Merge (cost=1774902.21..5093645.38 rows=28444384 width=8)
(actual time=12107.285..12216.140 rows=130000 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=894055 read=162956 dirtied=3807, temp
read=153949 written=231607
I/O Timings: read=6287.278
-> Sort (cost=1773902.18..1809457.66 rows=14222192 width=8)
(actual time=12043.738..12048.354 rows=44698 loops=3)
Sort Key: updated_at NULLS FIRST
Sort Method: external merge Disk: 204184kB
Buffers: shared hit=894055 read=162956 dirtied=3807, temp
read=153949 written=231607
I/O Timings: read=6287.278
Worker 0: Sort Method: external merge Disk: 210344kB
Worker 1: Sort Method: external merge Disk: 210576kB
-> Parallel Index Only Scan using idx_t1_updated_at_pk on t1
(cost=0.56..494747.42 rows=14222192 width=8) (actual time=0.088..7815.929
rows=11827789 loops=3)
Heap Fetches: 858732
Buffers: shared hit=893979 read=162956 dirtied=3807
I/O Timings: read=6287.278
Planning Time: 0.099 ms
Execution Time: 12278.073 ms
(21 rows)
5.2 The same query, but without NULLS FIRST:
SELECT updated_at FROM T1
ORDER BY updated_at LIMIT 130000
5.2.1 EXPLAIN (ANALYZE,BUFFERS) output:
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.56..2643.19 rows=130000 width=8) (actual time=0.063..30.874
rows=130000 loops=1)
Buffers: shared hit=12 read=501
I/O Timings: read=9.383
-> Index Only Scan using idx_t1_updated_at_pk on t1
(cost=0.56..693858.10 rows=34133260 width=8) (actual time=0.062..21.435
rows=130000 loops=1)
Heap Fetches: 0
Buffers: shared hit=12 read=501
I/O Timings: read=9.383
Planning Time: 0.083 ms
Execution Time: 35.562 ms
(9 rows)
5.3 The same query, but with NULLS LAST:
SELECT updated_at FROM T1
ORDER BY updated_at NULLS LAST LIMIT 130000
5.3.1 EXPLAIN (ANALYZE,BUFFERS) output:
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.56..2643.19 rows=130000 width=8) (actual time=1.768..35.431
rows=130000 loops=1)
Buffers: shared hit=11 read=502
I/O Timings: read=13.143
-> Index Only Scan using idx_t1_updated_at_pk on t1
(cost=0.56..693858.10 rows=34133260 width=8) (actual time=1.767..26.107
rows=130000 loops=1)
Heap Fetches: 0
Buffers: shared hit=11 read=502
I/O Timings: read=13.143
Planning Time: 0.135 ms
Execution Time: 40.427 ms
(9 rows)
6. Expected Result:
The query in step 5 should perform like the queries in 5.2 and 5.3, since the
ordering column is non-nullable and NULLS FIRST cannot change the result.
7. Workaround:
The workaround is to not specify NULLS FIRST explicitly for non-nullable
columns.
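Besides omitting the clause, another possible workaround (my suggestion, not tested in the environment above) is to create an index whose declared sort order matches the query's ORDER BY, so the planner can satisfy NULLS FIRST directly from the index:

```sql
-- Hypothetical alternative workaround (not part of the original report):
-- an index sorted NULLS FIRST matches ORDER BY updated_at NULLS FIRST,
-- allowing an index scan instead of a full sort.
CREATE INDEX idx_t1_updated_at_nf_pk ON T1 (updated_at NULLS FIRST, pk);
```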
8. Severity:
Medium - this issue causes significant performance degradation when querying
large tables, and can even make the query fail outright with: ERROR:
temporary file size exceeds temp_file_limit (5242880kB).