BUG #18477: A specific SQL query with "ORDER BY ... NULLS FIRST" is performing poorly if an ordering column is n

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: alexander(dot)berezin3000(at)gmail(dot)com
Subject: BUG #18477: A specific SQL query with "ORDER BY ... NULLS FIRST" is performing poorly if an ordering column is n
Date: 2024-05-23 17:34:12
Message-ID: 18477-55fd294f03d66632@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 18477
Logged by: Alexander777
Email address: alexander(dot)berezin3000(at)gmail(dot)com
PostgreSQL version: 14.6
Operating system: CentOS 7.6.1810
Description:

Summary:
A specific SQL query with "ORDER BY ... NULLS FIRST" is performing poorly if
an ordering column is not nullable.

Description:
Recently, I encountered an intriguing performance issue with a SQL query
over a table with 30+ millions of rows. This query takes significantly
longer to execute than I anticipated. After a thorough investigation, I
extracted a minimal example that clearly illustrates the problem,
particularly when the ordering column is not NULL-able.

Steps to Reproduce:
1. Database Setup:
- PostgreSQL version: 14.6
- Client: psql 14.6
- Operating system: CentOS 7
- Architecture: x86-64

2. Create simple table and index

CREATE TABLE T1 (
pk BIGSERIAL NOT NULL PRIMARY KEY,
updated_at BIGINT NOT NULL DEFAULT 0
);

CREATE INDEX idx_t1_updated_at_pk ON T1 (updated_at, pk);

3. Add big number of rows, in my environment there are ~35M rows
SELECT COUNT(*) FROM T1;
count
----------
35493666
(1 row)

4. Problematic Query:

SELECT updated_at FROM T1
ORDER BY updated_at NULLS FIRST LIMIT 130000

5. Actual Result:
The query takes tens of seconds to execute, which is significantly slower
than expected.

5.1 EXPLAIN (ANALYZE,BUFFERS) output from problematic query:


QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1774902.21..1790069.93 rows=130000 width=8) (actual
time=12107.287..12225.239 rows=130000 loops=1)
Buffers: shared hit=894055 read=162956 dirtied=3807, temp read=153949
written=231607
I/O Timings: read=6287.278
-> Gather Merge (cost=1774902.21..5093645.38 rows=28444384 width=8)
(actual time=12107.285..12216.140 rows=130000 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=894055 read=162956 dirtied=3807, temp
read=153949 written=231607
I/O Timings: read=6287.278
-> Sort (cost=1773902.18..1809457.66 rows=14222192 width=8)
(actual time=12043.738..12048.354 rows=44698 loops=3)
Sort Key: updated_at NULLS FIRST
Sort Method: external merge Disk: 204184kB
Buffers: shared hit=894055 read=162956 dirtied=3807, temp
read=153949 written=231607
I/O Timings: read=6287.278
Worker 0: Sort Method: external merge Disk: 210344kB
Worker 1: Sort Method: external merge Disk: 210576kB
-> Parallel Index Only Scan using idx_t1_updated_at_pk on t1
(cost=0.56..494747.42 rows=14222192 width=8) (actual time=0.088..7815.929
rows=11827789 loops=3)
Heap Fetches: 858732
Buffers: shared hit=893979 read=162956 dirtied=3807
I/O Timings: read=6287.278
Planning Time: 0.099 ms
Execution Time: 12278.073 ms
(21 rows)

5.2 Same query but w/o NULLS FIRST

SELECT updated_at FROM T1
ORDER BY updated_at LIMIT 130000

5.2.1 EXPLAIN (ANALYZE,BUFFERS) output:


QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.56..2643.19 rows=130000 width=8) (actual time=0.063..30.874
rows=130000 loops=1)
Buffers: shared hit=12 read=501
I/O Timings: read=9.383
-> Index Only Scan using idx_t1_updated_at_pk on t1
(cost=0.56..693858.10 rows=34133260 width=8) (actual time=0.062..21.435
rows=130000 loops=1)
Heap Fetches: 0
Buffers: shared hit=12 read=501
I/O Timings: read=9.383
Planning Time: 0.083 ms
Execution Time: 35.562 ms
(9 rows)

5.3 Same query but with NULLS LAST

SELECT updated_at FROM T1
ORDER BY updated_at NULLS LAST LIMIT 130000

5.3.1 EXPLAIN (ANALYZE,BUFFERS) output:


QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.56..2643.19 rows=130000 width=8) (actual time=1.768..35.431
rows=130000 loops=1)
Buffers: shared hit=11 read=502
I/O Timings: read=13.143
-> Index Only Scan using idx_t1_updated_at_pk on t1
(cost=0.56..693858.10 rows=34133260 width=8) (actual time=1.767..26.107
rows=130000 loops=1)
Heap Fetches: 0
Buffers: shared hit=11 read=502
I/O Timings: read=13.143
Planning Time: 0.135 ms
Execution Time: 40.427 ms
(9 rows)

6. Expected Result:
The query in 5. should have performance like in 5.2 or 5.3 for non-nullable
columns.

7. Workaround:
Workaround is to not specify NULLS FIRST explicitly for non-nullable
columns.

8. Severity:
Medium - This issue causes significant performance issues when querying on
big tables (like ERROR: temporary file size exceeds temp_file_limit
(5242880kB)).

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2024-05-23 18:26:10 Re: BUG #18477: A specific SQL query with "ORDER BY ... NULLS FIRST" is performing poorly if an ordering column is n
Previous Message Tom Lane 2024-05-23 15:51:22 Re: BUG #18475: pg_dump: "Error Segmentation fault"