From: | "Yorwerth, Adam" <Adam(dot)Yorwerth(at)tesco(dot)com> |
---|---|
To: | "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org> |
Cc: | "Cianciaruso, Lorenzo" <lorenzo(dot)cianciaruso(at)tesco(dot)com> |
Subject: | Query performance with min and filter |
Date: | 2020-08-31 09:27:13 |
Message-ID: | 0548FF3D-05E1-4C9F-91F5-B2EDA69A589D@tesco.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hi Postgres community,
We seem to have found a situation where a query run using explain analyse or pgbench is incredibly fast, but run via Java under load performs very poorly – we’ve checked query performance metrics for our Postgres instance and can confirm that it’s the query running slowly, and not a Java problem.
The table has ~7 million rows, with total size of 44GB. As shown below, explain analyse gives a sub 1ms execution time, yet in practice the query takes an average of 2 seconds – it’s worth mentioning that a small percentage of requests do take ~1ms, so something non-deterministic is happening here.
The query in question is below, with explain analyse:
explain analyse SELECT coalesce (
(SELECT min(msg_offset) - 1 FROM events WHERE created_utc >= CURRENT_TIMESTAMP - INTERVAL '60 SECONDS' ),
(SELECT max(msg_offset) FROM events),
0
);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
Result (cost=2.03..2.04 rows=1 width=8) (actual time=0.183..0.183 rows=1 loops=1)
InitPlan 1 (returns $1)
-> Aggregate (cost=1.56..1.57 rows=1 width=8) (actual time=0.028..0.028 rows=1 loops=1)
-> Index Only Scan using test_4 on events (cost=0.44..1.56 rows=1 width=8) (actual time=0.023..0.023 rows=0 loops=1)
Index Cond: (created_utc >= (now() - '00:01:00'::interval))
Heap Fetches: 0
InitPlan 3 (returns $3)
-> Result (cost=0.45..0.46 rows=1 width=8) (actual time=0.153..0.153 rows=1 loops=1)
InitPlan 2 (returns $2)
-> Limit (cost=0.43..0.45 rows=1 width=8) (actual time=0.152..0.152 rows=1 loops=1)
-> Index Only Scan Backward using events_pkey on events events_1 (cost=0.43..156842.57 rows=7643099 width=8) (actual time=0.151..0.151 rows=1 loops=1)
Index Cond: (msg_offset IS NOT NULL)
Heap Fetches: 0
Planning time: 2.480 ms
Execution time: 0.242 ms
(15 rows)
Output of \d events, with a few test indexes added to try and improve performance:
Table "public.events"
Column | Type | Collation | Nullable | Default
--------------+-----------------------------+-----------+----------+--------------------------------------------
msg_offset | bigint | | not null | nextval('events_msg_offset_seq'::regclass)
msg_key | character varying | | not null |
content_type | character varying | | not null |
type | character varying | | not null |
created_utc | timestamp without time zone | | not null |
data | text | | |
event_size | integer | | not null | 1048576
cluster_id | bigint | | not null | 1
Indexes:
"events_pkey" PRIMARY KEY, btree (msg_offset)
"cluster_idx" btree (msg_key, cluster_id)
"cluster_type_filter_idx" btree (type, cluster_id)
"created_idx" btree (created_utc, msg_key)
"key_idx" btree (msg_key)
"test_1" btree (created_utc, msg_offset)
"test_2" btree (msg_offset, created_utc)
"test_3" btree (created_utc, msg_offset)
"test_4" btree (created_utc, msg_offset DESC)
"type_idx" btree (type)
Any help understanding what is happening here, or guidance to improve performance would be much appreciated.
Regards,
Adam
This is a confidential email. Tesco may monitor and record all emails. The views expressed in this email are those of the sender and not Tesco. Tesco Stores Limited Company Number: 519500 Registered in England Registered Office: Tesco House, Shire Park, Kestrel Way, Welwyn Garden City, AL7 1GA VAT Registration Number: GB 220 4302 31
From | Date | Subject | |
---|---|---|---|
Next Message | Thorsten Schöning | 2020-08-31 10:37:00 | Re: How bad is using queries with thousands of values for operators IN or ANY? |
Previous Message | Thorsten Schöning | 2020-08-31 09:16:22 | Re: How bad is using queries with thousands of values for operators IN or ANY? |