Query performance with min and filter

From: "Yorwerth, Adam" <Adam(dot)Yorwerth(at)tesco(dot)com>
To: "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Cc: "Cianciaruso, Lorenzo" <lorenzo(dot)cianciaruso(at)tesco(dot)com>
Subject: Query performance with min and filter
Date: 2020-08-31 09:27:13
Message-ID: 0548FF3D-05E1-4C9F-91F5-B2EDA69A589D@tesco.com
Lists: pgsql-general

Hi Postgres community,

We seem to have found a situation where a query is very fast when run using explain analyse or pgbench, but performs very poorly when run via Java under load. We've checked the query performance metrics for our Postgres instance and can confirm that it is the query itself running slowly, and not a Java problem.

The table has ~7 million rows, with a total size of 44 GB. As shown below, explain analyse gives a sub-1 ms execution time, yet in practice the query takes 2 seconds on average. It's worth mentioning that a small percentage of requests do take ~1 ms, so something non-deterministic is happening here.

The query in question is below, with explain analyse:

explain analyse SELECT coalesce (
    (SELECT min(msg_offset) - 1 FROM events WHERE created_utc >= CURRENT_TIMESTAMP - INTERVAL '60 SECONDS' ),
    (SELECT max(msg_offset) FROM events),
    0
);

                                                                 QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Result (cost=2.03..2.04 rows=1 width=8) (actual time=0.183..0.183 rows=1 loops=1)
   InitPlan 1 (returns $1)
     -> Aggregate (cost=1.56..1.57 rows=1 width=8) (actual time=0.028..0.028 rows=1 loops=1)
          -> Index Only Scan using test_4 on events (cost=0.44..1.56 rows=1 width=8) (actual time=0.023..0.023 rows=0 loops=1)
               Index Cond: (created_utc >= (now() - '00:01:00'::interval))
               Heap Fetches: 0
   InitPlan 3 (returns $3)
     -> Result (cost=0.45..0.46 rows=1 width=8) (actual time=0.153..0.153 rows=1 loops=1)
          InitPlan 2 (returns $2)
            -> Limit (cost=0.43..0.45 rows=1 width=8) (actual time=0.152..0.152 rows=1 loops=1)
                 -> Index Only Scan Backward using events_pkey on events events_1 (cost=0.43..156842.57 rows=7643099 width=8) (actual time=0.151..0.151 rows=1 loops=1)
                      Index Cond: (msg_offset IS NOT NULL)
                      Heap Fetches: 0
 Planning time: 2.480 ms
 Execution time: 0.242 ms
(15 rows)
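
Note that in the plan above the index only scan on test_4 returned rows=0, so at the moment this was run no events fell inside the 60-second window and the result came from the max(msg_offset) branch. A sketch of how each branch could be timed on its own while events are arriving, using only the table and columns shown in this message (the BUFFERS option is an addition here, not part of the run above):

```sql
-- Branch 1 alone: the min() over the last 60 seconds.
-- BUFFERS adds shared-buffer hit/read counts to the plan output.
EXPLAIN (ANALYZE, BUFFERS)
SELECT min(msg_offset) - 1
FROM events
WHERE created_utc >= CURRENT_TIMESTAMP - INTERVAL '60 SECONDS';

-- Branch 2 alone: the fallback used when branch 1 yields NULL.
EXPLAIN (ANALYZE, BUFFERS)
SELECT max(msg_offset)
FROM events;
```

Running the first statement while the application is under load might show whether the min(msg_offset) branch behaves differently once rows exist inside the window.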

Output of \d events, with a few test indexes added to try and improve performance:

                                          Table "public.events"
    Column    |            Type             | Collation | Nullable |                  Default
--------------+-----------------------------+-----------+----------+--------------------------------------------
 msg_offset   | bigint                      |           | not null | nextval('events_msg_offset_seq'::regclass)
 msg_key      | character varying           |           | not null |
 content_type | character varying           |           | not null |
 type         | character varying           |           | not null |
 created_utc  | timestamp without time zone |           | not null |
 data         | text                        |           |          |
 event_size   | integer                     |           | not null | 1048576
 cluster_id   | bigint                      |           | not null | 1
Indexes:
    "events_pkey" PRIMARY KEY, btree (msg_offset)
    "cluster_idx" btree (msg_key, cluster_id)
    "cluster_type_filter_idx" btree (type, cluster_id)
    "created_idx" btree (created_utc, msg_key)
    "key_idx" btree (msg_key)
    "test_1" btree (created_utc, msg_offset)
    "test_2" btree (msg_offset, created_utc)
    "test_3" btree (created_utc, msg_offset)
    "test_4" btree (created_utc, msg_offset DESC)
    "type_idx" btree (type)

Any help understanding what is happening here, or guidance on improving performance, would be much appreciated.

Regards,

Adam

