| From: | Robert Creager <robertc(at)spectralogic(dot)com> |
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
| Cc: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, "pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org>, Barry Litt <barryl(at)spectralogic(dot)com> |
| Subject: | Re: Need help identifying a periodic performance issue. |
| Date: | 2021-11-24 22:44:12 |
| Message-ID: | BA7F0900-1AD5-4680-A11D-1D47429FCF2F@spectralogic.com |
| Lists: | pgsql-performance |
I forgot, I had reloaded postgres, but had not re-started our app, so the connections wouldn’t have that plan setting on them. Re-doing now.
On Nov 24, 2021, at 2:13 PM, Robert Creager <robertc(at)spectralogic(dot)com> wrote:
On Nov 19, 2021, at 11:47 AM, Robert Creager <robertc(at)spectralogic(dot)com> wrote:
On Nov 18, 2021, at 2:42 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
This is all guesswork though, since we don't know quite what's
happening on Robert's system. It might be worth setting
"log_autovacuum_min_duration = 0" (either globally, or as a
reloption on the relevant tables), and seeing if there seems
to be any correlation between autovacuum/autoanalyze activity
and the occurrences of poor plan choices.
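One way to act on that suggestion (a sketch; the `ds3.blob` and `ds3.s3_object` table names are taken from the plans later in this message) is to enable the logging either server-wide or only on the tables involved, via a storage parameter:

```sql
-- Globally (takes effect after a configuration reload):
ALTER SYSTEM SET log_autovacuum_min_duration = 0;
SELECT pg_reload_conf();

-- Or as a reloption on just the relevant tables:
ALTER TABLE ds3.blob SET (log_autovacuum_min_duration = 0);
ALTER TABLE ds3.s3_object SET (log_autovacuum_min_duration = 0);
```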
Ok, doing a SET plan_cache_mode=force_custom_plan before the COPY and resetting it after appears to fix the problem. We’re going to run it over the weekend to make sure.
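For reference, the workaround described above might look like this (a sketch; the COPY target columns and file path are hypothetical, not from the thread):

```sql
SET plan_cache_mode = force_custom_plan;  -- plan with actual parameter values
COPY ds3.blob FROM '/tmp/blob_data.csv' WITH (FORMAT csv);  -- hypothetical source file
RESET plan_cache_mode;                    -- restore the default (auto)
```

Using SET LOCAL inside a transaction would scope the setting to that transaction instead of the whole session.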
We are at it again. I have a DELETE operation that has been running for 48 minutes so far. I had set plan_cache_mode = force_custom_plan for the entire server before this happened, as we started seeing the COPY slowdown again. I have dtrace information again, but it primarily shows the nested scan operation.
pid: 40665
client_port: 15978
runtime: 0 years 0 mons 0 days 0 hours 48 mins 49.62347 secs
query_start: 2021-11-24 20:13:30.017188 +00:00
datname: tapesystem
state: active
wait_event_type: (none)
query: DELETE FROM ds3.blob WHERE EXISTS (SELECT * FROM ds3.s3_object WHERE id = ds3.blob.object_id AND (bucket_id = $1))
usename: Administrator
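The columns above match the standard pg_stat_activity view, so a monitoring query along these lines (an assumption about how that row was produced) would surface the long-running statement:

```sql
SELECT pid, client_port, now() - query_start AS runtime,
       query_start, datname, state, wait_event_type, query, usename
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY query_start;
```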
So how do we avoid this query plan? Do we need to start doing explicit analyzes after every delete?
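If stale statistics turn out to be the trigger, a manual refresh after the bulk operation is cheap to try (a sketch using the two tables from the plans below):

```sql
ANALYZE ds3.blob;
ANALYZE ds3.s3_object;
```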
EXPLAIN DELETE
FROM ds3.blob
WHERE EXISTS(SELECT * FROM ds3.s3_object WHERE id = ds3.blob.object_id AND (bucket_id = '85b9e793-2141-455c-a752-90c2346cdfe1'));
250k objects in blob
256k objects in s3_object
QUERY PLAN
Delete on blob (cost=10117.05..16883.09 rows=256002 width=12)
-> Hash Join (cost=10117.05..16883.09 rows=256002 width=12)
Hash Cond: (blob.object_id = s3_object.id)
-> Seq Scan on blob (cost=0.00..6094.02 rows=256002 width=22)
-> Hash (cost=6917.02..6917.02 rows=256002 width=22)
-> Seq Scan on s3_object (cost=0.00..6917.02 rows=256002 width=22)
Filter: (bucket_id = '8a988c6c-ef98-465e-a148-50054c739212'::uuid)
'Normal' explain, very few objects with that bucket.
QUERY PLAN
Delete on blob (cost=0.71..6.76 rows=1 width=12)
-> Nested Loop (cost=0.71..6.76 rows=1 width=12)
-> Index Scan using ds3_s3_object__bucket_id on s3_object (cost=0.29..2.31 rows=1 width=22)
Index Cond: (bucket_id = '85b9e793-2141-455c-a752-90c2346cdfe1'::uuid)
-> Index Scan using ds3_blob__object_id on blob (cost=0.42..4.44 rows=1 width=22)
Index Cond: (object_id = s3_object.id)