Re: Automatic aggressive vacuum on almost frozen table takes too long

From: Mikhail Balayan <mv(dot)balayan(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Automatic aggressive vacuum on almost frozen table takes too long
Date: 2023-02-21 05:43:00
Message-ID: CAC2oM1Ze2zQDCdgc_oM_SbnSHkmnsfxCPBHkdH15cLDQ=cxgtw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Probably the result of vacuum freeze manual execution will give some more
information:
Table size:
appdbname3=# \dt+ appschemaname.applications
List of relations
Schema | Name | Type | Owner | Size |
Description
---------------+--------------+-------+---------------+----------+-------------
appschemaname | applications | table | appschemaname | 10012 MB |

Each index size:
tablename | indexname
| index_size | unique |
--------------+------------------------------------------------------------+------------+--------+
applications | chk_invariant_1_apps2
| 8192 bytes | Y |
applications | applications2_pkey
| 1645 MB | Y |
applications | idx_applications2_agent_id
| 2982 MB | N |
applications | idx_applications2_context_id
| 2708 MB | N |
applications | idx_applications2_context_tenant_id
| 1560 MB | N |
applications | idx_applications2_context_type
| 4963 MB | N |
applications | idx_applications2_deleted_at
| 3264 MB | N |
applications | idx_applications2_enabled
| 4975 MB | N |
applications | idx_applications2_euc_index
| 1684 MB | N |
applications | idx_applications2_policy_id
| 3131 MB | N |
applications | idx_applications2_policy_type
| 16 GB | N |
applications | idx_applications2_root_policy_id
| 1574 MB | N |
applications | idx_applications2_status
| 5391 MB | N |
applications | idx_applications2_status_by_policy_fields_context_tenant_i
| 13 MB | N |
applications | idx_applications2_status_by_policy_fields_tenant_id
| 15 MB | N |
applications | idx_applications2_tenant_id
| 1670 MB | N |
applications | idx_applications_created_at
| 2855 MB | N |
applications | tmp_idx_applications2_updated_at2
| 3505 MB | N |
applications | uq_policy_id_context2
| 1398 MB | Y |

And vacuum freeze output:
appdbname3=# VACUUM (FREEZE, VERBOSE) appschemaname.applications;
INFO: aggressively vacuuming "appschemaname.applications"
INFO: scanned index "applications2_pkey" to remove 10488 row versions
DETAIL: CPU: user: 2.45 s, system: 1.07 s, elapsed: 5.16 s
INFO: scanned index "tmp_idx_applications2_updated_at2" to remove 10488
row versions
DETAIL: CPU: user: 2.53 s, system: 2.86 s, elapsed: 11.56 s
INFO: scanned index "idx_applications2_agent_id" to remove 10488 row
versions
DETAIL: CPU: user: 3.16 s, system: 2.36 s, elapsed: 11.04 s
INFO: scanned index "idx_applications2_context_id" to remove 10488 row
versions
DETAIL: CPU: user: 2.22 s, system: 1.95 s, elapsed: 8.06 s
INFO: scanned index "idx_applications2_context_tenant_id" to remove 10488
row versions
DETAIL: CPU: user: 1.86 s, system: 1.16 s, elapsed: 5.10 s
INFO: scanned index "idx_applications2_context_type" to remove 10488 row
versions
DETAIL: CPU: user: 190.09 s, system: 6.48 s, elapsed: 220.88 s
INFO: scanned index "idx_applications2_deleted_at" to remove 10488 row
versions
DETAIL: CPU: user: 67.93 s, system: 3.41 s, elapsed: 82.75 s
INFO: scanned index "idx_applications2_enabled" to remove 10488 row
versions
DETAIL: CPU: user: 78.22 s, system: 4.74 s, elapsed: 96.96 s
INFO: scanned index "idx_applications2_policy_id" to remove 10488 row
versions
DETAIL: CPU: user: 2.99 s, system: 1.65 s, elapsed: 5.32 s
INFO: scanned index "idx_applications2_policy_type" to remove 10488 row
versions
DETAIL: CPU: user: 300.45 s, system: 16.05 s, elapsed: 365.80 s
INFO: scanned index "idx_applications2_root_policy_id" to remove 10488 row
versions
DETAIL: CPU: user: 2.13 s, system: 1.20 s, elapsed: 6.16 s
INFO: scanned index "idx_applications2_status" to remove 10488 row versions
DETAIL: CPU: user: 153.79 s, system: 6.57 s, elapsed: 185.76 s
INFO: scanned index
"idx_applications2_status_by_policy_fields_context_tenant_i" to remove
10488 row versions
DETAIL: CPU: user: 0.02 s, system: 0.01 s, elapsed: 0.06 s
INFO: scanned index "idx_applications2_status_by_policy_fields_tenant_id"
to remove 10488 row versions
DETAIL: CPU: user: 0.03 s, system: 0.00 s, elapsed: 0.05 s
INFO: scanned index "idx_applications2_tenant_id" to remove 10488 row
versions
DETAIL: CPU: user: 2.11 s, system: 1.23 s, elapsed: 6.19 s
INFO: scanned index "idx_applications2_euc_index" to remove 10488 row
versions
DETAIL: CPU: user: 2.05 s, system: 1.27 s, elapsed: 6.05 s
INFO: scanned index "chk_invariant_1_apps2" to remove 10488 row versions
DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
INFO: scanned index "uq_policy_id_context2" to remove 10488 row versions
DETAIL: CPU: user: 1.24 s, system: 1.00 s, elapsed: 4.40 s
INFO: scanned index "idx_applications_created_at" to remove 10488 row
versions
DETAIL: CPU: user: 2.32 s, system: 1.97 s, elapsed: 8.46 s
INFO: "applications": removed 10488 row versions in 7580 pages
DETAIL: CPU: user: 0.86 s, system: 0.08 s, elapsed: 1.03 s
INFO: index "applications2_pkey" now contains 17803283 row versions in
210561 pages
DETAIL: 0 index row versions were removed.
59 index pages have been deleted, 59 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: index "tmp_idx_applications2_updated_at2" now contains 17803283 row
versions in 448580 pages
DETAIL: 0 index row versions were removed.
19542 index pages have been deleted, 19542 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: index "idx_applications2_agent_id" now contains 17803283 row
versions in 381741 pages
DETAIL: 0 index row versions were removed.
61237 index pages have been deleted, 28239 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: index "idx_applications2_context_id" now contains 17803283 row
versions in 346594 pages
DETAIL: 0 index row versions were removed.
57308 index pages have been deleted, 24105 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: index "idx_applications2_context_tenant_id" now contains 17803283
row versions in 199743 pages
DETAIL: 0 index row versions were removed.
32830 index pages have been deleted, 14767 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: index "idx_applications2_context_type" now contains 17803283 row
versions in 635317 pages
DETAIL: 0 index row versions were removed.
43331 index pages have been deleted, 41233 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: index "idx_applications2_deleted_at" now contains 17803283 row
versions in 417770 pages
DETAIL: 0 index row versions were removed.
15695 index pages have been deleted, 15381 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: index "idx_applications2_enabled" now contains 17803283 row versions
in 636750 pages
DETAIL: 0 index row versions were removed.
26332 index pages have been deleted, 26037 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: index "idx_applications2_policy_id" now contains 17803283 row
versions in 400718 pages
DETAIL: 0 index row versions were removed.
55007 index pages have been deleted, 29671 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: index "idx_applications2_policy_type" now contains 17803283 row
versions in 2084261 pages
DETAIL: 7990 index row versions were removed.
56012 index pages have been deleted, 55365 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: index "idx_applications2_root_policy_id" now contains 5015730 row
versions in 201441 pages
DETAIL: 5807 index row versions were removed.
22964 index pages have been deleted, 17915 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: index "idx_applications2_status" now contains 17803283 row versions
in 690063 pages
DETAIL: 10488 index row versions were removed.
31401 index pages have been deleted, 31113 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: index "idx_applications2_status_by_policy_fields_context_tenant_i"
now contains 155059 row versions in 1707 pages
DETAIL: 223 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: index "idx_applications2_status_by_policy_fields_tenant_id" now
contains 155059 row versions in 1883 pages
DETAIL: 223 index row versions were removed.
1 index pages have been deleted, 1 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: index "idx_applications2_tenant_id" now contains 17803283 row
versions in 213798 pages
DETAIL: 10488 index row versions were removed.
33323 index pages have been deleted, 16730 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: index "idx_applications2_euc_index" now contains 17803283 row
versions in 215570 pages
DETAIL: 10488 index row versions were removed.
32261 index pages have been deleted, 16643 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: index "chk_invariant_1_apps2" now contains 0 row versions in 1 pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: index "uq_policy_id_context2" now contains 5015730 row versions in
178953 pages
DETAIL: 5807 index row versions were removed.
23793 index pages have been deleted, 13898 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: index "idx_applications_created_at" now contains 17803283 row
versions in 365479 pages
DETAIL: 10488 index row versions were removed.
19020 index pages have been deleted, 18938 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: "applications": found 0 removable, 471321 nonremovable row versions
in 30772 out of 1278613 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 4177172255
There were 1490846 unused item pointers.
Skipped 0 pages due to buffer pins, 1247841 frozen pages.
0 pages are entirely empty.
CPU: user: 816.65 s, system: 55.37 s, elapsed: 1031.96 s.
INFO: aggressively vacuuming "pg_toast.pg_toast_35599271"
INFO: "pg_toast_35599271": found 0 removable, 2 nonremovable row versions
in 1 out of 2328 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 4178183844
There were 2 unused item pointers.
Skipped 0 pages due to buffer pins, 2327 frozen pages.
0 pages are entirely empty.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
VACUUM

What catches my eye: scanning indexes smaller than 3.1GB is fast, larger
ones are slow. For example:
idx_applications2_policy_id is 3131 MB took just 5 seconds (DETAIL: CPU:
user: 2.99 s, system: 1.65 s, elapsed: 5.32 s)
but idx_applications2_deleted_at with 3264 MB took 1 minute 22 seconds
(DETAIL: CPU: user: 67.93 s, system: 3.41 s, elapsed: 82.75 s)

In addition to this, I collected statistics during the
idx_applications2_enabled scan:
# strace -c -p 26792
strace: Process 26792 attached
^Cstrace: Process 26792 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
90.87 2.931694 4 711882 read
4.61 0.148573 3 55598 19503 futex
3.01 0.097138 1 70933 lseek
0.87 0.027926 7 4261 write
0.44 0.014348 23 623 fdatasync
0.20 0.006372 5 1245 kill
0.00 0.000081 5 17 open
0.00 0.000042 4 12 select
0.00 0.000038 3 12 close
0.00 0.000000 0 1 rt_sigreturn
0.00 0.000000 0 1 sendto
0.00 0.000000 0 2 getrusage
------ ----------- ----------- --------- --------- ----------------

--
Mikhael

On Mon, 20 Feb 2023 at 13:29, Mikhail Balayan <mv(dot)balayan(at)gmail(dot)com> wrote:

> >> Can you run amcheck's bt_index_check() routine against some of the
>> >> indexes you've shown? There is perhaps some chance that index
>> >> corruption exists and causes VACUUM to take a very long time to delete
>> >> index pages. This is pretty much a wild guess, though.
>
>
> Unfortunately I can't, we haven't enabled this extension yet. And since
> this is a production, I'm not ready to turn it on right away. But I can say
> that this theory is unlikely, since this problem occurs on different
> sites. Here's an example of an output from another site where table size is
> 46.5 GB (again, data only), indexes 107GB, toast 62MB:
>
> automatic aggressive vacuum of table
> "appdbname2.appschemaname.applications": index scans: 1
> pages: 0 removed, 6091646 remain, 0 skipped due to pins, 6086395 skipped
> frozen
> tuples: 2344 removed, 35295654 remain, 0 are dead but not yet removable,
> oldest xmin: 213412878
> buffer usage: 251980554 hits, 14462331 misses, 18844 dirtied
> avg read rate: 12.018 MB/s, avg write rate: 0.016 MB/s
> system usage: CPU: user: 7734.43 s, system: 178.98 s, elapsed: 9401.36 s
>
>
> Here again we see that there are 5251 blocks that need to be cleaned
> (6091646 - 6086395), buffer usage is 266461729 blocks or ~ 2 TB and
> processing time is 2.5h+.
>
> >> It's possible that VACUUM had to wait a long time for a cleanup lock
>> on one individual heap page here
>
> If such a scenario is possible, it makes sense to add information about
> the blocking waiting time to the output. Something like:
> system usage: CPU: user: 7734.43 s, system: 178.98 s, lock_wait:
> 1234.56 s, elapsed: 9401.36 s
>
>
> Mikhail
>
> On Sat, 18 Feb 2023 at 05:35, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
>> On Thu, Feb 16, 2023 at 5:40 PM Mikhail Balayan <mv(dot)balayan(at)gmail(dot)com>
>> wrote:
>> >> >> Do you have any non-btree indexes on the table? Can you show us the
>> details of the
>> >> >> table, including all of its indexes? In other words, can you show
>> "\d applications" output from psql?
>> >
>> > Only btree indexes. Please find the full table schema below:
>>
>> It's possible that VACUUM had to wait a long time for a cleanup lock
>> on one individual heap page here, which could have added a long delay.
>> But...that doesn't seem particularly likely.
>>
>> Can you run amcheck's bt_index_check() routine against some of the
>> indexes you've shown? There is perhaps some chance that index
>> corruption exists and causes VACUUM to take a very long time to delete
>> index pages. This is pretty much a wild guess, though.
>>
>> --
>> Peter Geoghegan
>>
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Peter Geoghegan 2023-02-21 05:50:56 Re: Automatic aggressive vacuum on almost frozen table takes too long
Previous Message Adrian Klaver 2023-02-21 05:29:25 Re: pg_dump'ed file contains "DROP DATABASE"