Re: Automatic aggressive vacuum on almost frozen table takes too long

From: Mikhail Balayan <mv(dot)balayan(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Automatic aggressive vacuum on almost frozen table takes too long
Date: 2023-02-17 01:40:22
Message-ID: CAC2oM1aTNHFK1r5sGGOjkU7y3xRutk30H+om1h2hcd77RAMt+g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

>
> >> Even still,
> >> the information about buffers that you've shown does indeed appear to
> >> be total nonsense (while everything else we can see looks plausible).

Actually not only buffers. The elapsed time also looks strange. And this is
the first reason why I paid attention to this situation. 33029 seconds,
which is 9+ hours to scan 4898 blocks and related indexes. And by this
metric, it seems that the amount of buffers processed is reliable (which
does not negate the presence of the bug). Otherwise I can not explain such
a long vacuuming of the table.

>> Do you have any non-btree indexes on the table? Can you show us the
> details of the
> >> table, including all of its indexes? In other words, can you show "\d
> applications" output from psql?

Only btree indexes. Please find the full table schema below:
appdbname=> \d applications
Table "appschemaname.applications"
Column | Type | Collation |
Nullable | Default
--------------------------------+------------------------+-----------+----------+----------
id | character varying(36) | | not
null |
created_at | bigint | | not
null |
updated_at | bigint | | not
null |
deleted_at | bigint | |
|
tenant_id | character varying(36) | |
|
context_tenant_id | character varying(36) | |
|
policy_id | character varying(36) | |
|
policy_type | character varying(128) | |
|
policy_name | text | |
|
root_policy_id | character varying(36) | |
|
context_id | character varying(36) | |
|
context_type | character varying(128) | |
|
agent_id | character varying(36) | |
|
multiple_applications_conflict | text | |
|
validation_issues | text | |
|
deployment_state | character varying(128) | |
|
deployment_errors | text | |
|
enabled | boolean | |
|
has_conflict_alert | boolean | |
|
running_state | character varying(128) | |
|
running_activities | text | |
|
running_progress | character varying(128) | |
|
running_blockers | text | |
|
last_success_time | bigint | |
|
last_run_time | bigint | |
|
last_activity | character varying(128) | |
|
next_run_time | bigint | |
|
next_activity | character varying(128) | |
|
most_severe_status_event | text | |
|
status | character varying(128) | |
|
origin_context | text | | not
null | ''::text
euc | character varying(36) | |
|
event_flag | integer | |
| 1
Indexes:
"applications2_pkey" PRIMARY KEY, btree (id)
"chk_invariant_1_apps2" UNIQUE, btree (policy_type, context_id) WHERE
multiple_applications_conflict <> ''::text AND enabled = true AND
root_policy_id::text <> ''::text AND deleted_at IS NULL
"uq_policy_id_context2" UNIQUE, btree (policy_id, context_id) WHERE
deleted_at IS NULL
"idx_applications2_agent_id" btree (agent_id)
"idx_applications2_context_id" btree (context_id)
"idx_applications2_context_tenant_id" btree (context_tenant_id)
"idx_applications2_context_type" btree (context_type)
"idx_applications2_deleted_at" btree (deleted_at)
"idx_applications2_enabled" btree (enabled)
"idx_applications2_euc_index" btree (euc)
"idx_applications2_policy_id" btree (policy_id)
"idx_applications2_policy_type" btree (policy_type)
"idx_applications2_root_policy_id" btree (root_policy_id) WHERE
deleted_at IS NULL
"idx_applications2_status" btree (status)
"idx_applications2_status_by_policy_fields_context_tenant_i" btree
(context_tenant_id, root_policy_id, policy_type, context_type, deleted_at)
WHERE deleted_at IS NULL AND root_policy_id::text = ''::text
"idx_applications2_status_by_policy_fields_tenant_id" btree (tenant_id,
root_policy_id, policy_type, context_type, deleted_at) WHERE deleted_at IS
NULL AND root_policy_id::text = ''::text
"idx_applications2_tenant_id" btree (tenant_id)
"idx_applications_created_at" btree (created_at)
"tmp_idx_applications2_updated_at2" btree (updated_at)

Thank you!

Mikhael

On Fri, 17 Feb 2023 at 03:42, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:

> On Thu, Feb 16, 2023 at 7:44 AM Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
> wrote:
> > > That is, if I understand it correctly, it says that there were (and
> > > actually are) 2013128 pages of which 2008230 were skipped, which leaves
> > > 4898 blocks to be scanned. I.e. it seems that the allocated 1GB
> > > (autovacuum_work_mem) should be enough to handle that amount of blocks
> > > and to avoid multiple scans of the indexes.
>
> It's quite clear that there is only one pass over the indexes, since
> "index scans: 1" says exactly that.
>
> > > But, based on buffer usage, one can see that a huge amount of data is
> > > read, greatly exceeding not only the number of remaining unfrozen
> > > blocks, but also the size of the table and indexes taken together: 2
> > > billion blocks, more than 15TB.
> > >
> > > Is this a bug in Postgresql or am I interpreting the log data wrong?
>
> I think that it might be a bug in Postgres.
>
> I addressed a similar issue in the same "buffer" instrumentation in
> commit d3609dd2, but that wasn't backpatched because I imagined that
> it only applied to the new VACUUM VERBOSE case (VACUUM VERBOSE only
> started sharing the same instrumentation code as log_autovacuum in
> Postgres 15). It's not immediately obvious how you could see a problem
> like the one you've shown in the autovacuum log output. Even still,
> the information about buffers that you've shown does indeed appear to
> be total nonsense (while everything else we can see looks plausible).
> There has to be some explanation for that.
>
> The only other explanation I can think of is a pathological case where
> an index scan by some particular ambulkdelete routine scans a number
> of buffers that vastly exceeds the total size of the index. That does
> seem just about possible with an access method like GIN. Do you have
> any non-btree indexes on the table? Can you show us the details of the
> table, including all of its indexes? In other words, can you show "\d
> applications" output from psql?
>
> --
> Peter Geoghegan
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Zheng Li 2023-02-17 04:13:09 Re: Support logical replication of DDLs
Previous Message Mikhail Balayan 2023-02-17 01:23:03 Re: Automatic aggressive vacuum on almost frozen table takes too long