Degraded performance during table rewrite

From: Mohamed Wael Khobalatte <mkhobalatte(at)grubhub(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Degraded performance during table rewrite
Date: 2020-07-03 20:24:10
Message-ID: CABZeWdxmAafKEr7tugufWYUgAS3AnVRAwCRA8GOnRh-aTzeyCA@mail.gmail.com
Lists: pgsql-general

Hi all,

I am attempting a series of table rewrites to shrink bloated tables, in the
absence of pg_repack and VACUUM FULL (neither is an option for us). The
database is growing fast and has accumulated significant bloat in both heaps
and indexes, so index rebuilds alone won't cut it. We found that table
inheritance can be used to achieve this rather nicely. We are running PG
v9.6.18.

We are setting up the inheritance as follows:

BEGIN;
ALTER TABLE #{table} RENAME TO #{table}_old;
CREATE TABLE #{table} (LIKE #{table}_old INCLUDING ALL);
ALTER TABLE #{table}_old INHERIT #{table};
ALTER SEQUENCE #{table}_id_seq OWNED BY #{table}.id;
COMMIT;
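
As a sanity check (an illustration, not part of our script), a plan against
the parent confirms that reads now cover both heaps, while new inserts from
the application land only in the fresh, compact one:

EXPLAIN SELECT count(*) FROM #{table};
-- Expect an Append over #{table} (empty) and #{table}_old.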

Then, the migration itself runs as follows (each batch in its own
transaction, looping through records and sleeping briefly between batches):

WITH del AS (
  DELETE FROM #{old_table}
  WHERE id IN (
    SELECT id
    FROM #{old_table}
    -- max_deleted_id is the max id deleted by the previous batch;
    -- we grab it programmatically.
    WHERE id > #{max_deleted_id}
    ORDER BY id ASC
    LIMIT #{batch_size}
  )
  RETURNING *
)
INSERT INTO #{table}
SELECT * FROM del
RETURNING id

For instance, the batch_size can be 10_000, and the code sleeps for 200ms
between batches (this is done in a Ruby script).

max_deleted_id is passed to each run from the previous one, which keeps the
inner SELECT fast by letting it skip over ids that were already processed.
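
As an aside, the boundary id can be computed inside the statement itself by
adding a second CTE and aggregating over it; this is a sketch of what our
script effectively computes, not our exact code:

WITH del AS (
  DELETE FROM #{old_table}
  WHERE id IN (
    SELECT id
    FROM #{old_table}
    WHERE id > #{max_deleted_id}
    ORDER BY id ASC
    LIMIT #{batch_size}
  )
  RETURNING *
), ins AS (
  INSERT INTO #{table}
  SELECT * FROM del
  RETURNING id
)
-- Hand exactly one row back to the driver: the next max_deleted_id.
SELECT max(id) AS max_deleted_id FROM ins;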

This works very well. However, I noticed two surprising things:

1. The performance of the delete and insert degrades severalfold as the
script runs. For instance, in one run, the average time per batch climbs
from 150ms to 700ms.

2. The EXPLAIN itself takes a while to run on a sample batch. On one table,
the EXPLAIN alone took four seconds.
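
To put numbers on the slowdown, I believe running the inner SELECT with
buffer accounting would show how many pages each batch touches (a sketch;
the Buffers line is the interesting part):

EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM #{old_table}
WHERE id > #{max_deleted_id}
ORDER BY id ASC
LIMIT #{batch_size};
-- "Buffers: shared hit=... read=..." growing across batches would point
-- at dead index/heap pages being scanned despite the id > ... filter.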

To try and reproduce this locally, I used the following dummy table:

create table towns (
  id serial primary key,
  code text,
  article text,
  name text,
  department text
);

insert into towns (code, article, name, department)
select
  left(md5(i::text), 10),
  md5(random()::text),
  md5(random()::text),
  left(md5(random()::text), 4)
from generate_series(1, 100000000) s(i);
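
I then apply the same inheritance setup to it and loop over batches along
these lines (boundary 0 for the first run; subsequent runs use the previous
batch's max id):

ALTER TABLE towns RENAME TO towns_old;
CREATE TABLE towns (LIKE towns_old INCLUDING ALL);
ALTER TABLE towns_old INHERIT towns;
ALTER SEQUENCE towns_id_seq OWNED BY towns.id;

-- One migration batch:
WITH del AS (
  DELETE FROM towns_old
  WHERE id IN (
    SELECT id
    FROM towns_old
    WHERE id > 0
    ORDER BY id ASC
    LIMIT 10000
  )
  RETURNING *
)
INSERT INTO towns
SELECT * FROM del
RETURNING id;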

This spends 150ms per batch, which climbs to 700ms per batch. A vacuum of
the old table brings it back down to 150ms, but I don't understand why,
because we structure the query to jump over all previously deleted rows.
There is an old thread in which Tom Lane mentions that the planner might
itself be walking that primary key index. Is this applicable here? And is
there anything we can do besides more aggressive and continued vacuuming of
the old table (or a change in autovacuum settings)? Ideally, we want to run
this overnight without much supervision.
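
By "a change in autovacuum settings" I have in mind per-table storage
parameters roughly like the following (the values are placeholders I have
not tested):

ALTER TABLE towns_old SET (
  autovacuum_vacuum_scale_factor = 0.01, -- vacuum after ~1% dead rows
  autovacuum_vacuum_cost_delay = 0       -- don't throttle the vacuum
);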
