Re: Degraded performance during table rewrite

From: Ron <ronljohnsonjr(at)gmail(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Degraded performance during table rewrite
Date: 2020-07-03 21:33:37
Message-ID: 7147e502-5925-5e13-096e-757b5764f000@gmail.com
Lists: pgsql-general

On 7/3/20 3:24 PM, Mohamed Wael Khobalatte wrote:
> Hi all,
>
> I am attempting a series of table rewrites to shrink a table in the
> absence of pg_repack and VACUUM FULL (neither is an option for us). The
> database is growing fast and has significant bloat in both heaps and
> indexes, so index rebuilds alone won't cut it. We found that table
> inheritance can be used to achieve this rather nicely. We are running PG
> v9.6.18.
>
> We are setting up the inheritance as follows:
>
> BEGIN;
> ALTER TABLE #{table} RENAME TO #{table}_old;
> CREATE TABLE #{table} (LIKE #{table}_old INCLUDING ALL);
> ALTER TABLE #{table}_old INHERIT #{table};
> ALTER SEQUENCE #{table}_id_seq OWNED BY #{table}.id;
> COMMIT;
>
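
An aside on the setup: once the swap is in place, reads against the
parent transparently cover the old heap as well. A quick check, assuming
the towns table from the reproduction further down has been through the
same rename-and-inherit steps (so towns_old exists):

    EXPLAIN SELECT * FROM towns WHERE id = 42;
    -- expect an Append over towns and towns_old, one index scan each
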
> Then, the migration itself runs as follows (each batch in its own
> transaction, looping through records and sleeping for a bit):
>
> WITH del AS (
>   DELETE FROM #{old_table}
>   WHERE id IN (
>     SELECT id
>     FROM #{old_table}
>     WHERE id > #{max_deleted_id} -- the max deleted from the previous
>                                  -- batch; we grab it programmatically
>     ORDER BY id ASC
>     LIMIT #{batch_size}
>   )
>   RETURNING *
> )
> INSERT INTO #{table}
> SELECT * FROM del
> RETURNING id
>
> For instance, the batch_size can be 10_000, and the driver (a Ruby
> script) sleeps for 200ms between batches.
>
> "max_deleted_id" is carried into each run from the previous one, which
> narrows the inner SELECT's index scan.
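
To make the loop shape concrete, here is a server-side sketch of the same
batching logic against the dummy towns table from further down. This is an
illustrative stand-in for the Ruby driver, not the original script; note
that a DO block runs as a single transaction, unlike the script's
transaction-per-batch, so it only demonstrates the control flow:

    DO $$
    DECLARE
      last_id bigint := 0;  -- the script's max_deleted_id
    BEGIN
      LOOP
        WITH del AS (
          DELETE FROM towns_old
          WHERE id IN (
            SELECT id FROM towns_old
            WHERE id > last_id
            ORDER BY id ASC
            LIMIT 10000)
          RETURNING *
        ), ins AS (
          INSERT INTO towns
          SELECT * FROM del
          RETURNING id
        )
        SELECT max(id) INTO last_id FROM ins;
        EXIT WHEN last_id IS NULL;  -- nothing left to move
        PERFORM pg_sleep(0.2);      -- the 200ms pause between batches
      END LOOP;
    END $$;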
>
> This works very well. However, I noticed two surprising things:
>
> 1. The performance of the delete and insert degrades sharply as the
> script runs. For instance, in one run it goes from an average of 150ms
> per batch to 700ms per batch.
>
> 2. The EXPLAIN itself takes a while to run on a sample batch. On one
> table, the EXPLAIN alone took four seconds.
>
> To try and reproduce this locally, I used the following dummy table:
>
> create table towns (id serial primary key, code text, article text,
>                     name text, department text);
>
> insert into towns (code, article, name, department)
> select
>     left(md5(i::text), 10),
>     md5(random()::text),
>     md5(random()::text),
>     left(md5(random()::text), 4)
> from generate_series(1, 100000000) s(i);
>
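While the batches run against this table, the slowdown correlates with
dead tuples piling up in towns_old; an easy way to watch it (standard
statistics views, nothing exotic):

    SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
    FROM pg_stat_user_tables
    WHERE relname IN ('towns', 'towns_old');
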
> This spends 150ms per batch, which climbs to 700ms per batch. A vacuum of
> the old table lowers it back to 150ms, but I don't understand why, because
> we structure the query to jump over all previously dead rows. There is an
> old thread in which Tom Lane mentions that the planner might itself be
> walking that primary index. Is this applicable here? And is there anything
> we can do besides more aggressive and continued vacuuming of the old table
> (or a change in autovacuum settings)? Ideally, we want to run this
> overnight without much supervision.
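
On the unsupervised-overnight point, one knob worth trying (a sketch, not
something tested in this thread): per-table autovacuum settings that make
cleanup of the shrinking heap far more eager, so dead index entries are
reclaimed between batches:

    ALTER TABLE towns_old SET (
      autovacuum_vacuum_scale_factor = 0.0,   -- ignore the fraction-of-table trigger
      autovacuum_vacuum_threshold    = 50000, -- fire after ~5 batches of 10k deletes
      autovacuum_vacuum_cost_delay   = 0      -- do not throttle the cleanup
    );

    -- Alternatively, the driver can issue an explicit VACUUM every N
    -- batches (outside any transaction block):
    VACUUM towns_old;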

Is the problem really about writing the new table, or reading from the old
table?
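
One way to split the two, using the towns reproduction above (a sketch;
assumes towns_old exists from the rename step): time the inner read alone
with buffer counts, then compare against a full delete-plus-insert batch:

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT id FROM towns_old
    WHERE id > 0
    ORDER BY id ASC
    LIMIT 10000;
    -- if buffer hits/reads balloon here as batches advance, the time is
    -- going to the index scan over dead entries rather than the insert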

--
Angular momentum makes the world go 'round.
