Re: Degraded performance during table rewrite

From: Mohamed Wael Khobalatte <mkhobalatte(at)grubhub(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Degraded performance during table rewrite
Date: 2020-07-04 19:34:19
Message-ID: CABZeWdwxXdV6s0CbM=SsREc4cHt8v8ZPRbYy9fM9k1fwh281Cg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Fri, Jul 3, 2020 at 11:30 PM Mohamed Wael Khobalatte <
mkhobalatte(at)grubhub(dot)com> wrote:

> Another idea that might conceivably be useful to you on 9.6 is to
>
> reorganize the bulk deletions so that most of them aren't at the endpoint
>> of the live id range. If you made it so that the extremal values are
>> deleted last, you'd never hit this behavior.
>>
>> regards, tom lane
>>
>
> Hm, I am not sure I understood your suggestion. We'd still need to find a
> range of ids to delete, and the order by asc has the nice property that it
> moves old records first, which helps tables with a lot of activity on
> recently created tuples (and I suppose an ordering in the other direction
> would suffer from the same problem).
>

Tom, I think I understood what you meant after getting some sleep. What I
do now is:

- Initially load a lot of ids, say a million.
- Then I slice that array by my preferred batch, say a 10000.
- Loop through these slices and run an improved query, which looks like
this:

WITH del AS (
DELETE FROM #{old_table}
WHERE id >= #{first_id_in_slice} AND id <= #{last_id_in_slice}
RETURNING *
)
INSERT INTO #{table}
SELECT * FROM del
RETURNING id

This not only stays *steady*, but dropping the inner query reduces it to
sub 100ms per batch! The initial query to find the million Ids does drop a
little as we go along (autovacuum will help that one), but it is a far cry
from the earlier situation. I also think this keeps the desired correctness
behavior as well. Is this what you had in mind?

Here is sample output I logged of the migration:

Migrated 10000 records from towns_old to towns in 82.92 ms.
Migrated 10000 records from towns_old to towns in 84.89 ms.
Migrated 10000 records from towns_old to towns in 85.92 ms.
Migrated 10000 records from towns_old to towns in 86.57 ms.

Here is a log of the initial query to load ID ranges:

Loaded towns_old ids in 3569.71 ms. Current id range is 63299754 to
64299753.
Loaded towns_old ids in 1990.62 ms. Current id range is 64299754 to
65299753.
Loaded towns_old ids in 2542.46 ms. Current id range is 65299754 to
66299753.
Loaded towns_old ids in 2040.88 ms. Current id range is 66299754 to
67299753.
Loaded towns_old ids in 1907.96 ms. Current id range is 67299754 to
68299753.
Loaded towns_old ids in 2626.74 ms. Current id range is 68299754 to
70435753.
Loaded towns_old ids in 3510.16 ms. Current id range is 70435754 to
71435753.
Loaded towns_old ids in 1841.95 ms. Current id range is 71435754 to
72435753.
Loaded towns_old ids in 1774.52 ms. Current id range is 72435754 to
73435753.

These numbers of loading ranges seem acceptable to me, and were not changed
by autovacuum considering the number of tuples actually requested. If we
spend an average of 2s here, it's still remarkably better than before
because we've saved an average of 500ms per batch, and at 100 batches per
id range, that's 50 seconds saved!

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2020-07-04 19:58:56 Re: Degraded performance during table rewrite
Previous Message Mohamed Wael Khobalatte 2020-07-04 03:30:37 Re: Degraded performance during table rewrite