Re: Slow WAL recovery for DROP TABLE

From: Sherrylyn Branchaw <sbranchaw(at)gmail(dot)com>
To: k(dot)jamison(at)jp(dot)fujitsu(dot)com
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Slow WAL recovery for DROP TABLE
Date: 2018-07-18 17:27:48
Message-ID: CAB_myF7G3V8CoJGUzGxun4XjWeSTUxONi2iy4e0H9K7qvfrzZw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

> Hi, I have also reported a similar problem in the hackers mailing list,
but particularly on TRUNCATE TABLE.
https://www.postgresql.org/message-id/flat/D09B13F772D2274BB348A310EE3027C62FD6E6%40g01jpexmbkw24

Ooh, interesting. I admit I did not include TRUNCATE in my testing.

> The problem lies with the standby server’s replay as it does separate
scanning of the whole shared buffer for each DROP/TRUNCATE TABLE in order
to check if the table-to-delete is cached in shared buffer. Therefore, it
will take a long recovery time and sometimes fail for large tables
depending on shared_buffer size.

Also very interesting. We only (?) have 8 GB of shared buffers, and I see
from your message that you had 300 GB. All of our tables, both in prod and
in my reproduction, were empty, but there were hundreds of thousands of
them.

> The main problem here is the scanning of shared_buffers, which not only
affects drop/truncate table, but also drop database and vacuum as well.

I wondered about that. I didn't have any problem with a single drop
database, but the database dropped was a small one (albeit one with a few
hundred thousand empty tables), and I neither tested dropping a large
database nor dropping 100,000 databases. I didn't test vacuuming, but we do
heavy vacuuming on all our primaries frequently, and...hmm. Regular
vacuuming doesn't cause any problems that have made it onto my radar, but
VACUUM FULLs can cause WAL files to pile up on the primary's pg_xlog before
getting archived. I never investigated that, just throttled my VACUUM
FULLs, because they're only ever run manually. I will keep an eye on the
recovery time of individual files the next time I have to do this, which
will probably be soon.

> But I think any working minor solutions/fixes from developers are also
welcome, such as the recent committed patch for the multiple dropped tables
per transaction with large shared_buffers.

Agreed. Should I have sent or should I still send this to pgsql-hackers? I
wasn't sure, so I erred on the side of not bothering the developers until
I'd gotten some feedback here.

Best,
Sherrylyn

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Tomas Vondra 2018-07-18 20:08:09 Re: Shared buffers increased but cache hit ratio is still 85%
Previous Message Sherrylyn Branchaw 2018-07-18 17:11:56 Re: Slow WAL recovery for DROP TABLE